FROM BRICKS TO BUILDINGS
BRICKS CLOSING SYMPOSIUM
MONDAY JUNE 22, 2009
16:00 - 16:30
Universal similarity
Prof. dr. ir. Paul Vitányi (CWI, UvA)
We survey a new area of parameter-free similarity distance measures useful in
data mining, pattern recognition, learning, and automatic semantics extraction.
Given a family of distances on a set of objects, a distance is universal up to a
certain precision for that family if it minorizes every distance in the family
between every two objects in the set, up to the stated precision (we do not
require the universal distance to be an element of the family). We consider
similarity distances for two types of objects: literal objects that as such
contain all of their meaning, like genomes or books, and names for objects. The
latter may have literal embodiments like the first type, but may also be
abstract, like "red" or "Christianity." For the first type we consider a
family of computable distance measures corresponding to parameters expressing
similarity according to particular features between pairs of literal objects.
For the second type we consider similarity distances generated by web users
corresponding to particular semantic relations between (the names for) the
designated objects. For both families we give universal similarity distance
measures, incorporating all particular distance measures in the family. In the
first case the universal distance is based on compression and in the second case
it is based on Google page counts related to search terms. In both cases
experiments on a massive scale give evidence of the viability of the approaches.
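
The abstract gives no formulas, so the following is only an illustrative sketch of the two kinds of distance it mentions. It assumes the normalized compression distance and the normalized Google distance as they are commonly defined in the literature. The function names are hypothetical, zlib stands in for a real-world compressor, and the page counts are supplied by the caller rather than fetched from any search engine.

```python
import math
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Compression-based distance between two literal objects.

    Assumed form: (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Page-count-based distance between two search terms.

    fx, fy: page counts for each term alone (hypothetical inputs).
    fxy:    page count for both terms together.
    n:      assumed total number of indexed pages, larger than fx and fy.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never occur (or never co-occur) are treated as
        # infinitely far apart in this sketch.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))
```

In this sketch, smaller values mean greater similarity: an object compared with a copy of itself should score close to 0 under ncd, and two terms that always occur together score 0 under ngd.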