AUTOMATIC MEANING DISCOVERY USING GOOGLE
Speaker: Paul Vitanyi (CWI, University of Amsterdam, and
National ICT Australia).
Slides are available in PDF.
We have found a method to automatically extract the meaning
of words and phrases from the world-wide-web using Google
page counts. The approach is novel in its unrestricted
problem domain, simplicity of implementation, and
manifestly ontological underpinnings. The world-wide-web is
the largest database on earth, and the latent semantic
context information entered by millions of independent
users averages out to provide automatic meaning of useful
quality. We demonstrate positive correlations, evidencing
an underlying semantic structure, in both numerical symbol
notations and number-name words in a variety of natural
languages and contexts. Next, we demonstrate the ability to
distinguish between colors and numbers and between
17th-century Dutch painters; the ability to understand
electrical terms, religious terms, and emergency incidents;
and the ability to do simple automatic English-Spanish
translation. We also conduct a massive experiment in
understanding WordNet categories.
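The abstract does not spell out how page counts become a measure of
meaning. As a rough sketch, the following computes the normalized
Google distance (NGD) as defined in the full paper linked below; the
function name, the page counts, and the index size N used here are
illustrative assumptions, not figures from the talk.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google distance from page counts.

    fx, fy : pages containing term x, term y (hypothetical counts)
    fxy    : pages containing both terms
    n      : assumed total number of indexed pages
    """
    # Terms that never occur, or never co-occur, are treated as
    # infinitely far apart (a convention chosen for this sketch).
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Hypothetical counts: x on 10,000 pages, y on 100, both on 10,
# out of an assumed index of 1,000,000 pages.
print(ngd(10_000, 100, 10, 1_000_000))
```

Smaller values mean the two terms tend to appear on the same pages;
a value of 0 means they always occur together. In practice the counts
would come from search-engine queries, which this sketch leaves out.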
This is joint work with Rudi Cilibrasi; for the full paper
see http://arxiv.org/abs/cs.CL/0412098.
The work was recently reported in "A search for meaning"
by Duncan Graham-Rowe, New Scientist, 29 January 2005, p. 21:
http://www.newscientist.com/channel/info-tech/mg18524846.100
and immediately made it to Slashdot ("News for nerds,
stuff that matters"):
http://science.slashdot.org/article.pl?sid=05/01/29/1815242&tid=217&tid=14