Basic Research in Informatics for Creating the Knowledge Society

AUTOMATIC MEANING DISCOVERY USING GOOGLE
Speaker: Paul Vitanyi (CWI, University of Amsterdam, and National ICT Australia).

Slides are available in PDF.

We have found a method to automatically extract the meaning of words and phrases from the world-wide-web using Google page counts. The approach is novel in its unrestricted problem domain, simplicity of implementation, and manifestly ontological underpinnings. The world-wide-web is the largest database on earth, and the latent semantic context information entered by millions of independent users averages out to provide automatic meaning of useful quality. We demonstrate positive correlations, evidencing an underlying semantic structure, in both numerical symbol notations and number-name words in a variety of natural languages and contexts. Next, we demonstrate the ability to distinguish between colors and numbers, to distinguish between 17th century Dutch painters, and to understand electrical terms, religious terms, and emergency incidents. We also conduct a massive experiment in understanding WordNet categories, and we demonstrate the ability to do a simple automatic English-Spanish translation.
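
The abstract does not spell out how page counts become a measure of meaning. As a rough illustration, the full paper referenced below defines a normalized distance between two search terms from three page counts (each term alone, and both together) and the total number of indexed pages. The Python sketch below follows that definition; the function name `normalized_distance` and all counts in the example are hypothetical placeholders chosen for illustration, not values from the talk or from any search engine.

```python
import math


def normalized_distance(fx, fy, fxy, n):
    """Distance between two terms computed from page counts.

    fx, fy -- page counts for each term alone
    fxy    -- page count for pages containing both terms
    n      -- total number of indexed pages (assumed larger than fx and fy)
    """
    if min(fx, fy, fxy) <= 0:
        # Terms that never occur (or never co-occur) are treated as
        # infinitely far apart.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical page counts, invented purely for illustration.
N = 10**10
red, blue, seven = 5 * 10**8, 4 * 10**8, 3 * 10**8
red_and_blue, red_and_seven = 10**8, 10**7

d_color_color = normalized_distance(red, blue, red_and_blue, N)
d_color_number = normalized_distance(red, seven, red_and_seven, N)
print(d_color_color, d_color_number)
```

With these made-up counts the two color terms come out closer to each other than a color term and a number term, which is the kind of signal the abstract describes using to separate colors from numbers. Real counts would have to come from a search engine, and the choice of `n` only rescales the distances.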

This is joint work with Rudi Cilibrasi; the full paper is available at http://arxiv.org/abs/cs.CL/0412098.
The work was recently reported in "A search for meaning" by Duncan Graham-Rowe, New Scientist, 29 January 2005, p. 21 (http://www.newscientist.com/channel/info-tech/mg18524846.100), and was immediately picked up by Slashdot (http://science.slashdot.org/article.pl?sid=05/01/29/1815242&tid=217&tid=14).


© 2004-2009 BRICKS Consortium