Ad verba per numeros
HOT!, NLP Techniques, Information Retrieval, Services and Utilities, Hype-Buzz
Thursday, June 4, 2009, 11:18 AM
Let's start with a short definition from Wikipedia (yeah, I know, real men don't cite encyclopedias):
"Information extraction (IE) is a type of information retrieval whose goal is to automatically extract structured information, i.e. categorized and contextually and semantically well-defined data from a certain domain, from unstructured machine-readable documents."

In other words, the goal of information extraction is to obtain table schemas from raw data and then fill in the records of those tables.

Needless to say, information extraction is a really tough problem and, to the best of my knowledge, there are no publicly available applications for it other than the recent Google Squared.

I suppose that these days most people (i.e. bloggers) will be joyfully jumping around this new tool (which, I confess, is pretty amazing). However, I would like to point to an academic project I learned about two years ago: the Proteus Project at New York University. Back then I had the opportunity to attend a talk by Satoshi Sekine, and he showed us a piece of software doing much the same as Google Squared. Obviously, I was impressed and excited, which is why I'm not so excited about Squared today :)
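To make the "table schema plus records" idea concrete, here is a minimal sketch in Python. It assumes the schema is given in advance and that records are pulled out of raw text with a couple of hand-written regular-expression patterns. The column names, the patterns, the `extract_records` helper, and the example sentences are all hypothetical, and this is not meant to describe how Google Squared or the Proteus system actually work; real IE systems learn or induce far richer patterns.

```python
import re
from typing import Dict, List

# Hypothetical table schema: the columns we want to fill in.
SCHEMA = ["name", "country", "founded"]

# Hypothetical hand-written surface patterns (an illustrative assumption).
# Each named group corresponds to one column of SCHEMA.
PATTERNS = [
    re.compile(
        r"(?P<name>[A-Z]\w+(?: [A-Z]\w+)*) was founded in "
        r"(?P<founded>\d{4}) in (?P<country>[A-Z]\w+)"
    ),
    re.compile(
        r"(?P<name>[A-Z]\w+(?: [A-Z]\w+)*), a company from "
        r"(?P<country>[A-Z]\w+), dates back to (?P<founded>\d{4})"
    ),
]


def extract_records(text: str) -> List[Dict[str, str]]:
    """Fill one row of SCHEMA for every pattern match found in raw text."""
    records = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            # Map each schema column to the text captured by its named group.
            records.append({column: match.group(column) for column in SCHEMA})
    return records


if __name__ == "__main__":
    raw = ("Acme Corp was founded in 1901 in Spain. "
           "Globex, a company from Italy, dates back to 1950.")
    # Print the filled table as tab-separated values, header first.
    print("\t".join(SCHEMA))
    for row in extract_records(raw):
        print("\t".join(row[column] for column in SCHEMA))
```

The design choice here is deliberately naive: the schema is fixed by hand, so only the second half of the problem (completing the records) is automated. Discovering the schema itself from raw data is the part that makes the problem really tough.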