Deutsch
 
Hilfe Datenschutzhinweis Impressum
  DetailsucheBrowse

Datensatz

DATENSATZ AKTIONENEXPORT

Freigegeben

Hochschulschrift

Deriving a Web-scale Common Sense Fact Knowledge Base

MPG-Autoren
/persons/resource/persons45589

Tandon,  Niket
Databases and Information Systems, MPI for Informatics, Max Planck Society;
International Max Planck Research School, MPI for Informatics, Max Planck Society;

Externe Ressourcen
Es sind keine externen Ressourcen hinterlegt
Volltexte (beschränkter Zugriff)
Für Ihren IP-Bereich sind aktuell keine Volltexte freigegeben.
Volltexte (frei zugänglich)
Es sind keine frei zugänglichen Volltexte in PuRe verfügbar
Ergänzendes Material (frei zugänglich)
Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar
Zitation

Tandon, N. (2011). Deriving a Web-scale Common Sense Fact Knowledge Base. Master Thesis, Universität des Saarlandes, Saarbrücken.


Zitierlink: https://hdl.handle.net/11858/00-001M-0000-0027-ABF9-8
Zusammenfassung
The fact that birds have feathers and ice is cold seems trivially true. Yet, most machine-readable sources of knowledge either lack such common sense facts entirely or have only limited coverage. Prior work on automated knowledge base construction has largely focused on relations between named entities and on taxonomic knowledge, while disregarding common sense properties. Extracting such structured data from text is challenging, especially due to the scarcity of explicitly expressed knowledge. Even when relying on large document collections, patternbased information extraction approaches typically discover insufficient amounts of information. This thesis investigates harvesting massive amounts of common sense knowledge using the textual knowledge of the entire Web, yet staying away from the massive engineering efforts in procuring such a large corpus as a Web. Despite the advancements in knowledge harvesting, we observed that the state of the art methods were limited in terms of accuracy and discovered insufficient amounts of information under our desired setting. This thesis shows how to gather large amounts of common sense facts from Web N-gram data, using seeds from the existing knowledge bases like ConceptNet. Our novel contributions include scalable methods for tapping onto Web-scale data and a new scoring model to determine which patterns and facts are most reliable. An extensive experimental evaluation is provided for three different binary relations, comparing different sources of n-gram data as well as different algorithms. The experimental results show that this approach extends ConceptNet by many orders of magnitude (more than 200-fold) at comparable levels of precision.