Deriving a Web-scale Common Sense Fact Knowledge Base

Tandon, N. (2011). Deriving a Web-scale Common Sense Fact Knowledge Base. Master Thesis, Universität des Saarlandes, Saarbrücken.

Files

2011_Niket_Tandon_Thesis.pdf (Any fulltext), 809KB
 
File Permalink:
-
Name:
2011_Niket_Tandon_Thesis.pdf
Description:
-
OA-Status:
Visibility:
Restricted (Max Planck Institute for Informatics, MSIN)
MIME-Type / Checksum:
application/pdf
Technical Metadata:
Copyright Date:
-
Copyright Info:
-
License:
-


Creators

Tandon, Niket (1, 2), Author
Weikum, Gerhard (1), Advisor
Theobalt, Christian (3), Referee
Affiliations:
1 Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018
2 International Max Planck Research School, MPI for Informatics, Max Planck Society, ou_1116551
3 Computer Graphics, MPI for Informatics, Max Planck Society, ou_40047

Content

Free keywords: -
Abstract: The fact that birds have feathers and ice is cold seems trivially true. Yet most machine-readable knowledge sources either lack such common sense facts entirely or cover them only sparsely. Prior work on automated knowledge base construction has largely focused on relations between named entities and on taxonomic knowledge, while disregarding common sense properties. Extracting such structured data from text is challenging, mainly because this kind of knowledge is rarely stated explicitly. Even with large document collections, pattern-based information extraction approaches typically discover too little information. This thesis investigates harvesting massive amounts of common sense knowledge from the textual content of the entire Web, while avoiding the massive engineering effort of procuring a Web-scale corpus. Despite advances in knowledge harvesting, we observed that state-of-the-art methods were limited in accuracy and discovered too little information in this setting. This thesis shows how to gather large amounts of common sense facts from Web N-gram data, using seeds from existing knowledge bases such as ConceptNet. Our novel contributions include scalable methods for tapping into Web-scale data and a new scoring model to determine which patterns and facts are most reliable. An extensive experimental evaluation is provided for three different binary relations, comparing different sources of n-gram data as well as different algorithms. The experimental results show that this approach extends ConceptNet by more than two orders of magnitude (more than 200-fold) at comparable levels of precision.
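The abstract describes seed-driven harvesting from n-gram counts only at a high level. The following is a minimal, generic sketch of that family of techniques under stated assumptions, not the thesis's actual scoring model: the toy n-gram counts, the hasProperty relation, the `min_count` cutoff, and the reliability formula (share of a pattern's extractions that are seeds) are all hypothetical choices made for illustration. Seed pairs induce surface patterns, the patterns are matched against all n-grams, and each new candidate fact accumulates the reliability of the patterns that extract it.

```python
from collections import defaultdict

# Hypothetical toy n-gram counts (n-gram tuple -> frequency). Web n-gram
# corpora provide the same kind of (n-gram, count) pairs at Web scale.
NGRAMS = {
    ("ice", "is", "cold"): 900,
    ("snow", "is", "cold"): 400,
    ("fire", "is", "hot"): 700,
    ("ice", "is", "melting"): 300,
    ("ice", "is", "blue"): 20,  # rare n-gram, dropped by min_count below
    ("cold", "as", "ice"): 500,
    ("hot", "as", "fire"): 350,
    ("cold", "as", "snow"): 120,
}

# Hypothetical seeds for a hasProperty(concept, property) relation,
# standing in for facts taken from a knowledge base such as ConceptNet.
SEEDS = {("ice", "cold"), ("fire", "hot")}


def induce_patterns(grams, seeds):
    """Turn every n-gram that links a seed's two arguments into a pattern."""
    patterns = set()
    for gram in grams:
        for subj, obj in seeds:
            if gram[0] == subj and gram[-1] == obj:
                patterns.add(("X",) + gram[1:-1] + ("Y",))
            elif gram[0] == obj and gram[-1] == subj:
                patterns.add(("Y",) + gram[1:-1] + ("X",))
    return patterns


def match(pattern, gram):
    """Return the (subject, object) pair if gram instantiates pattern, else None."""
    if len(pattern) != len(gram) or pattern[1:-1] != gram[1:-1]:
        return None
    return (gram[0], gram[-1]) if pattern[0] == "X" else (gram[-1], gram[0])


def score_facts(ngrams, seeds, min_count=100):
    """Score new candidate facts by the reliability of the patterns extracting them."""
    # Frequency cutoff: ignore rare, likely noisy n-grams.
    grams = [g for g, count in ngrams.items() if count >= min_count]
    facts = defaultdict(float)
    for pattern in induce_patterns(grams, seeds):
        pairs = {match(pattern, g) for g in grams} - {None}
        # Illustrative reliability: fraction of the pattern's extractions that are seeds.
        # Never divides by zero: the n-gram that induced the pattern always matches it.
        reliability = len(pairs & seeds) / len(pairs)
        for pair in pairs - seeds:
            facts[pair] += reliability
    return dict(facts)


if __name__ == "__main__":
    print(score_facts(NGRAMS, SEEDS))
```

On this toy data, "X is Y" extracts four pairs, two of them seeds (reliability 0.5), and "Y as X" extracts three, two of them seeds (reliability 2/3). So ("snow", "cold"), supported by both patterns, scores about 1.17 and ranks above the noisier ("ice", "melting") at 0.5, while ("ice", "blue") never appears because its count falls below the cutoff. Summing pattern reliabilities is only one simple way to reward facts that several independent patterns agree on.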

Details

Language(s): eng - English
Dates: 2011-08, 2011
Publication Status: Issued
Pages: X, 81 p.
Publishing info: Saarbrücken : Universität des Saarlandes
Table of Contents: -
Rev. Type: -
Identifiers: BibTex Citekey: MasterTandon2011
Degree: Master
