Abstract:
Automatic information extraction (IE) enables the construction of very large
knowledge bases (KBs), with relational facts on millions of entities from text
corpora and Web sources. However, such KBs contain errors and are far from
complete. This motivates exploiting human intelligence and
knowledge using crowd-based human computing (HC) for assessing the validity of
facts and for gathering additional knowledge. This paper presents a novel
system architecture, called Higgins, which shows how to effectively integrate
an IE engine and an HC engine. Higgins generates game questions
where players choose or fill in missing relations for subject-relation-object
triples. For generating multiple-choice answer candidates, we have constructed
a large dictionary of entity names and relational phrases, and have developed
specifically designed statistical language models for phrase relatedness. To
this end, we combine semantic resources like WordNet, ConceptNet, and others
with statistics derived from a large Web corpus. We demonstrate the
effectiveness of Higgins for knowledge acquisition by crowdsourced gathering of
relationships between characters in narrative descriptions of movies and books.
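
The candidate-generation step described above can be illustrated with a minimal
sketch. It is not the Higgins implementation: the function names, the token
overlap standing in for lexicon-based relatedness, the toy co-occurrence counts
standing in for Web-corpus statistics, and the mixing weight are all
assumptions made for illustration. The sketch ranks relational phrases by a
weighted mix of the two signals and returns the top few as multiple-choice
answer candidates.

```python
def jaccard(a, b):
    """Token-overlap similarity; a crude stand-in for lexicon-based relatedness."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(seed, phrases, cooccurrence, weight=0.5, k=3):
    """Rank relational phrases by relatedness to `seed` (hypothetical scoring).

    cooccurrence maps (seed, phrase) pairs to counts, standing in for
    statistics derived from a corpus. `weight` mixes the lexical score
    with the normalized corpus score.
    """
    seed_tokens = seed.split()
    # Normalize counts so the corpus score lies in [0, 1]; avoid dividing by 0.
    total = sum(cooccurrence.get((seed, p), 0) for p in phrases) or 1
    scored = []
    for p in phrases:
        if p == seed:
            continue  # the seed itself is not a candidate
        lexical = jaccard(seed_tokens, p.split())
        corpus = cooccurrence.get((seed, p), 0) / total
        scored.append((weight * lexical + (1 - weight) * corpus, p))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [p for _, p in scored[:k]]


if __name__ == "__main__":
    phrases = ["is married to", "is engaged to", "works for", "is the enemy of"]
    counts = {
        ("is married to", "is engaged to"): 8,
        ("is married to", "works for"): 2,
    }
    print(rank_candidates("is married to", phrases, counts, k=2))
```

In a game question, the returned phrases would be offered to the player
alongside the relation proposed by the IE engine; a real system would replace
both scoring signals with the language models the abstract refers to.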