Joint Models for Information and Knowledge Extraction

Nguyen, Dat Ba

doi:10.22028/D291-26943

Item

ITEM ACTIONSEXPORT

Add to Basket

Please note that a newer version of this item is available:
https://pure.mpg.de/pubman/item/item_2511376_5

DetailsSummary

Released

Thesis

Joint Models for Information and Knowledge Extraction

MPS-Authors

/persons/resource/persons103083

Nguyen, Dat Ba
Databases and Information Systems, MPI for Informatics, Max Planck Society;
International Max Planck Research School, MPI for Informatics, Max Planck Society;

External Resource

https://publikationen.sulb.uni-saarland.de/handle/20.500.11880/26895
(Any fulltext)

Fulltext (restricted access)

There are currently no full texts shared for your IP range.

Fulltext (public)

There are no public fulltexts stored in PuRe

Supplementary Material (public)

There is no public supplementary material available

Citation

Nguyen, D. B. (2017). Joint Models for Information and Knowledge Extraction. PhD Thesis, Universität des Saarlandes, Saarbrücken. doi:10.22028/D291-26943.

Cite as: https://hdl.handle.net/11858/00-001M-0000-002E-890F-9

Abstract

Information and knowledge extraction from natural language text is a key asset for question answering, semantic search, automatic summarization, and other machine reading applications. There are many sub-tasks involved such as named entity recognition, named entity disambiguation, co-reference resolution, relation extraction, event detection, discourse parsing, and others. Solving these tasks is challenging as natural language text is unstructured, noisy, and ambiguous. Key challenges, which focus on identifying and linking named entities, as well as discovering relations between them, include: • High NERD Quality. Named entity recognition and disambiguation, NERD for short, are preformed first in the extraction pipeline. Their results may affect other downstream tasks. • Coverage vs. Quality of Relation Extraction. Model-based information extraction methods achieve high extraction quality at low coverage, whereas open information extraction methods capture relational phrases between entities. However, the latter degrades in quality by non-canonicalized and noisy output. These limitations need to be overcome. • On-the-fly Knowledge Acquisition. Real-world applications such as question answering, monitoring content streams, etc. demand on-the-fly knowledge acquisition. Building such an end-to-end system is challenging because it requires high throughput, high extraction quality, and high coverage. This dissertation addresses the above challenges, developing new methods to advance the state of the art. The first contribution is a robust model for joint inference between entity recognition and disambiguation. The second contribution is a novel model for relation extraction and entity disambiguation on Wikipediastyle text. The third contribution is an end-to-end system for constructing querydriven, on-the-fly knowledge bases.