Quality in Phrase Mining
Jindal, A. (2009). Quality in Phrase Mining. Master Thesis, Universität des Saarlandes, Saarbrücken.
Item is Released
Basic
Item Permalink: https://hdl.handle.net/11858/00-001M-0000-0027-BA7D-1
Version Permalink: https://hdl.handle.net/11858/00-001M-0000-0027-BA7E-0
Genre: Thesis
Files
Master_Thesis_Jindal_2009.pdf (Any fulltext), 2MB
File Permalink: -
Name: Master_Thesis_Jindal_2009.pdf
Description: -
OA-Status:
Visibility: Restricted (Max Planck Institute for Informatics, MSIN)
MIME-Type / Checksum: application/pdf
Technical Metadata:
Copyright Date: -
Copyright Info: -
License: -
Creators
Creators:
Jindal, Alekh (1), Author
Weikum, Gerhard (2), Advisor
Affiliations:
(1) International Max Planck Research School, MPI for Informatics, Max Planck Society, Campus E1 4, 66123 Saarbrücken, DE, ou_1116551
(2) Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018
Content
Free keywords: -
Abstract: Phrase snippets of large text corpora like news articles or web search results offer great insight and analytical value. While much of the prior work is focussed on efficient storage and retrieval of all candidate phrases, little emphasis has been placed on the quality of the result set. In this thesis, we define phrases of interest and propose a framework for mining and post-processing interesting phrases. We focus on the quality of phrases and develop techniques to mine minimal-length, maximally informative sequences of words. The techniques developed are streamed into a post-processing pipeline and include exact and approximate match-based merging, incomplete phrase detection with filtering, and heuristics-based phrase classification. The strategies aim to prune the candidate set of phrases down to those that are meaningful and rich in content. We characterize the phrases with heuristics- and NLP-based features. We use a regression model based on supervised learning to predict their interestingness. Further, we develop and analyze ranking and grouping models for presenting the phrases to the user. Finally, we discuss relevance and performance evaluation of our techniques. Our framework is evaluated using a recently released real-world corpus of New York Times news articles.
Details
Language(s): eng - English
Dates: Accepted: 2009-12-28; Date issued: 2009
Publication Status: Issued
Pages: -
Publishing info: Saarbrücken : Universität des Saarlandes
Table of Contents: -
Rev. Type: -
Identifiers: BibTex Citekey: Jindal2010
Degree: Master