Learning Tuple Probabilities in Probabilistic Databases

Dylla, Maximilian; Theobald, Martin

Dylla, M., & Theobald, M. (2014). Learning Tuple Probabilities in Probabilistic Databases (MPI-I-2014-5-001). Saarbrücken: Max-Planck-Institut für Informatik.

Item is Released

Basic

Item Permalink: https://hdl.handle.net/11858/00-001M-0000-0019-8492-6
Version Permalink: https://hdl.handle.net/11858/00-001M-0000-0024-022E-C

Genre: Report

Files

MPI-I-2014-5-001.pdf (Any fulltext), 740KB

File Permalink:
https://hdl.handle.net/11858/00-001M-0000-0019-8496-D

Name:
MPI-I-2014-5-001.pdf

Description:
-

OA-Status:

Visibility:
Public

MIME-Type / Checksum:
application/pdf / [MD5]

Copyright Date:
-

Copyright Info:
-

License:
-

Creators

Creators:
Dylla, Maximilian¹, Author
Theobald, Martin¹, Author

Affiliations:
¹Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018

Content

Free keywords: -

Abstract: Learning the parameters of complex probabilistic-relational models from labeled training data is a standard technique in machine learning and has been intensively studied in the subfield of Statistical Relational Learning (SRL), but so far it remains an under-investigated topic in the context of Probabilistic Databases (PDBs). In this paper, we focus on learning the probability values of base tuples in a PDB from query answers, which are represented as labeled lineage formulas. Specifically, we consider labels in the form of pairs, each consisting of a Boolean lineage formula and a marginal probability attached to the corresponding query answer. The resulting learning problem can be viewed as the inverse of confidence computation in PDBs: given a set of labeled query answers, learn the probability values of the base tuples such that the marginal probabilities of the query answers again yield the assigned probability labels. We analyze the learning problem from a theoretical perspective, devise two optimization-based objectives, and provide an efficient algorithm (based on Stochastic Gradient Descent) for solving these objectives. Finally, we conclude with an experimental evaluation on three real-world datasets and one synthetic dataset, comparing against various techniques from SRL, reasoning in information extraction, and optimization.
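
The learning problem described in the abstract can be illustrated with a small sketch. This is not the algorithm from the report: it assumes independent base tuples, a squared-error objective, exact marginals computed by brute-force enumeration of possible worlds, and hypothetical names (`marginal`, `learn`, tuples `t1` to `t3`) chosen for illustration only. It relies on the fact that the marginal of a lineage formula is linear in each tuple probability, so its partial derivative with respect to a tuple is the difference between the marginals with that tuple fixed to true and to false.

```python
import itertools
import random


def marginal(formula, variables, probs, fixed=None):
    """Exact marginal probability of a Boolean lineage formula.

    Enumerates all possible worlds over the free tuples (assumed
    independent). Only feasible for a handful of tuples.
    """
    fixed = fixed or {}
    free = [v for v in variables if v not in fixed]
    total = 0.0
    for values in itertools.product([False, True], repeat=len(free)):
        world = dict(zip(free, values))
        world.update(fixed)
        if formula(world):
            weight = 1.0
            for v in free:
                weight *= probs[v] if world[v] else 1.0 - probs[v]
            total += weight
    return total


def learn(labeled, variables, steps=5000, rate=0.1, seed=0):
    """Fit tuple probabilities to (formula, label) pairs with plain SGD
    on an assumed squared-error objective."""
    rng = random.Random(seed)
    probs = {v: 0.5 for v in variables}
    for _ in range(steps):
        formula, label = rng.choice(labeled)
        error = marginal(formula, variables, probs) - label
        grads = {}
        for v in variables:
            # The marginal is linear in probs[v], so its partial derivative
            # is P(formula | v true) - P(formula | v false).
            slope = (marginal(formula, variables, probs, {v: True})
                     - marginal(formula, variables, probs, {v: False}))
            grads[v] = 2.0 * error * slope
        for v in variables:
            # Gradient step, clipped so values stay valid probabilities.
            probs[v] = min(1.0, max(0.0, probs[v] - rate * grads[v]))
    return probs


# Hypothetical labeled query answers over three base tuples.
variables = ["t1", "t2", "t3"]
labeled = [
    (lambda w: w["t1"] and w["t2"], 0.3),
    (lambda w: w["t1"] or w["t3"], 0.8),
]
learned = learn(labeled, variables)
for formula, label in labeled:
    print(label, round(marginal(formula, variables, learned), 4))
```

With two labels and three tuples the problem is underdetermined, so the sketch recovers some assignment that reproduces the labels, not a unique one.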

Details

Language(s): eng - English

Dates: Published Online: 2014

Publication Status: Published online

Pages: 51 p.

Publishing info: Saarbrücken : Max-Planck-Institut für Informatik

Table of Contents: -

Rev. Type: -

Identifiers: Report Nr.: MPI-I-2014-5-001
BibTex Citekey: Dylla-Learning2014

Degree: -

Source 1

Title: Research Report

Source Genre: Series

Creator(s):

Affiliations:

Publ. Info: -

Pages: -
Volume / Issue: -
Sequence Number: -
Start / End Page: -
Identifier: ISSN: 0946-011X