Abstract:
Learning the parameters of complex probabilistic-relational models from labeled
training data is a standard technique in machine learning, which has been
intensively studied in the subfield of Statistical Relational Learning (SRL),
but so far it remains under-investigated in the context of
Probabilistic Databases (PDBs). In this paper, we focus on learning the
probability values of base tuples in a PDB from query answers, the latter of
which are represented as labeled lineage formulas. Specifically, we consider
labels in the form of pairs, each consisting of a Boolean lineage formula and a
marginal probability attached to the corresponding query answer. The
resulting learning problem can be viewed as the inverse problem to confidence
computations in PDBs: given a set of labeled query answers, learn the
probability values of the base tuples, such that the marginal probabilities of
the query answers again match the assigned probability labels. We analyze
the learning problem from a theoretical perspective, devise two
optimization-based objectives, and provide an efficient algorithm (based on
Stochastic Gradient Descent) for solving these objectives. Finally, we conclude
with an experimental evaluation on three real-world datasets and one synthetic
dataset, comparing our approach against various techniques from SRL, reasoning in
information extraction, and optimization.
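To make the inverse problem concrete, here is a minimal Python sketch. It is not the paper's two objectives or its algorithm. It assumes a tuple-independent PDB, lineage formulas modeled as hypothetical (variables, predicate) pairs, confidence computation by brute-force enumeration of possible worlds, and a squared-error objective minimized by projected SGD. Under tuple independence the marginal probability is multilinear in each tuple probability p_i, so its partial derivative is P(phi | t_i true) - P(phi | t_i false).

```python
import itertools
import random

# Assumption for illustration: a lineage formula is a pair (variables, predicate),
# where variables are base-tuple ids and predicate maps a truth assignment to bool.

def marginal(formula, probs):
    """P(formula) in a tuple-independent PDB, by enumerating all worlds of its tuples."""
    variables, pred = formula
    total = 0.0
    for bits in itertools.product([False, True], repeat=len(variables)):
        world = dict(zip(variables, bits))
        if pred(world):
            weight = 1.0
            for v, present in world.items():
                weight *= probs[v] if present else 1.0 - probs[v]
            total += weight
    return total

def partial(formula, probs, var):
    """dP/dp_var = P(formula | var true) - P(formula | var false), since P is multilinear."""
    return marginal(formula, {**probs, var: 1.0}) - marginal(formula, {**probs, var: 0.0})

def learn(labeled, probs, lr=0.25, steps=5000, seed=0):
    """Projected SGD on the sum over j of (P(phi_j) - label_j)^2 (hypothetical objective)."""
    rng = random.Random(seed)
    probs = dict(probs)
    for _ in range(steps):
        formula, label = rng.choice(labeled)  # one labeled query answer per step
        err = marginal(formula, probs) - label
        grads = {v: partial(formula, probs, v) for v in formula[0]}
        for v, g in grads.items():
            # gradient step, then clip back into the valid probability range [0, 1]
            probs[v] = min(1.0, max(0.0, probs[v] - lr * 2.0 * err * g))
    return probs

# Hypothetical example: three base tuples and three labeled lineage formulas.
labeled = [
    (("t1",), lambda w: w["t1"]), 
    (("t1", "t2"), lambda w: w["t1"] and w["t2"]),
    (("t2", "t3"), lambda w: w["t2"] or w["t3"]),
]
labeled = list(zip(labeled, [0.8, 0.4, 0.75]))
learned = learn(labeled, {"t1": 0.5, "t2": 0.5, "t3": 0.5})
```

These labels are consistent with p1 = 0.8, p2 = 0.5 and p3 = 0.5, so the learned probabilities should reproduce each label closely. Brute-force enumeration is exponential in the number of tuples per formula; it is used here only to keep the sketch self-contained.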