Efficient Representation and Processing of Incomplete Information

Antova, Lyublena

Item

ITEM ACTIONSEXPORT

Add to Basket

Local TagsRelease HistoryDetailsSummary

Released

Thesis

Efficient Representation and Processing of Incomplete Information

MPS-Authors

/persons/resource/persons44024

Antova, Lyublena
International Max Planck Research School, MPI for Informatics, Max Planck Society;

External Resource

No external resources are shared

Fulltext (restricted access)

There are currently no full texts shared for your IP range.

Fulltext (public)

There are no public fulltexts stored in PuRe

Supplementary Material (public)

There is no public supplementary material available

Citation

Antova, L. (2006). Efficient Representation and Processing of Incomplete Information. Master Thesis, Universität des Saarlandes, Saarbrücken.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0027-D311-7

Abstract

Database systems often have to deal with incomplete information as the world they model is not always complete. This is a frequent case in data integration applications, scientific databases, or in scenarios where information is manually entered and constains errors or ambiguity. In the las couple of decades different formalisms have been proposed for representing incomplete information. These include, among other, the so called realtions with or-sets, tables with variables (v-tables) and conditional tables (c-tables). However, none of the current approaches for representing incomple information has satisfied the requirements for a powerful and effcient data management system, which is the reason why none has found application in practice. All models generally suffer from at least one of two weaknesses. Either they are not strong enough for representing results of simple queries, as is the case for v-tables and realtions with or-sets, or the handling and processing of the data, e.g. for query evaluation, is intractable (as is the case for c-tables). In this thesis, e present a decomposition-based approach to addressing the problem of incompletely specified databases. we introduce world-set decompositions (WSDs), a space-efficient formalism for repreenting any finite set of possible worlds over relational algebra queries on WSDs. For each relational algebra operation we present an algorithm operting on WSDs. We also address the problem of data cleaning in the context of world set decompositions. We present a modified version of the existing Chase algorithm, which we use to remove inconsistent worlds in an incompletely specified database. We evaluate our techniques in a large census data scenario with data originating from the 1990 USA census and we show that data processing on WSDs is both scalable and efficient.