Abstract:
Redescription mining is a powerful data analysis tool that is used to find
multiple descriptions of the same entities. Consider geographical regions as an
example. They can be characterized by the fauna that inhabits them on the one
hand and by their meteorological conditions on the other. Finding such
redescriptions, a task known as niche-finding, is of great importance in biology.
Current redescription mining methods can handle only Boolean data.
This restricts the range of possible applications or makes discretization a
prerequisite, entailing a possibly harmful loss of information. In
niche-finding, while the fauna can be naturally represented using Boolean
presence/absence data, the weather cannot.
In this paper, we extend redescription mining to categorical and real-valued
data with possibly missing values using a surprisingly simple and efficient
approach. We provide an extensive experimental evaluation to study the behaviour
of the proposed algorithm. Furthermore, we show the statistical significance of
our results using recent innovations in randomization methods.