
Record

Released

Research Paper

Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery

MPG Authors
http://pubman.mpdl.mpg.de/cone/persons/resource/persons188983

Boley, Mario
Databases and Information Systems, MPI for Informatics, Max Planck Society;

http://pubman.mpdl.mpg.de/cone/persons/resource/persons79525

Vreeken, Jilles
Databases and Information Systems, MPI for Informatics, Max Planck Society;

External Resources
No external resources are available
Full Texts (freely accessible)

arXiv:1701.07696.pdf
(Preprint), 3 MB

Supplementary Material (freely accessible)
No freely accessible supplementary materials are available
Citation

Boley, M., Goldsmith, B. R., Ghiringhelli, L. M., & Vreeken, J. (2017). Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery. Retrieved from http://arxiv.org/abs/1701.07696.


Citation link: http://hdl.handle.net/11858/00-001M-0000-002D-90DB-F
Abstract
Existing algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we here extend the optimistic estimator framework for optimal subgroup discovery to a new class of objective functions: we show how tight estimators can be computed efficiently for all functions that are determined by subgroup size (non-decreasing dependence), the subgroup median value, and a dispersion measure around the median (non-increasing dependence). In the important special case when dispersion is measured using the average absolute deviation from the median, this novel approach yields a linear time algorithm. Empirical evaluation on a wide range of datasets shows that, when used within branch-and-bound search, this approach is highly efficient and indeed discovers subgroups with much smaller errors.
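
To make the class of objective functions described in the abstract concrete, the following minimal sketch scores a subgroup by its size, its median, and its average absolute deviation from the median. The function names and the exact product form are assumptions chosen for illustration, not the formula from the paper. The sketch evaluates one candidate subgroup only and does not implement the tight optimistic estimator or the branch-and-bound search.

```python
from statistics import median


def aamd(values):
    """Average absolute deviation of the values from their median."""
    m = median(values)
    return sum(abs(v - m) for v in values) / len(values)


def dispersion_corrected_score(subgroup, population):
    """Illustrative score: coverage * median shift * dispersion correction.

    Assumed form for this sketch only. It is non-decreasing in subgroup
    size and non-increasing in dispersion around the median.
    Assumes the population is not constant (aamd(population) > 0).
    """
    coverage = len(subgroup) / len(population)
    median_shift = max(0.0, median(subgroup) - median(population))
    correction = max(0.0, 1.0 - aamd(subgroup) / aamd(population))
    return coverage * median_shift * correction


# Hypothetical target values for a population and two equally sized subgroups.
population = [1, 2, 3, 4, 10, 11, 12, 13]
tight = [10, 11, 12, 13]  # high target values, low dispersion
loose = [1, 4, 12, 13]    # raised median, but widely scattered values

print(dispersion_corrected_score(tight, population))
print(dispersion_corrected_score(loose, population))
```

Both subgroups cover half of the population and both have a median above the population median, but only the tightly clustered one receives a positive score. An objective without the dispersion term would reward the scattered subgroup as well, which is the kind of inconsistent statement the abstract warns about.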