Record

  Getting to Know the Unknown Unknowns: Destructive-noise Resistant Boolean Matrix Factorization

Karaev, S., Miettinen, P., & Vreeken, J. (2015). Getting to Know the Unknown Unknowns: Destructive-noise Resistant Boolean Matrix Factorization. In S. Venkatasubramanian, & J. Ye (Eds.), Proceedings of the 2015 SIAM International Conference on Data Mining (pp. 325-333). Philadelphia, PA: SIAM. doi:10.1137/1.9781611974010.37.

Basic data

Item permalink: http://hdl.handle.net/11858/00-001M-0000-0024-6C59-C
Version permalink: http://hdl.handle.net/11858/00-001M-0000-0028-8542-0
Genre: Conference paper
LaTeX title: Getting to Know the Unknown Unknowns: {D}estructive-noise Resistant {Boolean} Matrix Factorization

External references

Creators

Karaev, Sanjar (1), Author
Miettinen, Pauli (1), Author
Vreeken, Jilles (1), Author
Affiliations:
(1) Databases and Information Systems, MPI for Informatics, Max Planck Society, escidoc:24018

Content

Keywords: -
Abstract: Finding patterns from binary data is a classical problem in data mining, dating back to at least frequent itemset mining. More recently, approaches such as tiling and Boolean matrix factorization (BMF) have been proposed to find sets of patterns that aim to explain the full data well. These methods, however, are not robust against non-trivial destructive noise, i.e., when relatively many 1s are removed from the data: tiling can only model additive noise, while BMF assumes approximately equal amounts of additive and destructive noise. Most real-world binary datasets, however, exhibit mostly destructive noise. In presence/absence data, for instance, it is much more common to fail to observe something than it is to observe a spurious presence. To address this problem, we take the recent approach of employing the Minimum Description Length (MDL) principle for BMF and introduce a new algorithm, Nassau, that directly optimizes the description length of the factorization instead of the reconstruction error. In addition, unlike the previous algorithms, it can adjust the factors it has discovered during its search. Empirical evaluation on synthetic data shows that Nassau excels at datasets with high destructive noise levels, and its performance on real-world datasets confirms our hypothesis that real-world data contain high numbers of missing observations.
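To make the abstract's noise terminology concrete, the following is a minimal illustrative sketch, not the Nassau algorithm (whose MDL encoding is not described in this record). It computes a Boolean matrix product and splits the disagreements between a binary data matrix and its factorization into additive noise (a 1 in the data that no factor covers) and destructive noise (a 0 in the data where the factorization predicts a 1). The function names `boolean_product`, `noise_counts`, and `reconstruction_error`, as well as the toy matrices, are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical helper names; this only illustrates the noise types, not Nassau.

def boolean_product(B, C):
    """Boolean product: (B o C)[i, j] = OR over k of (B[i, k] AND C[k, j])."""
    # The integer product counts how many factors cover each cell; > 0 means covered.
    return (B.astype(int) @ C.astype(int)) > 0

def noise_counts(A, B, C):
    """Return (additive, destructive) disagreements between A and B o C.

    additive:    A[i, j] = 1 but the factorization predicts 0 (spurious 1)
    destructive: A[i, j] = 0 but the factorization predicts 1 (1 removed)
    """
    A = A.astype(bool)
    R = boolean_product(B, C)
    additive = int(np.sum(A & ~R))
    destructive = int(np.sum(~A & R))
    return additive, destructive

def reconstruction_error(A, B, C):
    """Hamming distance |A XOR (B o C)|: counts both noise types equally."""
    return int(np.sum(A.astype(bool) ^ boolean_product(B, C)))

# Toy example (hypothetical data): two overlapping row/column patterns.
B = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
C = np.array([[1, 1, 1, 0, 0], [0, 0, 1, 1, 1]])
A = np.array([
    [1, 0, 1, 0, 0],  # 1 removed at (0, 1): destructive
    [1, 1, 1, 0, 1],  # spurious 1 at (1, 4): additive
    [0, 0, 0, 1, 1],  # 1 removed at (2, 2): destructive
    [0, 0, 1, 1, 1],  # matches the factorization exactly
])

print(noise_counts(A, B, C))          # (1, 2)
print(reconstruction_error(A, B, C))  # 3
```

The plain reconstruction error (Hamming distance) charges a missing 1 and a spurious 1 the same cost, which is why an objective based on it implicitly treats both noise types symmetrically; the abstract's point is that real presence/absence data tend to be dominated by the destructive kind.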

Details

Language(s): eng - English
Date(s): 2014-12-16, 2015, 2015
Publication status: Published in print
Pages: -
Place, publisher, edition: -
Table of contents: -
Review method: -
Identifiers: BibTeX citekey: karaev15getting
DOI: 10.1137/1.9781611974010.37
Degree: -

Event

Title: 15th SIAM International Conference on Data Mining
Place of event: Vancouver, Canada
Start/end date: 2015-04-30 - 2015-05-02

Decision

Project information

Source 1

Title: Proceedings of the 2015 SIAM International Conference on Data Mining
Short title: SDM 2015
Source genre: Proceedings
Creators:
Venkatasubramanian, Suresh (1), Editor
Ye, Jieping (1), Editor
Affiliations:
(1) External Organizations, escidoc:persistent22
Place, publisher, edition: Philadelphia, PA: SIAM
Pages: -
Volume / issue: -
Article number: -
Start / end page: 325 - 333
Identifier: ISBN: 978-1-61197-401-0