de.mpg.escidoc.pubman.appbase.FacesBean
Deutsch
 
Hilfe Wegweiser Impressum Kontakt Einloggen
  DetailsucheBrowse

Datensatz

DATENSATZ AKTIONENEXPORT

Freigegeben

Forschungspapier

Clustering Boolean Tensors

MPG-Autoren
http://pubman.mpdl.mpg.de/cone/persons/resource/persons79373

Metzler,  Saskia
Databases and Information Systems, MPI for Informatics, Max Planck Society;

http://pubman.mpdl.mpg.de/cone/persons/resource/persons45046

Miettinen,  Pauli
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Externe Ressourcen
Es sind keine Externen Ressourcen verfügbar
Volltexte (frei zugänglich)

arxiv:1501.00696.pdf
(Preprint), 692KB

Ergänzendes Material (frei zugänglich)
Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar
Zitation

Metzler, S., & Miettinen, P. (2015). Clustering Boolean Tensors. Retrieved from http://arxiv.org/abs/1501.00696.


Zitierlink: http://hdl.handle.net/11858/00-001M-0000-0024-6C5B-8
Zusammenfassung
Tensor factorizations are computationally hard problems, and in particular, are often significantly harder than their matrix counterparts. In case of Boolean tensor factorizations -- where the input tensor and all the factors are required to be binary and we use Boolean algebra -- much of that hardness comes from the possibility of overlapping components. Yet, in many applications we are perfectly happy to partition at least one of the modes. In this paper we investigate what consequences does this partitioning have on the computational complexity of the Boolean tensor factorizations and present a new algorithm for the resulting clustering problem. This algorithm can alternatively be seen as a particularly regularized clustering algorithm that can handle extremely high-dimensional observations. We analyse our algorithms with the goal of maximizing the similarity and argue that this is more meaningful than minimizing the dissimilarity. As a by-product we obtain a PTAS and an efficient 0.828-approximation algorithm for rank-1 binary factorizations. Our algorithm for Boolean tensor clustering achieves high scalability, high similarity, and good generalization to unseen data with both synthetic and real-world data sets.