Graph Mining with Variational Dirichlet Process Mixture Models

Tsuda, K; Zaki, MJ

doi:10.1137/1.9781611972788.39

Conference Paper

MPS-Authors

Tsuda, K
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resource

https://epubs.siam.org/doi/pdf/10.1137/1.9781611972788.39
(Publisher version)

Citation

Tsuda, K. (2008). Graph Mining with Variational Dirichlet Process Mixture Models. In C. Apte, H. Park, K. Wang, & M. Zaki (Eds.), 8th SIAM International Conference on Data Mining 2008 (pp. 432-442). Philadelphia, PA, USA: Society for Industrial and Applied Mathematics.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-C9DD-B

Abstract

Graph data such as chemical compounds and XML documents are becoming
increasingly common in many application domains.
A main difficulty of graph data processing
lies in the intrinsic high dimensionality of graphs:
when a graph is represented as a binary feature vector
of indicators of all possible subgraph patterns,
the dimensionality becomes too large for standard statistical methods.
We propose a nonparametric Bayesian method for clustering graphs and
selecting salient patterns at the same time.
Variational inference is adopted here, because sampling
is not applicable due to the extremely high dimensionality.
The feature set minimizing the free energy is efficiently
collected with the DFS code tree, where the generation of useless subgraphs
is suppressed by a tree pruning condition.
In experiments, our method is compared with a simpler approach based on
frequent subgraph mining and with graph kernels.
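
The binary feature representation mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the graph encoding (a list of node labels plus a list of undirected edges), the function names `contains` and `indicator_vector`, and the toy molecules are all assumptions made for illustration, and the brute-force matching below is only feasible for very small graphs. It shows how each graph maps to a vector with one 0/1 entry per candidate subgraph pattern, which is why the vector length explodes when all possible patterns are enumerated.

```python
from itertools import permutations

def contains(graph, pattern):
    """Return True if `pattern` occurs in `graph` as a (non-induced) subgraph.

    Both arguments are (labels, edges) pairs: `labels` is a list of node
    labels and `edges` a list of undirected (u, v) index pairs.
    Hypothetical helper; brute force over all injective node mappings.
    """
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    g_edge_set = {frozenset(e) for e in g_edges}
    # Try every injective assignment of pattern nodes to graph nodes.
    for image in permutations(range(len(g_labels)), len(p_labels)):
        # Node labels must agree under the mapping.
        if any(p_labels[i] != g_labels[image[i]] for i in range(len(p_labels))):
            continue
        # Every pattern edge must map onto an existing graph edge.
        if all(frozenset((image[u], image[v])) in g_edge_set for u, v in p_edges):
            return True
    return False

def indicator_vector(graph, patterns):
    """Binary feature vector: entry k is 1 iff patterns[k] occurs in graph."""
    return [1 if contains(graph, p) else 0 for p in patterns]

# Toy data (assumed for illustration): two small labeled chains.
g1 = (["C", "C", "O"], [(0, 1), (1, 2)])   # C-C-O
g2 = (["C", "O"], [(0, 1)])                # C-O

# A handful of candidate patterns; a real pattern space is vastly larger.
patterns = [
    (["C", "C"], [(0, 1)]),                 # C-C
    (["C", "O"], [(0, 1)]),                 # C-O
    (["O", "O"], [(0, 1)]),                 # O-O
    (["C", "C", "O"], [(0, 1), (1, 2)]),    # C-C-O
]

print(indicator_vector(g1, patterns))
print(indicator_vector(g2, patterns))
```

With only four hand-picked patterns the vectors stay short; enumerating every possible labeled subgraph instead is what makes the feature space too large for standard methods and motivates selecting patterns during the search.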