hide
Free keywords:
-
Abstract:
Graph data such as chemical compounds and XML documents are getting more
common in many application domains.
A main difficulty of graph data processing
lies in the intrinsic high dimensionality of graphs, namely,
when a graph is represented as a binary feature vector
of indicators of all possible subgraph patterns,
the dimensionality gets too large for usual statistical methods.
We propose a nonparametric Bayesian method for clustering graphs and
selecting salient patterns at the same time.
Variational inference is adopted here, because sampling
is not applicable due to extremely high dimensionality.
The feature set minimizing the free energy is efficiently
collected with the DFS code tree, where the generation of useless subgraphs
is suppressed by a tree pruning condition.
In experiments, our method is compared with a simpler approach based on
frequent subgraph mining, and graph kernels.