Near-optimal supervised feature selection among frequent subgraphs

Thoma, M; Cheng, H; Gretton, A; Han, J; Kriegel, H-P; Smola, AJ; Song, L; Yu, PS; Yan, X; Borgwardt, KM

doi:10.1137/1.9781611972795.92

Item

ITEM ACTIONSEXPORT

Add to Basket

Local TagsRelease HistoryDetailsSummary

Released

Conference Paper

Near-optimal supervised feature selection among frequent subgraphs

MPS-Authors

/persons/resource/persons83946

Gretton, A
Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society;
Max Planck Institute for Biological Cybernetics, Max Planck Society;

/persons/resource/persons75313

Borgwardt, KM
Max Planck Institute for Biological Cybernetics, Max Planck Society;
Former Research Group Machine Learning and Computational Biology, Max Planck Institute for Biological Cybernetics, Max Planck Society;

External Resource

https://epubs.siam.org/doi/pdf/10.1137/1.9781611972795.92
(Publisher version)

Fulltext (restricted access)

There are currently no full texts shared for your IP range.

Fulltext (public)

There are no public fulltexts stored in PuRe

Supplementary Material (public)

There is no public supplementary material available

Citation

Thoma, M., Cheng, H., Gretton, A., Han, J., Kriegel, H.-P., Smola, A., et al. (2009). Near-optimal supervised feature selection among frequent subgraphs. In H. Park, S. Parthasarathy, & H. Liu (Eds.), 9th SIAM Conference on Data Mining (SDM 2009) (pp. 1076-1087). Society for Industrial and Applied Mathematics: Philadelphia, PA, USA.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0013-C4FD-2

Abstract

Graph classification is an increasingly important step in numerous application domains, such as function prediction
of molecules and proteins, computerised scene analysis, and
anomaly detection in program flows. Among the various approaches proposed in the literature, graph classification based on frequent subgraphs is a popular branch: Graphs are represented as (usually binary) vectors, with components indicating whether a graph contains a particular subgraph that is frequent across the dataset. On large graphs, however, one faces the enormous problem that the number of these frequent subgraphs may grow exponentially with the size of the graphs, but only few of them possess enough discriminative power to make them useful for graph classification. Efficient and discriminative feature selection among frequent subgraphs is hence a key
challenge for graph mining. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that we can yield a near-optimal solution using greedy feature selection. Second, our submodular quality function criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, and
help to prune the search space for discriminative frequent
subgraphs even during frequent subgraph mining.