Near-optimal supervised feature selection among frequent subgraphs

Thoma, M.; Cheng, H.; Gretton, A.; Han, J.; Kriegel, H.-P.; Smola, A. J.; Song, L.; Yu, P. S.; Yan, X.; Borgwardt, K. M.; Park, H.; Parthasarathy, S.; Liu, H.

Thoma, M., Cheng, H., Gretton, A., Han, J., Kriegel, H.-P., Smola, A. J., Song, L., Yu, P. S., Yan, X., & Borgwardt, K. M. (2009). Near-optimal supervised feature selection among frequent subgraphs. In 9th SIAM Conference on Data Mining (SDM 2009) (pp. 1076-1087). Society for Industrial and Applied Mathematics: Philadelphia, PA, USA.

Item is released

Basic data

Record permalink: https://hdl.handle.net/11858/00-001M-0000-0013-C4FD-2
Version permalink: https://hdl.handle.net/11858/00-001M-0000-0013-C4FE-F

Genre: Conference paper

Creators:
Thoma, M., Author
Cheng, H., Author
Gretton, A.¹, Author
Han, J., Author
Kriegel, H.-P., Author
Smola, A. J., Author
Song, L., Author
Yu, P. S., Author
Yan, X., Author
Borgwardt, K. M.², Author
Park, H., Editor
Parthasarathy, S., Editor
Liu, H., Editor

Affiliations:
¹ Department Empirical Inference, Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_1497795
² Max Planck Institute for Biological Cybernetics, Max Planck Society, ou_1497794

Content

Keywords: -

Abstract: Graph classification is an increasingly important step in numerous application domains, such as function prediction of molecules and proteins, computerised scene analysis, and anomaly detection in program flows. Among the various approaches proposed in the literature, graph classification based on frequent subgraphs is a popular branch: graphs are represented as (usually binary) vectors, with components indicating whether a graph contains a particular subgraph that is frequent across the dataset. On large graphs, however, one faces the problem that the number of these frequent subgraphs may grow exponentially with the size of the graphs, while only a few of them possess enough discriminative power to be useful for graph classification. Efficient and discriminative feature selection among frequent subgraphs is hence a key challenge for graph mining. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that a near-optimal solution can be obtained using greedy feature selection. Second, this submodular quality criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, and helps to prune the search space for discriminative frequent subgraphs even during frequent subgraph mining.
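
The greedy selection step mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: this record does not define CORK's quality criterion, so the sketch assumes a correspondence count (the number of pairs of graphs from different classes that the selected subgraph features cannot tell apart) as the quantity to minimise. The function names `correspondences` and `greedy_select` and the toy data are hypothetical, and the gSpan integration is not shown.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def correspondences(X: Sequence[Sequence[int]], y: Sequence[int],
                    selected: Sequence[int]) -> int:
    """Count pairs of graphs from different classes that look identical
    when only the selected binary subgraph features are considered.
    (Assumed criterion; lower is better.)"""
    pos: Counter = Counter()
    neg: Counter = Counter()
    for row, label in zip(X, y):
        key = tuple(row[f] for f in selected)
        if label == 1:
            pos[key] += 1
        else:
            neg[key] += 1
    # Each positive/negative pair sharing a feature pattern is one correspondence.
    return sum(count * neg[key] for key, count in pos.items())


def greedy_select(X: Sequence[Sequence[int]], y: Sequence[int],
                  k: int) -> Tuple[List[int], int]:
    """Greedily add the feature that removes the most correspondences,
    stopping after k features or when no feature improves the score."""
    selected: List[int] = []
    remaining = set(range(len(X[0])))
    current = correspondences(X, y, selected)
    while remaining and len(selected) < k:
        # Ties are broken in favour of the smallest feature index.
        best = min(sorted(remaining),
                   key=lambda f: correspondences(X, y, selected + [f]))
        best_score = correspondences(X, y, selected + [best])
        if best_score >= current:
            break  # no remaining feature separates any further pair
        selected.append(best)
        remaining.remove(best)
        current = best_score
    return selected, current


# Toy data: rows are graphs, columns indicate presence of a frequent subgraph.
X = [[1, 0, 1],
     [1, 1, 0],
     [1, 0, 0],
     [1, 1, 1]]
y = [1, 1, 0, 0]
print(greedy_select(X, y, 2))  # -> ([1, 2], 0)
```

In the toy data, feature 0 occurs in every graph and is never selected, while features 1 and 2 together separate all four cross-class pairs.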

Details

Language(s):

Date: Published: 2009-05

Publication status: Published

Pages: -

Place, publisher, edition: -

Table of contents: -

Review type: -

Identifiers: ISBN: 978-1-615-67109-0
URI: http://www.siam.org/proceedings/datamining/2009/dm09_099_thomam.pdf
BibTeX citekey: 5666

Degree type: -

Event

Title: 9th SIAM Conference on Data Mining (SDM 2009)

Venue: Sparks, NV, USA

Start/end date: -

Source 1

Title: 9th SIAM Conference on Data Mining (SDM 2009)

Source genre: Conference proceedings

Creators:

Affiliations:

Place, publisher, edition: Society for Industrial and Applied Mathematics: Philadelphia, PA, USA

Pages: -
Volume / issue: -
Article number: -
Start / end page: 1076 - 1087
Identifier: -
