Abstract:
Automatic classification of data items, based on training samples,
can be boosted by considering the neighborhood of data items in
a graph structure (e.g., neighboring documents in a hyperlink environment
or co-authors and their publications for bibliographic data entries).
This paper presents a new method for graph-based classification,
with particular emphasis on hyperlinked text documents but broader
applicability. Our approach is based on iterative relaxation labeling and can be
combined with either Bayesian or SVM classifiers on the feature spaces
of the given data items. The graph neighborhood is taken into consideration
to exploit locality patterns while at the same time avoiding overfitting.
In contrast to prior work along these lines, our approach employs a number
of novel techniques: dynamically inferring the link/class pattern in the graph
during the iterative relaxation labeling, judiciously pruning edges from the
neighborhood graph based on node dissimilarities and node degrees, weighting
the influence of edges by a distance metric between the classification labels
of interest, and weighting edges by content similarity measures. Our techniques considerably
improve the robustness and accuracy of the classification outcome, as shown in
systematic experimental comparisons with previously published methods on three
different real-world datasets.
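
To make the general idea concrete, the following is a minimal sketch of iterative relaxation labeling with similarity-based edge pruning and weighting. It is not the method of the paper: it assumes a plain homophily pattern (neighbors tend to share a class) instead of inferring the link/class pattern dynamically, it omits the label-distance weighting, and it interprets degree-based pruning as keeping only the most similar neighbors of a node. All names (`relaxation_labeling`, `min_sim`, `max_degree`) are hypothetical, and `local_probs` stands in for the output of any content-only classifier such as a Bayesian or SVM model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def relaxation_labeling(local_probs, features, edges,
                        n_iter=10, min_sim=0.1, max_degree=50):
    """Illustrative relaxation labeling (assumptions noted in comments).

    local_probs: node -> list of class probabilities from a content-only
                 classifier (assumed given).
    features:    node -> sparse feature dict used for edge weighting.
    edges:       iterable of undirected (u, v) pairs.
    """
    # Weight each edge by content similarity; drop dissimilar edges
    # (assumed threshold rule, not taken from the paper).
    nbrs = {n: [] for n in local_probs}
    for u, v in edges:
        w = cosine(features[u], features[v])
        if w >= min_sim:
            nbrs[u].append((v, w))
            nbrs[v].append((u, w))

    # Assumed degree-based pruning: keep only the most similar neighbors.
    for n in nbrs:
        nbrs[n] = sorted(nbrs[n], key=lambda t: -t[1])[:max_degree]

    k = len(next(iter(local_probs.values())))
    probs = {n: list(p) for n, p in local_probs.items()}

    for _ in range(n_iter):
        new = {}
        for n, local in local_probs.items():
            # Similarity-weighted vote of the neighbors' current estimates.
            vote = [0.0] * k
            for m, w in nbrs[n]:
                for c in range(k):
                    vote[c] += w * probs[m][c]
            total = sum(vote)
            if total == 0.0:
                # No usable neighbors: fall back to the content-only estimate.
                new[n] = list(local)
                continue
            # Combine content evidence with neighborhood evidence
            # (homophily assumption), then renormalize.
            scores = [local[c] * vote[c] / total for c in range(k)]
            z = sum(scores)
            new[n] = [s / z for s in scores] if z else list(local)
        probs = new
    return probs
```

In this sketch, a node whose content-only estimate is uncertain is pulled toward the classes of its similar neighbors, while edges to dissimilar nodes have no influence because they are pruned before the iteration starts.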