  Aggregation of Multiple Clusterings and Active Learning in a Transductive Setting

Arvanitopoulos-Darginis, N. (2012). Aggregation of Multiple Clusterings and Active Learning in a Transductive Setting. Master Thesis, Universität des Saarlandes, Saarbrücken.

Files

2011_Nikolaos_Darginis_Arvanitopoulos.pdf (Any fulltext), 548KB
 
Name: 2011_Nikolaos_Darginis_Arvanitopoulos.pdf
Visibility: Restricted (Max Planck Institute for Informatics, MSIN)
MIME-Type: application/pdf

Creators

Arvanitopoulos-Darginis, Nikolaos (1), Author
Hein, Matthias (2), Advisor
Weikert, Joachim (2), Referee
Affiliations:
(1) International Max Planck Research School, MPI for Informatics, Max Planck Society, ou_1116551
(2) External Organizations, ou_persistent22

Content

Free keywords: -
 Abstract: In this work we proposed a novel transductive method to solve the problem of learning from partially labeled data. Our main idea was to aggregate information obtained from several clusterings to infer the labels of the unlabeled data. While our method is not restricted to a specific clustering method, in our experiments we chose the normalized variant of 1-spectral clustering, which was demonstrated to produce better clusterings in most cases than the standard spectral clustering method. Our approach yielded results that were at least comparable to, and in some cases even significantly better than, the best results obtained by state-of-the-art methods reported in the literature. Furthermore, we proposed a novel active learning framework that is able to query the labels of the most informative points, which help in the classification of the unlabeled points. For the majority vote scheme we provided some guarantees on the number of points that should be drawn from each cluster in order to infer the correct label of the cluster with high probability. Moreover, in the ridge regression scheme we proposed an algorithm that in each step selects the most uncertain point in terms of the prediction function of the classifier (the point that lies near the decision boundary of the classifier). In both cases, experimental results show the strength of our methods and confirm our theoretical guarantees. The results look very promising and open several interesting directions for future research. For the SSL scheme, it would be interesting to test the performance of several other clustering approaches, such as k-means, standard spectral clustering, and hierarchical clustering, and to combine them in one general method. Our intuition is that the algorithm should be able to select only the good clusterings that provide discriminative information for each specific problem.
Apart from ridge regression, it would be beneficial to experiment with other fitting approaches that produce sparse representations in our constructed basis. For the active learning framework, one interesting direction is to extend it to more general clusterings that take into account the hierarchical structure of the data. In that way, we would take advantage of the underlying hierarchy, and by adaptively selecting the pruning of the cluster tree we could (potentially) further improve our sampling strategy. Additionally, we believe that in the multi-clustering scenario extensive improvements of our algorithm can be proposed in order to take better advantage of the variation in the multiple clustering representations of the data. Finally, as our methods scale to large problems and partially labeled data occurs in many different areas, ranging from web documents to protein data, there is room for many interesting applications of the proposed methods.
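
The majority vote scheme mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm or its guarantee: it assumes two classes, a given clustering, i.i.d. label queries drawn with replacement, and a standard Hoeffding bound in place of the bound derived in the thesis. The names `samples_needed`, `majority_vote_labels`, `margin`, `delta` and `oracle` are hypothetical.

```python
import math
import random
from collections import Counter


def samples_needed(margin, delta):
    """Hoeffding-style number of label queries for one two-class cluster.

    Assumption (not from the thesis): the majority class holds a fraction
    1/2 + margin of the cluster and points are queried i.i.d. Then the
    sampled majority is wrong with probability at most
    exp(-2 * n * margin**2), which is <= delta for the n returned here.
    """
    if not 0 < margin <= 0.5 or not 0 < delta < 1:
        raise ValueError("need 0 < margin <= 0.5 and 0 < delta < 1")
    return math.ceil(math.log(1 / delta) / (2 * margin ** 2))


def majority_vote_labels(clusters, oracle, n_queries, rng):
    """Query n_queries points per cluster and propagate the majority label.

    clusters: dict mapping a cluster id to a list of point indices.
    oracle:   callable returning the true label of a point index
              (stands in for the human annotator).
    """
    predicted = {}
    for members in clusters.values():
        # Draw the queried points uniformly with replacement.
        queried = [rng.choice(members) for _ in range(n_queries)]
        votes = Counter(oracle(i) for i in queried)
        label = votes.most_common(1)[0][0]
        # Every point in the cluster inherits the winning label.
        for i in members:
            predicted[i] = label
    return predicted


if __name__ == "__main__":
    true_labels = [0, 0, 0, 1, 1, 1]
    clusters = {"a": [0, 1, 2], "b": [3, 4, 5]}
    n = samples_needed(margin=0.25, delta=0.05)
    result = majority_vote_labels(
        clusters, true_labels.__getitem__, n, random.Random(0)
    )
    print(n, result)
```

Under these assumptions, a smaller `margin` (a less pure cluster) drives the number of queries up quadratically, which is why purer clusterings make such a scheme cheaper in labels.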

Details

Language(s): eng - English
 Dates: 2012
 Publication Status: Issued
 Pages: -
 Publishing info: Saarbrücken : Universität des Saarlandes
 Table of Contents: -
 Rev. Type: -
 Identifiers: BibTex Citekey: Arvanitopoulos-Darginis2011
 Degree: Master
