de.mpg.escidoc.pubman.appbase.FacesBean
Deutsch
 
Hilfe Wegweiser Impressum Kontakt Einloggen
  DetailsucheBrowse

Datensatz

DATENSATZ AKTIONENEXPORT

Freigegeben

Hochschulschrift

Leveraging Independence and Locality for Random Forests in a Distributed Environment

MPG-Autoren
http://pubman.mpdl.mpg.de/cone/persons/resource/persons44111

Belet,  Razvan
Databases and Information Systems, MPI for Informatics, Max Planck Society;
International Max Planck Research School, MPI for Informatics, Max Planck Society;

http://pubman.mpdl.mpg.de/cone/persons/resource/persons45720

Weikum,  Gerhard
Databases and Information Systems, MPI for Informatics, Max Planck Society;

http://pubman.mpdl.mpg.de/cone/persons/resource/persons45380

Schenkel,  Ralf
Databases and Information Systems, MPI for Informatics, Max Planck Society;

Externe Ressourcen
Es sind keine Externen Ressourcen verfügbar
Volltexte (frei zugänglich)
Es sind keine frei zugänglichen Volltexte verfügbar
Ergänzendes Material (frei zugänglich)
Es sind keine frei zugänglichen Ergänzenden Materialien verfügbar
Zitation

Belet, R. (2013). Leveraging Independence and Locality for Random Forests in a Distributed Environment. Master Thesis, Universität des Saarlandes, Saarbrücken.


Zitierlink: http://hdl.handle.net/11858/00-001M-0000-0024-97B8-0
Zusammenfassung
With the emergence of big data, inducting regression trees on very large data sets became a common data mining task. Even though centralized algorithms for computing ensembles of Classification/Regression trees are a well studied machine learning/data mining problem, their distributed versions still raise scalability, efficiency and accuracy issues. Most state of the art tree learning algorithms require data to reside in memory on a single machine. Adopting this approach for trees on big data is not feasible as the limited resources provided by only one machine lead to scalability problems. While more scalable implementations of tree learning algorithms have been proposed, they typically require specialized parallel computing architectures rendering those algorithms complex and error-prone. In this thesis we will introduce two approaches to computing ensembles of regression trees on very large training data sets using the MapReduce framework as an underlying tool. The first approach employs the entire MapReduce cluster to parallely and fully distributedly learn tree ensembles. The second approach exploits locality and independence in the tree learning process.