  Leveraging Independence and Locality for Random Forests in a Distributed Environment

Belet, R. (2013). Leveraging Independence and Locality for Random Forests in a Distributed Environment. Master Thesis, Universität des Saarlandes, Saarbrücken.

Files

2013_ MSc Thesis_Razvan_Belet.pdf (Any fulltext), 2MB
 
File Permalink: -
Name: 2013_ MSc Thesis_Razvan_Belet.pdf
Description: -
OA-Status:
Visibility: Restricted (Max Planck Institute for Informatics, MSIN)
MIME-Type / Checksum: application/pdf
Technical Metadata:
Copyright Date: -
Copyright Info: -
License: -

Creators

Creators:
Belet, Razvan (1, 2), Author
Weikum, Gerhard (1), Advisor
Schenkel, Ralf (1), Referee
Affiliations:
(1) Databases and Information Systems, MPI for Informatics, Max Planck Society, ou_24018
(2) International Max Planck Research School, MPI for Informatics, Max Planck Society, Campus E1 4, 66123 Saarbrücken, DE, ou_1116551

Content

Free keywords: -
Abstract: With the emergence of big data, inducing regression trees on very large data sets has become a common data mining task. Even though centralized algorithms for computing ensembles of classification/regression trees are well studied in machine learning and data mining, their distributed versions still raise scalability, efficiency and accuracy issues. Most state-of-the-art tree learning algorithms require the data to reside in memory on a single machine. Adopting this approach for trees on big data is not feasible, as the limited resources of a single machine lead to scalability problems. While more scalable implementations of tree learning algorithms have been proposed, they typically require specialized parallel computing architectures, which renders those algorithms complex and error-prone. In this thesis we introduce two approaches to computing ensembles of regression trees on very large training data sets, using the MapReduce framework as the underlying tool. The first approach employs the entire MapReduce cluster to learn tree ensembles in parallel and in a fully distributed manner. The second approach exploits locality and independence in the tree learning process.

Details

Language(s): eng - English
 Dates: 2013
 Publication Status: Issued
 Pages: 132 p.
 Publishing info: Saarbrücken : Universität des Saarlandes
 Table of Contents: -
 Rev. Type: -
 Identifiers: BibTex Citekey: Belet2013
 Degree: Master
