Molecular diagnosis: classification, model selection, and performance evaluation

Markowetz, Florian; Spang, Rainer

Journal Article

MPS-Authors

Markowetz, Florian
Max Planck Society

Spang, Rainer
Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society

Citation

Markowetz, F., & Spang, R. (2005). Molecular diagnosis: classification, model selection, and performance evaluation. Methods of Information in Medicine, 44(3), 438-443.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0010-872C-8

Abstract

OBJECTIVES: We discuss supervised classification techniques applied to medical diagnosis based on gene expression profiles. Our focus lies on strategies of adaptive model selection to avoid overfitting in high-dimensional spaces.

METHODS: We introduce likelihood-based methods, classification trees, support vector machines and regularized binary regression. For regularization by dimension reduction, we describe feature selection methods: feature filtering, feature shrinkage and wrapper approaches. In small-sample-size situations, efficient methods of data re-use are needed to assess the predictive power of a model. We discuss two issues in using cross-validation: the difference between in-loop and out-of-loop feature selection, and estimating model parameters in nested-loop cross-validation.

RESULTS: Gene selection reduces the dimensionality of the model. Tuning parameters enable adaptive model selection. The feature selection bias is a common pitfall in performance evaluation. Model selection and performance evaluation can be combined by nested-loop cross-validation.

CONCLUSIONS: Classification of microarrays is prone to overfitting. A rigorous and unbiased assessment of the predictive power of the model is essential.
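The difference between out-of-loop and in-loop feature selection, and the idea of nested-loop cross-validation, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the data are synthetic noise with uninformative labels, the filter ranks features by absolute class-mean difference, the classifier is a simple nearest-centroid rule, and all function names are hypothetical. On such data, selecting features once on all samples before cross-validation tends to give an optimistic accuracy estimate, whereas repeating the selection inside each training fold does not.

```python
import random

def make_noise_data(n_samples, n_features, rng):
    """Pure-noise 'expression' matrix with balanced, uninformative labels."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y

def select_features(X, y, k):
    """Filter: keep the k features with the largest absolute class-mean difference."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def centroid_predict(X_train, y_train, x, features):
    """Nearest-centroid rule restricted to the chosen features."""
    dists = {}
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in features]
        dists[label] = sum((x[j] - c) ** 2 for j, c in zip(features, centroid))
    return min(dists, key=dists.get)

def folds(n, n_folds):
    """Interleaved folds; use an odd n_folds so alternating labels stay mixed."""
    return [list(range(f, n, n_folds)) for f in range(n_folds)]

def split(X, y, test_idx):
    """Return the training part of (X, y) with the test indices left out."""
    test = set(test_idx)
    keep = [i for i in range(len(X)) if i not in test]
    return [X[i] for i in keep], [y[i] for i in keep]

def cv_accuracy(X, y, k, n_folds, in_loop):
    """Cross-validated accuracy with in-loop or out-of-loop feature selection."""
    # Out-of-loop: features are chosen once using ALL samples (biased).
    fixed = None if in_loop else select_features(X, y, k)
    correct = 0
    for test_idx in folds(len(X), n_folds):
        X_tr, y_tr = split(X, y, test_idx)
        # In-loop: features are re-selected from the training fold only.
        feats = select_features(X_tr, y_tr, k) if in_loop else fixed
        correct += sum(centroid_predict(X_tr, y_tr, X[i], feats) == y[i]
                       for i in test_idx)
    return correct / len(X)

def nested_cv_accuracy(X, y, candidate_ks, n_folds):
    """Outer loop estimates performance; inner loop tunes k on training data only."""
    correct = 0
    for test_idx in folds(len(X), n_folds):
        X_tr, y_tr = split(X, y, test_idx)
        best_k = max(candidate_ks,
                     key=lambda k: cv_accuracy(X_tr, y_tr, k, n_folds, in_loop=True))
        feats = select_features(X_tr, y_tr, best_k)
        correct += sum(centroid_predict(X_tr, y_tr, X[i], feats) == y[i]
                       for i in test_idx)
    return correct / len(X)

if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_noise_data(40, 500, rng)
    print("out-of-loop selection:", cv_accuracy(X, y, 10, 5, in_loop=False))
    print("in-loop selection:    ", cv_accuracy(X, y, 10, 5, in_loop=True))
    print("nested-loop CV:       ", nested_cv_accuracy(X, y, [5, 10, 20], 5))
```

Because the labels carry no signal, any accuracy well above chance in the out-of-loop run reflects the selection bias rather than predictive power. The nested variant shows one way to tune a parameter (here the number of selected features) without letting the outer test folds influence that choice.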