Molecular diagnosis: classification, model selection, and performance evaluation


Markowetz, Florian
Max Planck Society

Spang, Rainer
Dept. of Computational Molecular Biology (Head: Martin Vingron), Max Planck Institute for Molecular Genetics, Max Planck Society

Markowetz, F., & Spang, R. (2005). Molecular diagnosis: classification, model selection, and performance evaluation. Methods of Information in Medicine, 44(3), 438-443.

OBJECTIVES: We discuss supervised classification techniques applied to medical diagnosis based on gene expression profiles. Our focus is on strategies for adaptive model selection that avoid overfitting in high-dimensional spaces.

METHODS: We introduce likelihood-based methods, classification trees, support vector machines, and regularized binary regression. For regularization by dimension reduction, we describe three feature selection methods: feature filtering, feature shrinkage, and wrapper approaches. In small-sample settings, efficient methods of data re-use are needed to assess the predictive power of a model. We discuss two issues in using cross-validation: the difference between in-loop and out-of-loop feature selection, and the estimation of model parameters in nested-loop cross-validation.

RESULTS: Gene selection does not reduce the dimensionality of the model. Tuning parameters enable adaptive model selection. The feature selection bias is a common pitfall in performance evaluation. Model selection and performance evaluation can be combined by nested-loop cross-validation.

CONCLUSIONS: Classification of microarrays is prone to overfitting. A rigorous and unbiased assessment of the predictive power of the model is essential.
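
The two cross-validation issues named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the pure-noise data, the mean-difference filter, the nearest-centroid rule, the candidate values for the number of features, and all function names are assumptions made for illustration. Because the labels carry no signal, the true accuracy is 0.5, so any estimate well above that reflects feature selection bias.

```python
import random

def make_noise_data(n_samples=40, n_features=1000, seed=0):
    """Hypothetical pure-noise 'expression' matrix with balanced labels."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y

def centroids(X, y, feats):
    """Per-class mean of each chosen feature."""
    out = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        out[c] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    return out

def select_features(X, y, k):
    """Filter: keep the k features with the largest absolute class-mean difference."""
    all_feats = range(len(X[0]))
    cen = centroids(X, y, all_feats)
    scores = [abs(a - b) for a, b in zip(cen[0], cen[1])]
    return sorted(all_feats, key=lambda j: -scores[j])[:k]

def predict(cen, feats, x):
    """Nearest-centroid rule restricted to the selected features."""
    dist = {c: sum((x[j] - m) ** 2 for j, m in zip(feats, cen[c])) for c in cen}
    return min(dist, key=dist.get)

def cv_accuracy(X, y, fit, n_folds=5):
    """k-fold CV accuracy, where fit(X_train, y_train) returns a predictor."""
    hits = 0
    for f in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != f]
        test = [i for i in range(len(X)) if i % n_folds == f]
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits += sum(model(X[i]) == y[i] for i in test)
    return hits / len(X)

def fit_fixed(feats):
    """Out-of-loop: the feature set was fixed before CV started."""
    def fit(Xtr, ytr):
        cen = centroids(Xtr, ytr, feats)
        return lambda x: predict(cen, feats, x)
    return fit

def fit_with_selection(k):
    """In-loop: features are re-selected from the training part of each fold."""
    def fit(Xtr, ytr):
        feats = select_features(Xtr, ytr, k)
        cen = centroids(Xtr, ytr, feats)
        return lambda x: predict(cen, feats, x)
    return fit

def fit_tuned(candidates=(5, 20)):
    """Nested loop: an inner CV on the training part picks k, then the model is refit."""
    def fit(Xtr, ytr):
        def inner_score(k):
            return cv_accuracy(Xtr, ytr, fit_with_selection(k), n_folds=4)
        best_k = max(candidates, key=inner_score)
        return fit_with_selection(best_k)(Xtr, ytr)
    return fit

X, y = make_noise_data()
# Biased: genes are chosen using ALL samples, including the later test folds.
biased = cv_accuracy(X, y, fit_fixed(select_features(X, y, 10)))
# In-loop: the test fold never influences which genes are chosen.
honest = cv_accuracy(X, y, fit_with_selection(10))
# Nested: tuning of k happens inside each outer training set.
nested = cv_accuracy(X, y, fit_tuned())
print(f"out-of-loop selection: {biased:.2f}")
print(f"in-loop selection:     {honest:.2f}")
print(f"nested-loop CV:        {nested:.2f}")
```

In this sketch the out-of-loop estimate should come out clearly higher than the in-loop one, although both evaluate the same classifier on the same noise. The nested variant treats the number of selected features as a tuning parameter and keeps the outer test folds untouched during tuning, which is one way to combine model selection with performance evaluation.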