ccSVM: correcting Support Vector Machines for confounding factors in biological data classification

Li, L.; Rakitsch, B.; Borgwardt, K.

doi:10.1093/bioinformatics/btr204

Journal Article

MPS-Authors

Li, L.
Stuttgart Center for Electron Microscopy, Max Planck Institute for Intelligent Systems, Max Planck Society;

Rakitsch, B.
Max Planck Society;

Borgwardt, K.
Research Group Machine Learning and Computational Biology, Max Planck Institute for Intelligent Systems, Max Planck Society;
Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society;

Citation

Li, L., Rakitsch, B., & Borgwardt, K. (2011). ccSVM: correcting Support Vector Machines for confounding factors in biological data classification. ISMB/ECCB 2011, pp. i342-i348. doi:10.1093/bioinformatics/btr204.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0010-4C8F-9

Abstract

Motivation: Classifying biological data into different groups is a central task of bioinformatics: for instance, to predict the function of a gene or protein, the disease state of a patient or the phenotype of an individual based on its genotype. Support Vector Machines are a widespread approach for classifying biological data, owing to their high accuracy, their ability to deal with structured data such as strings, and the ease with which they integrate various types of data. However, it is unclear how to correct for confounding factors such as population structure, age, gender or experimental conditions in Support Vector Machine classification.

Results: In this article, we present a Support Vector Machine classifier that can correct the prediction for observed confounding factors. This is achieved by minimizing the statistical dependence between the classifier and the confounding factors. We prove that this formulation can be transformed into a standard Support Vector Machine with rescaled input data. In our experiments, our confounder-correcting SVM (ccSVM) improves tumor diagnosis based on samples from different labs, tuberculosis diagnosis in patients of varying age, ethnicity and gender, and phenotype prediction in the presence of population structure, and it outperforms state-of-the-art methods in terms of prediction accuracy.
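
Illustrative note: the abstract does not say how the statistical dependence between the classifier and the confounding factors is measured. As a minimal sketch only, the Python below computes the biased empirical Hilbert-Schmidt Independence Criterion (HSIC), a common kernel-based dependence measure; its use here is an assumption and not taken from this record. The function names, the Gaussian kernel, the `gamma` value and the toy data (a hypothetical age-like confounder and two sets of hypothetical classifier scores) are all made up for illustration.

```python
import math


def rbf_kernel(values, gamma=1.0):
    """Gaussian (RBF) kernel matrix for a list of scalar values."""
    return [[math.exp(-gamma * (a - b) ** 2) for b in values] for a in values]


def center(K):
    """Double-center a kernel matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [K[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(K, L):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(K)
    Kc = center(K)
    # trace(K H L H) equals trace((H K H) L) by cyclicity of the trace.
    trace = sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Hypothetical confounder (e.g. a rescaled age) for 20 samples.
confounder = [i / 10 for i in range(20)]

# Hypothetical classifier scores that simply track the confounder ...
scores_dependent = [2.0 * z for z in confounder]
# ... and scores that alternate in sign regardless of the confounder.
scores_unrelated = [0.5 if i % 2 == 0 else -0.5 for i in range(20)]

L_conf = rbf_kernel(confounder)
hsic_dependent = hsic(rbf_kernel(scores_dependent), L_conf)
hsic_unrelated = hsic(rbf_kernel(scores_unrelated), L_conf)

print("HSIC(dependent scores, confounder):", hsic_dependent)
print("HSIC(unrelated scores, confounder):", hsic_unrelated)
```

In this toy setting the scores that follow the confounder yield a larger HSIC value than the alternating scores. A confounder-correcting classifier in the spirit of the abstract would aim to keep such a dependence term small while still separating the classes; how ccSVM does this exactly is described in the paper, not in this sketch.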