Multi-task learning for pKa prediction

Skolidis, Grigorios; Hansen, Katja; Sanguinetti, Guido; Rupp, Matthias

doi:10.1007/s10822-012-9582-x

Journal Article

MPS-Authors

Hansen, Katja
Theory, Fritz Haber Institute, Max Planck Society;
Machine Learning Group, TU Berlin;

Citation

Skolidis, G., Hansen, K., Sanguinetti, G., & Rupp, M. (2012). Multi-task learning for pK_a prediction. Journal of Computer-Aided Molecular Design, 26(7), 883-895. doi:10.1007/s10822-012-9582-x.

Cite as: https://hdl.handle.net/11858/00-001M-0000-0010-76DA-7

Abstract

Many properties of a compound depend directly on the dissociation constants of its acidic and basic groups. Significant effort has been invested in computational models to predict these constants. For linear regression models, compounds are often divided into chemically motivated classes, with a separate model for each class. However, sometimes too few measurements are available for a class to build a reasonable model, e.g., when investigating a new compound series. If data for related classes are available, we show that multi-task learning can be used to improve predictions by utilizing data from these other classes. We investigate the performance of linear Gaussian process regression models (single-task, pooling, and multi-task models) in the low sample size regime, using a published data set (n = 698, mostly monoprotic, in aqueous solution) divided beforehand into 15 classes. A multi-task regression model using the intrinsic model of co-regionalization and incomplete Cholesky decomposition performed best in 85% of all experiments. The presented approach can be applied to estimate other molecular properties for which few measurements are available.
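
The multi-task idea in the abstract can be illustrated with a minimal sketch of Gaussian process regression under the intrinsic model of co-regionalization, where the covariance between input x in class s and input x' in class t is B[s, t] * k(x, x'). Everything below is an assumption made for illustration and is not taken from the paper: the function names icm_kernel and icm_gp_predict are hypothetical, the linear kernel has a fixed bias of 1, the task matrix B is set by hand rather than learned, a full Cholesky factorization stands in for the incomplete one mentioned in the abstract, and the data are synthetic toy values rather than pKa measurements.

```python
import numpy as np


def icm_kernel(X1, t1, X2, t2, B):
    """ICM covariance: B[s, t] * (x . x' + 1) for inputs x, x' in tasks s, t."""
    base = X1 @ X2.T + 1.0              # linear kernel with bias (assumed form)
    return B[np.ix_(t1, t2)] * base     # scale each entry by the task covariance


def icm_gp_predict(X_train, t_train, y_train, X_test, t_test, B, noise=0.1):
    """Posterior mean of a GP with the ICM kernel and Gaussian noise."""
    K = icm_kernel(X_train, t_train, X_train, t_train, B)
    K = K + noise * np.eye(len(y_train))
    K_star = icm_kernel(X_test, t_test, X_train, t_train, B)
    L = np.linalg.cholesky(K)           # full Cholesky, not the incomplete variant
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return K_star @ alpha


# Synthetic toy data (not pKa values): two related classes, one with few samples.
rng = np.random.default_rng(0)
w = np.array([1.5, -0.7])
X_big = rng.normal(size=(40, 2))
y_big = X_big @ w + 0.05 * rng.normal(size=40)
X_small = rng.normal(size=(3, 2))
y_small = X_small @ w + 0.3 + 0.05 * rng.normal(size=3)

X = np.vstack([X_big, X_small])
t = np.array([0] * 40 + [1] * 3)        # class label of each training compound
y = np.concatenate([y_big, y_small])

B = np.array([[1.0, 0.9],
              [0.9, 1.0]])              # hand-set task covariance (assumption)

X_new = rng.normal(size=(5, 2))
t_new = np.ones(5, dtype=int)           # predict for the small class
pred = icm_gp_predict(X, t, y, X_new, t_new, B)
print(pred)
```

In this sketch, setting B to the identity matrix decouples the classes and recovers independent single-task models, while a B with all entries equal to one treats every class as the same task, which corresponds to pooling. Intermediate values let the sparsely sampled class borrow information from the related one.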