Abstract:
Many properties of a compound depend directly on
the dissociation constants of its acidic and basic groups.
Significant effort has been invested in computational
models to predict these constants. For linear regression
models, compounds are often divided into chemically
motivated classes, with a separate model for each class.
However, sometimes too few measurements are available
for a class to build a reasonable model, e.g., when investigating
a new compound series. If data for related classes
are available, we show that multi-task learning can be used
to improve predictions by utilizing data from these other
classes. We investigate the performance of linear Gaussian
process regression models (single-task, pooling, and multi-task
models) in the low sample size regime, using a published
data set (n = 698, mostly monoprotic, in aqueous
solution) divided beforehand into 15 classes. A multi-task
regression model using the intrinsic model of co-regionalization
and incomplete Cholesky decomposition performed
best in 85 % of all experiments. The presented
approach can be applied to estimate other molecular
properties where few measurements are available.
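
The relationship between the three model types can be illustrated with a minimal sketch of the intrinsic model of coregionalization, in which the covariance between two compounds is the product of a task-similarity entry B[s, t] and a linear kernel on their descriptors. The function names, the matrix B, the noise level, and the synthetic data below are all assumptions for illustration, not the setup of the study. The sketch also uses a full Cholesky factorization for brevity rather than the incomplete Cholesky decomposition mentioned above.

```python
import numpy as np

def icm_kernel(X1, t1, X2, t2, B):
    """ICM covariance: k((x, s), (x', t)) = B[s, t] * <x, x'> (linear input kernel)."""
    return B[np.ix_(t1, t2)] * (X1 @ X2.T)

def fit_predict(X, t, y, X_new, t_new, B, noise=0.1):
    """Posterior mean of a zero-mean GP with ICM covariance and Gaussian noise."""
    K = icm_kernel(X, t, X, t, B) + noise * np.eye(len(y))
    L = np.linalg.cholesky(K)  # full Cholesky factor (assumption: small n)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return icm_kernel(X_new, t_new, X, t, B) @ alpha

# Synthetic stand-in data (hypothetical): 10 compounds in class 0, 2 in class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
t = np.array([0] * 10 + [1] * 2)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=12)

X_new = rng.normal(size=(4, 3))
t_new = np.ones(4, dtype=int)  # predict for the sparsely sampled class

B_single = np.eye(2)                         # classes independent: single-task models
B_pooled = np.ones((2, 2))                   # classes identical: one pooled model
B_multi = np.array([[1.0, 0.8], [0.8, 1.0]]) # partial sharing (assumed value)

pred_single = fit_predict(X, t, y, X_new, t_new, B_single)
pred_pooled = fit_predict(X, t, y, X_new, t_new, B_pooled)
pred_multi = fit_predict(X, t, y, X_new, t_new, B_multi)
```

Setting B to the identity recovers separate single-task models, and setting it to a matrix of ones recovers pooling; intermediate positive semidefinite choices let a class with few measurements borrow strength from related classes. In practice B would be estimated from data rather than fixed by hand.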