TabPFN Opens New Avenues for Small-Data Tabular Learning in Drug Discovery
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/TabPFN_Opens_New_Avenues_for_Small-Data_Tabular_Learning_in_Drug_Discovery/31833157
Description:
Early-stage drug discovery often suffers from data scarcity and
out-of-distribution (OOD) shifts, which constrain the reliability
of predictive models. While deep learning has advanced representation
learning from molecular and biological data, tabular modeling remains
indispensable, particularly in small-sample and OOD scenarios. For
more than a decade, gradient-boosted decision trees (GBDTs), such
as XGBoost, have been the dominant choice, yet their robustness is
limited under such conditions. TabPFN, a recently introduced transformer-based
tabular foundation model, enables accurate predictions on small data
sets without task-specific retraining. Applying TabPFN to a variety
of molecular data sets, we find that TabPFN performs on par with XGBoost
in classification but demonstrates clear and stable advantages in
regression, with its strongest gains on small and medium data sets
and under OOD evaluations. Feature and data ablations (10–90%)
further highlight its robustness, as performance degrades gracefully
and exhibits minimal sensitivity compared with tree ensembles. On
quantum tasks, TabPFN shows competitive accuracy on QM7 but is challenged
by the larger QM8 data set, where tree ensembles regain strength.
Beyond metrics, embedding analyses indicate that TabPFN yields smoother
structure–property relationships and enhanced class separability, reflecting
beneficial inductive biases rather than overfitting. Collectively,
these findings demonstrate that TabPFN offers a robust and data-efficient
alternative for tabular learning in drug discovery, shedding new light
on predictive modeling under small-data and OOD challenges.
Created:
2026-03-23



