TabPFN Opens New Avenues for Small-Data Tabular Learning in Drug Discovery
Source: Figshare | Updated: 2026-03-23 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/TabPFN_Opens_New_Avenues_for_Small-Data_Tabular_Learning_in_Drug_Discovery/31833160
Description:
Early-stage drug discovery often suffers from data scarcity and out-of-distribution (OOD) shifts, which constrain the reliability of predictive models. While deep learning has advanced representation learning from molecular and biological data, tabular modeling remains indispensable, particularly in small-sample and OOD scenarios. For more than a decade, gradient-boosted decision trees (GBDTs), such as XGBoost, have been the dominant choice, yet their robustness is limited under such conditions. TabPFN, a recently introduced transformer-based tabular foundation model, enables accurate predictions on small data sets without task-specific retraining. Applying TabPFN to a variety of molecular data sets, we find that TabPFN performs on par with XGBoost in classification but demonstrates clear and stable advantages in regression, with its strongest gains on small and medium data sets and under OOD evaluations. Feature and data ablations (10–90%) further highlight its robustness, as performance degrades gracefully and exhibits minimal sensitivity compared with tree ensembles. On quantum tasks, TabPFN shows competitive accuracy on QM7 but is challenged by the larger QM8 data set, where tree ensembles regain strength. Beyond metrics, embedding analyses indicate that TabPFN yields smoother structure–property relationships and enhanced class separability, reflecting beneficial inductive biases rather than overfitting. Collectively, these findings demonstrate that TabPFN offers a robust and data-efficient alternative for tabular learning in drug discovery, shedding new light on predictive modeling under small-data and OOD challenges.
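The description mentions feature and data ablations over a 10–90% range but does not say how they were carried out. The following is a minimal sketch of one plausible protocol, assuming uniform random removal of rows (data ablation) or columns (feature ablation) in 10-point steps. The function names, the step size, and the random-removal scheme are illustrative assumptions, not the authors' code; the ablated subsets would then be passed to whichever models are being compared.

```python
import random


def ablation_levels(start=10, stop=90, step=10):
    """Return ablation fractions, e.g. 0.1 ... 0.9 (step size is an assumption)."""
    return [p / 100 for p in range(start, stop + 1, step)]


def ablate_rows(X, y, drop_fraction, seed=0):
    """Data ablation: randomly drop a fraction of samples, keeping X and y aligned."""
    if not 0.0 <= drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in [0, 1)")
    rng = random.Random(seed)
    n_keep = max(1, round(len(X) * (1.0 - drop_fraction)))
    keep = sorted(rng.sample(range(len(X)), n_keep))
    return [X[i] for i in keep], [y[i] for i in keep]


def ablate_features(X, drop_fraction, seed=0):
    """Feature ablation: randomly drop a fraction of columns from every row."""
    if not 0.0 <= drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in [0, 1)")
    rng = random.Random(seed)
    n_features = len(X[0])
    n_keep = max(1, round(n_features * (1.0 - drop_fraction)))
    keep = sorted(rng.sample(range(n_features), n_keep))
    return [[row[j] for j in keep] for row in X], keep


# Toy data: 20 samples with 10 features each (purely illustrative values).
X = [[float(i * 10 + j) for j in range(10)] for i in range(20)]
y = list(range(20))

for frac in ablation_levels():
    X_rows, y_rows = ablate_rows(X, y, frac)
    X_cols, kept_cols = ablate_features(X, frac)
    print(f"drop {frac:.0%}: {len(X_rows)} samples kept, {len(kept_cols)} features kept")
```

Fixing the seed per ablation level keeps the subsets identical across models, so any difference in degradation reflects the model rather than the draw.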
Created:
2026-03-23



