
Trained checkpoints and preprocessed data for "Closing the gap on a $0 budget: ensembling public molecular foundation models for HIV bioactivity prediction"

DataCite Commons | Updated 2026-05-03 | Indexed 2026-05-07
Download link:
https://zenodo.org/doi/10.5281/zenodo.19946459
Description:
Trained model checkpoints, normalization statistics, fitted ensemble stacker, and preprocessed graph cache supporting the HIV bioactivity prediction preprint by Agarwal (2026).

Contents:
- best_molformer_fold{0..4}.pth: Five MolFormer-XL checkpoints fine-tuned on MoleculeNet HIV scaffold-CV folds. Each ~170 MB.
- best_gnn_fold{0..4}_v5_desc.pth: Five GATv2-based GNN ("v5b") checkpoints trained from scratch on the same folds.
- global_feature_stats_v5_desc_fold{0..4}.pt: Per-fold means/stds for the RDKit global descriptors (z-score normalization).
- ensemble_stacker.pt: Logistic stacker coefficients, three principled decision thresholds (Youden's J / F1-max / base-rate), and raw out-of-fold prediction arrays for n=24,391 molecules.
- hiv_preprocessed_cache_v5_desc.pt: 41,119 RDKit-parsed molecules as PyTorch Geometric Data objects with atom features (23-dim), bond features (8-dim), global descriptors, and Bemis-Murcko scaffolds. Reproduces the exact deterministic 5-fold scaffold split used in training.

These artifacts reproduce the headline test AUC of 0.806 ± 0.018 on the MoleculeNet HIV scaffold-split benchmark. Source code is at https://github.com/v659/HIV-drug-discovery.

License: MIT (matches the source repository).
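
The description names three post-processing steps: z-score normalization with stored per-fold statistics, a logistic stacker over the two base models, and a Youden's J decision threshold. The sketch below illustrates each one in plain Python. It does not read the released files, whose internal layout is not documented here. The function names, the stacker weights, and the choice to stack raw probabilities rather than logits are all assumptions made for illustration, not the author's implementation.

```python
import math

def zscore(values, mean, std, eps=1e-8):
    """Standardize descriptor values with precomputed per-feature mean/std.
    eps guards against zero-variance features (an assumed convention)."""
    return [(v - m) / (s + eps) for v, m, s in zip(values, mean, std)]

def stack(p_molformer, p_gnn, w_molformer, w_gnn, bias):
    """Logistic stacker: sigmoid of a weighted sum of two base-model
    probabilities. Weights and bias here are hypothetical placeholders."""
    z = w_molformer * p_molformer + w_gnn * p_gnn + bias
    return 1.0 / (1.0 + math.exp(-z))

def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J = TPR - FPR,
    predicting positive when score >= threshold. Assumes both
    classes are present in labels."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:  # ties keep the lowest threshold
            best_t, best_j = t, j
    return best_t, best_j

# Toy usage with made-up numbers (not values from the release).
normalized = zscore([2.0, 10.0], mean=[1.0, 4.0], std=[2.0, 3.0])
ensemble_p = stack(0.7, 0.4, w_molformer=2.0, w_gnn=1.0, bias=-1.0)
threshold, j = youden_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

In practice the out-of-fold prediction arrays stored in ensemble_stacker.pt would play the role of `scores`, so the threshold is chosen on molecules each model did not train on.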
Provider:
Zenodo
Created:
2026-05-01