
FluoroBench: A reconciled solvent-conditioned fluorophore benchmark for photophysical property prediction

Source: DataCite Commons. Updated 2026-05-06; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.19905047
Description:
FluoroBench is a reconciled, solvent-conditioned fluorophore benchmark for photophysical property prediction. It aggregates 66,855 fluorophore-solvent pairs from three public sources (FluoDB, Deep4Chem, nablaColors-3D); 64,107 pairs carry at least one of four official regression targets: fluorescence quantum yield (qy), absorption maximum (lambda_abs), emission maximum (lambda_em), and log molar extinction coefficient (log_eps). Each row is keyed by a deterministic pair_id over the canonical chromophore and solvent strings.

Data layers: The data archive contains the reconciled master pair table, the official benchmark subset, auxiliary rows, source provenance, missing-target flags, solvent normalization outputs, conflict audits, Catalán solvent descriptors with coverage metadata, split membership, and difficulty labels. FluoroBench-PMC is included as a separate diagnostic supplement sourced from PubMed Central open-access papers and is kept separate from the official public-source benchmark population.

Evaluation contract: FluoroBench provides three official split families: random row splits, Bemis-Murcko scaffold holdout, and solvent holdout. Each target and split family is released across five seeds, yielding 60 train/validation/test partitions (4 targets x 3 split families x 5 seeds). Comparisons are defined on matched pair_id test intersections with paired-bootstrap resampling; this makes the evaluated denominator explicit for every target, split, seed, and method pair. Descriptor coverage is reported as method coverage and does not define the split files.

Deposit layout: This Zenodo record contains four files: a hash-locked data archive, a Track A results sidecar, a Croissant metadata sidecar, and a reproducibility code snapshot. The data archive contains the benchmark tables, split files, provenance, and audit outputs. The results sidecar contains official per-pair prediction files, aggregate leaderboard tables, paired-bootstrap matrices, benchmark denominator summaries, and strict-mask scaffold sensitivity files. The code snapshot contains verification and audit scripts ported to relative paths for reviewer-side reproduction.

Metadata and license: The Croissant JSON-LD sidecar describes the archive-level files and their sha256 anchors. Data are released under CC-BY-4.0 with upstream attributions documented in LICENSE_AUDIT.md; verification scripts are released under MIT.

Reviewer checks: Reviewers can verify the data archive against its release manifest, verify the Track A results sidecar against its results manifest, and rerun the scaffold-leakage audit from the extracted data and code snapshot. These checks are intended to make the benchmark population, split contract, and paper-facing prediction evidence auditable without accessing developer cache directories.
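
The evaluation contract above compares methods only on the pair_id values that both methods predicted, then resamples those matched pairs. The following is a minimal sketch of that idea, not the record's own script: the function name paired_bootstrap, the dict-of-errors input format, the use of absolute error, and the percentile interval are all assumptions made for illustration.

```python
import random
from statistics import mean


def paired_bootstrap(errors_a, errors_b, n_resamples=2000, seed=0):
    """Paired bootstrap of the mean error difference (method A minus method B).

    errors_a, errors_b: dicts mapping pair_id -> absolute test error.
    This input format is a hypothetical stand-in for the per-pair
    prediction files; it is not the layout of the released sidecar.
    """
    # Matched test intersection: only pair_ids scored by both methods count.
    shared = sorted(set(errors_a) & set(errors_b))
    if not shared:
        raise ValueError("no shared pair_id values to compare")

    # Pairing: one difference per pair_id, so both methods see the same rows.
    diffs = [errors_a[p] - errors_b[p] for p in shared]

    rng = random.Random(seed)
    n = len(diffs)
    # Resample the paired differences with replacement and record each mean.
    boot_means = sorted(
        mean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    lo = boot_means[int(0.025 * n_resamples)]
    hi = boot_means[int(0.975 * n_resamples) - 1]

    # n_matched is the explicit denominator for this method pair.
    return {"n_matched": n, "mean_diff": mean(diffs), "ci95": (lo, hi)}
```

Calling paired_bootstrap with two error dicts returns the number of matched pairs alongside the mean difference and an approximate 95% percentile interval. Reporting n_matched with every comparison is what keeps the denominator visible when two methods cover different subsets of the test split.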
Provider:
Zenodo
Created:
2026-05-06