gbyuvd/chemq3-molsim-sft-smiles
Source: Hugging Face. Updated: 2025-10-23. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/gbyuvd/chemq3-molsim-sft-smiles
Description:
A dataset of molecular pairs whose ECFP4 Dice similarity scores are uniformly distributed across a target range, built with FAISS for efficient similarity search. The pairs are produced through preprocessing, fingerprint computation, index construction, and pair sampling, yielding a high-quality set that is balanced in chemical diversity, computationally efficient to build, and evenly spread over the target similarity values. It is intended for supervised fine-tuning (SFT) and sentence-transformer training, with the aim of learning meaningful but non-trivial molecular similarity.
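The pair-sampling idea can be illustrated with a minimal sketch. It is not the dataset author's pipeline: fingerprints are represented here as plain Python sets of on-bit indices (real ECFP4 bits would come from a cheminformatics toolkit), all pairs are compared by brute force instead of through a FAISS index, and the function names, the target range, and the bin counts are hypothetical choices made for illustration only.

```python
import random
from collections import defaultdict


def dice_similarity(a, b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return 2.0 * len(a & b) / (len(a) + len(b))


def sample_uniform_pairs(fingerprints, lo, hi, n_bins, per_bin, seed=0):
    """Bucket all pairs by similarity in [lo, hi) and keep at most per_bin per bucket.

    Brute-force O(n^2) comparison; an assumption made for clarity, standing in
    for the index-based search the description mentions.
    """
    rng = random.Random(seed)
    width = (hi - lo) / n_bins
    bins = defaultdict(list)
    for i in range(len(fingerprints)):
        for j in range(i + 1, len(fingerprints)):
            s = dice_similarity(fingerprints[i], fingerprints[j])
            if lo <= s < hi:
                idx = min(int((s - lo) / width), n_bins - 1)
                bins[idx].append((i, j, s))
    sampled = []
    for b in range(n_bins):
        pool = bins.get(b, [])
        rng.shuffle(pool)
        sampled.extend(pool[:per_bin])  # cap each bucket so the score histogram stays flat
    return sampled


# Toy fingerprints (hypothetical on-bit sets, not real molecules).
toy_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 6, 7}, {8, 9, 10, 11}]
pairs = sample_uniform_pairs(toy_fps, lo=0.4, hi=0.8, n_bins=2, per_bin=1)
```

Capping each similarity bucket at the same size is what flattens the score distribution: abundant low-similarity pairs are discarded rather than allowed to dominate the sample.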
Provided by:
gbyuvd



