databio/cell-line-nli
Source: Hugging Face | Updated: 2025-02-09 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/databio/cell-line-nli
Description:
This dataset is structured like AllNLI and is intended for fine-tuning embedding models so that they produce meaningful embeddings of biomedical terms. It contains metadata for 30,001 human cell lines from Cellosaurus, annotated with disease, derivation site, transformant, and cell type. The dataset comes in three configurations: pair-class, triplet-terms, and triplet-sentences, each with training, validation, and test splits. The pair-class configuration is for classification tasks, while triplet-terms and triplet-sentences are for triplet tasks.
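A minimal sketch of how one of the three configurations might be loaded with the Hugging Face `datasets` library. The repository ID and configuration names come from the description above. The helper name `load_cell_line_nli` is hypothetical, and the split name `"train"` is an assumption (the listing only says "training, validation, and test"), so check the dataset card for the exact split and column names before relying on it.

```python
# Configuration names as listed in the dataset description.
CONFIGS = ("pair-class", "triplet-terms", "triplet-sentences")


def load_cell_line_nli(config: str, split: str = "train"):
    """Load one configuration of databio/cell-line-nli (hypothetical helper).

    The default split name "train" is an assumption; the dataset card
    is the authority on the actual split names.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    # Imported lazily so the module can be imported without the dependency.
    # Requires `pip install datasets` and network access on first use.
    from datasets import load_dataset

    return load_dataset("databio/cell-line-nli", config, split=split)


if __name__ == "__main__":
    # Inspect the columns rather than assuming them.
    ds = load_cell_line_nli("triplet-terms")
    print(ds.column_names)
    print(ds[0])
```

Printing `column_names` first is deliberate: the listing does not state the column layout, so the sketch avoids hard-coding names such as those used by AllNLI.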
Provider:
databio



