EMBO/soda-vec-data-full_pmc_title_abstract
Source: Hugging Face | Updated: 2025-07-31 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/EMBO/soda-vec-data-full_pmc_title_abstract
Description:
The SODA-VEC Clean Dataset is a cleaned and filtered version of the SODA-VEC dataset. It contains biomedical title-abstract pairs from PubMed Central (PMC) articles that have passed quality-control steps, including length filtering. It is suited to scientific text embedding, biomedical NLP, semantic similarity learning, and information retrieval.
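
The description mentions length filtering but gives no thresholds or field names. The sketch below shows what such a filter over title-abstract pairs might look like. The `title` and `abstract` keys, the three character limits, and the helper names `keep_pair` and `filter_pairs` are all assumptions made for illustration, not the settings EMBO used.

```python
from typing import Iterable

# Hypothetical thresholds: the listing does not state the real values.
MIN_TITLE_CHARS = 10
MIN_ABSTRACT_CHARS = 100
MAX_ABSTRACT_CHARS = 5000


def keep_pair(title: str, abstract: str) -> bool:
    """Return True if both texts fall inside the assumed length bounds."""
    title, abstract = title.strip(), abstract.strip()
    return (
        len(title) >= MIN_TITLE_CHARS
        and MIN_ABSTRACT_CHARS <= len(abstract) <= MAX_ABSTRACT_CHARS
    )


def filter_pairs(pairs: Iterable[dict]) -> list[dict]:
    """Keep records whose (assumed) 'title' and 'abstract' fields pass keep_pair."""
    return [
        p for p in pairs
        if keep_pair(p.get("title", ""), p.get("abstract", ""))
    ]


if __name__ == "__main__":
    sample = [
        {"title": "A sufficiently long title", "abstract": "x" * 200},
        {"title": "Short", "abstract": "x" * 200},    # title too short
        {"title": "Another adequate title", "abstract": "x" * 20},  # abstract too short
    ]
    print(len(filter_pairs(sample)), "of", len(sample), "pairs kept")
```

Character counts are used here only because they need no tokenizer. A token-based limit would be an equally plausible reading of "length filtering".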
Provided by:
EMBO



