laion/COREX-18
Source: Hugging Face. Updated 2024-09-14; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/laion/COREX-18
Resource summary:
COREX-18 is a comprehensive dataset derived from the 2018 release of the CORE dataset. It aims to give the research community open access to scientific papers and to support advanced RAG (retrieval-augmented generation) applications and AI research. The dataset contains more than 85 million rows and keeps the key research-paper metadata, such as title, authors, publication date, and abstract. Other metadata fields were left out to avoid complexity and the large number of NULL values they contain. Titles and abstracts have not undergone any text cleaning, so they are kept exactly as they appear in the original CORE data. COREX-18 is aimed primarily at RAG applications and the "citing data and scientific knowledge" category, supports a range of NLP tasks, and spans disciplines including chemistry, biology, law, finance, music, art, and climate.
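The description positions COREX-18 as a retrieval corpus for RAG. As a hedged illustration only, the following sketch ranks a few hand-written toy records by a simple TF-IDF term-overlap score over title and abstract, then joins the top hits into a context string that could be passed to a language model. The records, the field names `title` and `abstract`, and the helpers `tokenize`, `build_index`, and `retrieve` are hypothetical; the real COREX-18 column names may differ, and retrieval over 85 million rows would normally rely on a dedicated search or vector index rather than this in-memory loop.

```python
import math
import re
from collections import Counter

# Toy records standing in for COREX-18 rows. The field names "title" and
# "abstract" are assumptions for illustration; check the real column names
# on the dataset page before relying on them.
RECORDS = [
    {"title": "Protein folding with deep networks",
     "abstract": "We study protein structure prediction using neural networks."},
    {"title": "Carbon pricing and climate policy",
     "abstract": "An analysis of carbon taxes and their effect on emissions."},
    {"title": "Harmonic analysis in baroque music",
     "abstract": "We examine chord progressions in baroque compositions."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Return per-record term counts and document frequencies."""
    doc_terms = [Counter(tokenize(r["title"] + " " + r["abstract"])) for r in records]
    df = Counter()
    for terms in doc_terms:
        df.update(terms.keys())
    return doc_terms, df


def retrieve(query, records, k=2):
    """Rank records by a TF-IDF overlap score and return up to k matches."""
    doc_terms, df = build_index(records)
    n = len(records)
    scored = []
    for i, terms in enumerate(doc_terms):
        score = 0.0
        for tok in tokenize(query):
            if tok in terms:
                # Smoothed IDF: rarer terms contribute more to the score.
                score += terms[tok] * math.log((n + 1) / (df[tok] + 1) + 1)
        scored.append((score, i))
    # Highest score first; ties keep the original record order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [records[i] for score, i in scored[:k] if score > 0]


# Build a context string from the retrieved titles and abstracts.
hits = retrieve("carbon emissions policy", RECORDS)
context = "\n".join(f"{h['title']}: {h['abstract']}" for h in hits)
print(context)
```

Records with no query terms in common are dropped rather than padded into the top k, so a query with no matches yields an empty context instead of unrelated abstracts.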
Provided by:
laion



