arXiv Abstracts

arXiv, indexed 2025-09-30
Download link:
https://huggingface.co/datasets/gfissore/arxiv-abstracts-2021/blob/main/README.md
Description:
This dataset contains approximately 2 million arXiv abstracts collected through the end of 2021. After screening for relevance to UAT terminology, 168,084 abstracts were retained. During screening, the YAKE! tool was used for keyword extraction to ensure semantic similarity between the abstracts and UAT terms. Tasks associated with the dataset include dataset search and knowledge graph generation.
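
The screening step described above can be illustrated with a minimal sketch. It does not reproduce the actual pipeline: the description does not say how YAKE! was configured or how similarity was measured, so this example substitutes a plain token-overlap check for both. The function names, the threshold, the term list, and the sample abstracts are all hypothetical placeholders, not taken from the dataset or from the UAT vocabulary.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def relevance(abstract, terms):
    """Fraction of terms whose tokens all occur in the abstract.

    Assumption: a stand-in for the unspecified similarity measure.
    """
    if not terms:
        return 0.0
    tokens = tokenize(abstract)
    hits = sum(1 for term in terms if tokenize(term) <= tokens)
    return hits / len(terms)


def screen(abstracts, terms, threshold=0.5):
    """Keep abstracts whose relevance score reaches the threshold."""
    return [a for a in abstracts if relevance(a, terms) >= threshold]


# Hypothetical inputs, for illustration only.
UAT_TERMS = ["dark matter", "galaxy cluster"]
ABSTRACTS = [
    "We measure the dark matter profile of a massive galaxy cluster.",
    "We propose a new optimizer for training deep neural networks.",
]

KEPT = screen(ABSTRACTS, UAT_TERMS)
```

In this sketch only the first sample abstract passes the filter. A closer approximation of the described workflow would extract keywords from each abstract first and compare those keywords, rather than the raw tokens, against the term list.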
Provider:
arXiv