arXiv Scientific Research Paper Dataset
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/mm6kst3krj
Resource overview:
Description
This dataset comprises structured metadata from the arXiv repository, a widely used preprint server for scientific research. It includes paper titles, abstracts, categories (subject areas), and submission dates, making it a valuable resource for research in natural language processing (NLP), bibliometrics, machine learning, and scientific trend analysis.
Content
The dataset contains the following columns:
1. id: Unique arXiv identifier for each paper.
2. title: The title of the research paper.
3. summary: Summary of the paper’s content, extracted from arXiv.
4. summary_word_count: Word count of the summary.
5. category: Subject categories assigned by arXiv.
6. category code: Category code for the research paper.
7. published_date: Publication date of the research paper.
8. updated_date: Date on which the paper was last updated.
9. authors: Authors of the research paper.
10. first_author: First author listed on the paper.
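
The column list above can be checked programmatically before any analysis. The following is a minimal sketch, assuming the data is distributed as a comma-separated file whose header uses exactly the names listed (including "category code" with a space); the file format, the helper names `read_records` and `word_count_matches`, and the placeholder row are assumptions made for illustration and are not taken from the dataset itself. It also assumes that summary_word_count is a whitespace-delimited word count, which may not match how the dataset computed it.

```python
import csv
import io

# Column names as listed above; "category code" is kept with a space as written.
EXPECTED_COLUMNS = [
    "id", "title", "summary", "summary_word_count", "category",
    "category code", "published_date", "updated_date", "authors", "first_author",
]


def read_records(handle):
    """Read rows from an open CSV handle and check the header against the column list."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def word_count_matches(row):
    """Compare the stored word count with a whitespace-split count of the summary."""
    return int(row["summary_word_count"]) == len(row["summary"].split())


# Placeholder row invented for illustration only; it is not real dataset content.
SAMPLE = io.StringIO(
    "id,title,summary,summary_word_count,category,category code,"
    "published_date,updated_date,authors,first_author\n"
    "placeholder-id,Placeholder title,one two three,3,Placeholder category,"
    "xx.YY,2000-01-01,2000-01-02,A. Author; B. Author,A. Author\n"
)
rows = read_records(SAMPLE)
```

To run the same check on a downloaded file, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`, where `path` points to the local copy) to `read_records` in place of `SAMPLE`.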
Created:
2025-02-19



