
AliMaatouk/arXiv_Topics

Hugging Face. Updated 2025-02-28; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/AliMaatouk/arXiv_Topics
Description:
The arXiv Topics dataset maps arXiv papers to topic categories at three levels of abstraction. The topic labels were generated by prompting GPT-4o, giving a hierarchical categorization that runs from broad fields down to highly specific research areas. The dataset contains 2,422,486 paper IDs, each assigned topics at every level: Level 1 (Broad Domains), such as Computer Science, Mathematics, and Physics; Level 2 (Intermediate Categories), such as Linguistics, Quantum Computing, and Theoretical Machine Learning; and Level 3 (Specific Research Topics), such as Large Language Models, Neural Network Optimization, and Few-Shot Learning. The dataset can be used for document classification, topic modeling, retrieval augmentation, and other AI-driven literature applications.
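To make the three-level structure concrete, here is a minimal Python sketch of how such a mapping could be indexed and queried. It is an illustration under stated assumptions, not the dataset's actual schema: the field names `paper_id`, `level_1`, `level_2`, and `level_3`, the helper functions, the placeholder paper IDs, and the way topics are paired across levels are all hypothetical. Only the topic names themselves come from the description above.

```python
from collections import defaultdict

# Illustrative placeholder rows, NOT taken from the dataset.
# Field names and paper IDs are hypothetical; the pairing of topics
# across levels is assumed for the sake of the example.
SAMPLE_ROWS = [
    {"paper_id": "0000.00001", "level_1": "Computer Science",
     "level_2": "Linguistics", "level_3": "Large Language Models"},
    {"paper_id": "0000.00002", "level_1": "Computer Science",
     "level_2": "Theoretical Machine Learning",
     "level_3": "Neural Network Optimization"},
    {"paper_id": "0000.00003", "level_1": "Computer Science",
     "level_2": "Theoretical Machine Learning",
     "level_3": "Few-Shot Learning"},
]


def build_topic_tree(rows):
    """Index rows as level_1 -> level_2 -> level_3 -> list of paper IDs."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for row in rows:
        tree[row["level_1"]][row["level_2"]][row["level_3"]].append(row["paper_id"])
    return tree


def papers_under(tree, level_1, level_2=None, level_3=None):
    """Return sorted paper IDs under a topic; omitted levels match everything."""
    found = []
    # .get() avoids creating empty entries for unknown Level 1 topics.
    for l2, l3_map in tree.get(level_1, {}).items():
        if level_2 is not None and l2 != level_2:
            continue
        for l3, ids in l3_map.items():
            if level_3 is not None and l3 != level_3:
                continue
            found.extend(ids)
    return sorted(found)


if __name__ == "__main__":
    tree = build_topic_tree(SAMPLE_ROWS)
    # Broad query: every paper in one Level 1 domain.
    print(papers_under(tree, "Computer Science"))
    # Narrow query: one Level 2 category inside that domain.
    print(papers_under(tree, "Computer Science", "Theoretical Machine Learning"))
```

Nesting the index by level keeps broad queries (a whole domain) and narrow queries (a single research topic) on the same code path, which is the kind of access pattern retrieval augmentation or document classification would need.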
Provider:
AliMaatouk