Seed42Lab/AI-paper-crawl
Source: Hugging Face. Updated: 2024-11-19. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/Seed42Lab/AI-paper-crawl
Description:
The AI-paper-crawl dataset contains 11 splits, one per conference. Each split has four fields: index (a primary key starting from 0); text (the paper's content as plain text, with newline characters converted to three spaces unless a hyphen, -, is detected); year (the publication year as a string, such as 2018, which can be converted to an integer if needed); and No (a 1-indexed position within the year, stored as a string). In the ECCV split, No instead runs across the entire split and reflects only the order in which the papers were accessed, not the actual publication order. The ICLR split may be missing approximately 20%-25% of its papers, because it was collected by searching arXiv, and a search may return zero results or more than one.
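As a rough illustration of the schema described above, here is a minimal sketch of working with these fields in Python. The repository id is taken from the download link; the helper names (`load_split`, `papers_per_year`, `approx_lines`), the sample records, and the split name used in the usage note are assumptions made for illustration and are not confirmed by the dataset page.

```python
from collections import Counter


def load_split(split_name):
    """Load one conference split (needs `pip install datasets` and network access).

    The exact split names are not listed on this page, so `split_name`
    is an assumption to check against the dataset itself.
    """
    from datasets import load_dataset  # lazy import: the helpers below work offline
    return load_dataset("Seed42Lab/AI-paper-crawl", split=split_name)


def papers_per_year(records):
    """Count papers per year, converting the string `year` field to int."""
    return Counter(int(r["year"]) for r in records)


def approx_lines(text):
    """Split on the three-space marker that replaced newlines.

    Only approximate: the description says newlines were not converted
    when a hyphen was detected, so some line breaks cannot be recovered.
    """
    return [part for part in text.split("   ") if part]


# Hypothetical records that mimic the four documented fields.
sample = [
    {"index": 0, "text": "First line   second line", "year": "2018", "No": "1"},
    {"index": 1, "text": "Another paper", "year": "2018", "No": "2"},
    {"index": 2, "text": "A later paper", "year": "2019", "No": "1"},
]

counts = papers_per_year(sample)
lines = approx_lines(sample[0]["text"])
```

With network access, something like `papers_per_year(load_split("ICLR"))` would apply the same count to a real split, assuming a split by that name exists.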
Provider:
Seed42Lab



