BachNgoH/ParsedArxivPapers
Source: Hugging Face. Updated: 2024-04-11. Indexed: 2024-06-11.
Download link:
https://hf-mirror.com/datasets/BachNgoH/ParsedArxivPapers
Resource description:
---
dataset_info:
  features:
  - name: sections
    list:
    - name: figure_ref
      sequence: string
    - name: heading
      dtype: string
    - name: publication_ref
      sequence: string
    - name: table_ref
      sequence: string
    - name: text
      dtype: string
  - name: pub_date
    dtype: string
  - name: doi
    dtype: string
  - name: references
    list:
    - name: authors
      dtype: string
    - name: journal
      dtype: string
    - name: ref_id
      dtype: string
    - name: title
      dtype: string
    - name: year
      dtype: string
  - name: formulas
    list:
    - name: formula_coordinates
      sequence: float64
    - name: formula_id
      dtype: string
    - name: formula_text
      dtype: string
  - name: title
    dtype: string
  - name: abstract
    dtype: string
  - name: authors
    dtype: string
  - name: figures
    list:
    - name: figure_caption
      dtype: string
    - name: figure_data
      dtype: string
    - name: figure_id
      dtype: string
    - name: figure_label
      dtype: string
    - name: figure_type
      dtype: string
  - name: citation_data
    dtype: string
  splits:
  - name: train
    num_bytes: 1123054007
    num_examples: 19454
  download_size: 536920578
  dataset_size: 1123054007
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
Provider:
BachNgoH
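Summary of original information
Dataset overview
Dataset features
- sections: list of entries with the following sub-features:
  - figure_ref: sequence of strings
  - heading: string
  - publication_ref: sequence of strings
  - table_ref: sequence of strings
  - text: string
- pub_date: string
- doi: string
- references: list of entries with the following sub-features:
  - authors: string
  - journal: string
  - ref_id: string
  - title: string
  - year: string
- formulas: list of entries with the following sub-features:
  - formula_coordinates: sequence of floats (float64)
  - formula_id: string
  - formula_text: string
- title: string
- abstract: string
- authors: string
- figures: list of entries with the following sub-features:
  - figure_caption: string
  - figure_data: string
  - figure_id: string
  - figure_label: string
  - figure_type: string
- citation_data: string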
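The feature list above can be written down as Python types, which makes the nesting easier to see. The following is a minimal sketch, not code shipped with the dataset: the class names, the `sections_to_text` helper, and the `example` record are all hypothetical, and it assumes every string field is present (an empty or missing heading is treated as "no heading").

```python
from typing import List, TypedDict


class Section(TypedDict):
    figure_ref: List[str]
    heading: str
    publication_ref: List[str]
    table_ref: List[str]
    text: str


class Reference(TypedDict):
    authors: str
    journal: str
    ref_id: str
    title: str
    year: str


class Formula(TypedDict):
    formula_coordinates: List[float]
    formula_id: str
    formula_text: str


class Figure(TypedDict):
    figure_caption: str
    figure_data: str
    figure_id: str
    figure_label: str
    figure_type: str


class Paper(TypedDict):
    sections: List[Section]
    pub_date: str
    doi: str
    references: List[Reference]
    formulas: List[Formula]
    title: str
    abstract: str
    authors: str
    figures: List[Figure]
    citation_data: str


def sections_to_text(paper: Paper) -> str:
    """Join each section's heading and body into one plain-text string."""
    parts = []
    for section in paper["sections"]:
        heading = (section.get("heading") or "").strip()
        body = (section.get("text") or "").strip()
        parts.append(f"{heading}\n{body}" if heading else body)
    return "\n\n".join(parts)


# Invented record for illustration only; it is not taken from the dataset.
example: Paper = {
    "sections": [
        {"figure_ref": [], "heading": "Introduction", "publication_ref": [],
         "table_ref": [], "text": "We study parsing. "},
        {"figure_ref": [], "heading": "", "publication_ref": [],
         "table_ref": [], "text": "Untitled trailing text."},
    ],
    "pub_date": "", "doi": "", "references": [], "formulas": [],
    "title": "A made-up paper", "abstract": "", "authors": "",
    "figures": [], "citation_data": "",
}

full_text = sections_to_text(example)
```

Note that `authors`, `pub_date`, and `citation_data` are plain strings in the schema, so any further structure inside them (name lists, date formats, serialized citation records) would have to be checked against real rows before parsing.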
Dataset splits
- train:
  - Size: 1123054007 bytes
  - Number of examples: 19454
Dataset size
- Download size: 536920578 bytes
- Total dataset size: 1123054007 bytes
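The size figures above allow a quick back-of-the-envelope estimate of disk and bandwidth needs. The sketch below uses only the three numbers reported in the card. Reading the download size as the size of the stored files and the dataset size as the prepared data is an assumption based on how such cards are usually generated, and the variable names are illustrative.

```python
# Figures copied from the dataset card.
NUM_BYTES = 1_123_054_007      # dataset_size / train num_bytes
NUM_EXAMPLES = 19_454          # train num_examples
DOWNLOAD_SIZE = 536_920_578    # download_size


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 2**20 bytes)."""
    return n_bytes / 2**20


# Average prepared size of one paper record, in bytes.
avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES

# Fraction of the prepared size that actually has to be downloaded.
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

print(f"dataset size : {to_mib(NUM_BYTES):.1f} MiB")
print(f"download size: {to_mib(DOWNLOAD_SIZE):.1f} MiB")
print(f"avg / example: {avg_bytes_per_example / 1024:.1f} KiB")
print(f"download / dataset ratio: {download_ratio:.2f}")
```

In round terms, the download is a little under half the prepared size, and each record averages a few tens of kilobytes, which is consistent with full-text papers rather than abstracts only.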
Configuration
- default:
  - Data file path (train split): data/train-*
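The `data/train-*` pattern in the default configuration selects every file under `data/` whose name starts with `train-`. The sketch below illustrates that selection with Python's `fnmatch`, which only approximates the Hub's own pattern resolution. The file names in `candidate_files` are hypothetical, since the card does not list the actual shards. The `load_train` helper uses the `load_dataset` function from the Hugging Face `datasets` library and is defined but not called here, because it needs that library and network access.

```python
from fnmatch import fnmatch

REPO_ID = "BachNgoH/ParsedArxivPapers"
TRAIN_PATTERN = "data/train-*"

# Hypothetical repository listing; real shard names and counts are not given.
candidate_files = [
    "data/train-00000-of-00003.parquet",
    "data/train-00001-of-00003.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

# Keep only the files that the train split's pattern would pick up.
train_files = [f for f in candidate_files if fnmatch(f, TRAIN_PATTERN)]


def load_train(streaming: bool = True):
    """Load the train split; requires `pip install datasets` and network access.

    With streaming=True the records are iterated lazily instead of
    downloading the whole dataset first.
    """
    from datasets import load_dataset  # imported lazily on purpose

    return load_dataset(REPO_ID, split="train", streaming=streaming)
```

A typical use would be `first = next(iter(load_train()))` followed by `first["title"]`, which reads a single record without fetching all shards.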



