aluncstokes/mathpile_arxiv_subset_tiny
Source: Hugging Face · Updated 2024-02-22 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/aluncstokes/mathpile_arxiv_subset_tiny
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: "train_chunked.jsonl"
  - split: test
    path: "test_chunked.jsonl"
---
# MathPile ArXiv (subset)
## Description
This dataset is a toy subset of 8,834 TeX files (5,000 training + 3,834 testing) drawn from the arXiv subset of MathPile, used for testing. You should not use this dataset. The training and testing sets are already split.
## Source
The data was obtained from the training + validation portion of the arXiv subset of MathPile.
## Format
- Provided as JSONL files; each line is a JSON dict containing a single key, "text".
## Usage
- The records contain LaTeX source; no specific usage is documented.
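
Since each record is a JSON dict with a single "text" key, the files can be read with the standard library alone. The following is a minimal sketch, not an official loader: the helper name `iter_texts` is hypothetical, and it assumes `train_chunked.jsonl` has already been downloaded to the working directory.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_texts(path: str) -> Iterator[str]:
    """Yield the "text" field of each JSON record in a JSONL file.

    Hypothetical helper for illustration; not part of the dataset itself.
    """
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:  # tolerate blank lines between records
                continue
            record = json.loads(line)
            yield record["text"]


if __name__ == "__main__":
    # Assumption: the train split file sits in the current directory.
    for i, text in enumerate(iter_texts("train_chunked.jsonl")):
        print(text[:200])  # preview the first 200 characters of each record
        if i >= 2:
            break
```

Reading lazily with a generator avoids holding the whole split in memory, which matters little for a toy subset but keeps the same code usable on larger files.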
## License
The original data is subject to the licensing terms of the arXiv. Users should refer to the arXiv's terms of use for details on permissible usage.
Provider:
aluncstokes



