ArmelR/sharded-pile
Source: Hugging Face. Updated 2023-09-25, indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/ArmelR/sharded-pile
Resource description:
---
configs:
- config_name: all
  data_files:
  - split: train
    path:
    - data/ArXiv/train/*.parquet
    - data/BookCorpus2/train/*.parquet
    - data/Books3/train/*.arrow
    - data/DM Mathematics/train/*.parquet
    - data/Enron Emails/train/*.parquet
    - data/EuroParl/train/*.parquet
    - data/FreeLaw/train/*.parquet
    - data/Github/train/*.parquet
    - data/Gutenberg (PG-19)/train/*.parquet
    - data/HackerNews/train/*.parquet
    - data/NIH ExPorter/train/*.parquet
    - data/OpenSubtitles/train/*.parquet
    - data/OpenWebText2/train/*.parquet
    - data/PhilPapers/train/*.parquet
    - data/Pile-CC/train/*.parquet
    - data/PubMed Abstracts/train/*.parquet
    - data/PubMed Central/train/*.parquet
    - data/StackExchange/train/*.parquet
    - data/UPSTO Backgrounds/train/*.parquet
    - data/Ubuntu IRC/train/*.parquet
    - data/Wikipedia (en)/train/*.parquet
    - data/YoutubeSubtitles/train/*.parquet
  default: true
---
Provider:
ArmelR
Original information summary
Dataset overview
Dataset configuration
- Config name: all
Data file details
- Split: train
- File paths:
  - data/ArXiv/train/*.parquet
  - data/BookCorpus2/train/*.parquet
  - data/Books3/train/*.arrow
  - data/DM Mathematics/train/*.parquet
  - data/Enron Emails/train/*.parquet
  - data/EuroParl/train/*.parquet
  - data/FreeLaw/train/*.parquet
  - data/Github/train/*.parquet
  - data/Gutenberg (PG-19)/train/*.parquet
  - data/HackerNews/train/*.parquet
  - data/NIH ExPorter/train/*.parquet
  - data/OpenSubtitles/train/*.parquet
  - data/OpenWebText2/train/*.parquet
  - data/PhilPapers/train/*.parquet
  - data/Pile-CC/train/*.parquet
  - data/PubMed Abstracts/train/*.parquet
  - data/PubMed Central/train/*.parquet
  - data/StackExchange/train/*.parquet
  - data/UPSTO Backgrounds/train/*.parquet
  - data/Ubuntu IRC/train/*.parquet
  - data/Wikipedia (en)/train/*.parquet
  - data/YoutubeSubtitles/train/*.parquet
Default setting
- Default: true
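
The configuration above maps one `train` split onto a glob per subset directory. As a rough sketch of how those paths might be used, the snippet below rebuilds the glob for a single subset and passes it to `load_dataset` from the Hugging Face `datasets` library. The helper names `subset_pattern` and `load_subset` are hypothetical and not part of the dataset; the sketch also assumes the repository layout matches the listed paths and that streaming is acceptable for your use.

```python
REPO_ID = "ArmelR/sharded-pile"


def subset_pattern(name: str) -> str:
    """Build the train-shard glob for one subset directory listed in the config."""
    # Books3 is the only subset the config lists with .arrow shards.
    ext = "arrow" if name == "Books3" else "parquet"
    return f"data/{name}/train/*.{ext}"


def load_subset(name: str, streaming: bool = True):
    """Load one subset's train split (hypothetical helper; needs network access)."""
    from datasets import load_dataset  # lazy import; requires `pip install datasets`

    return load_dataset(
        REPO_ID,
        data_files={"train": subset_pattern(name)},
        split="train",
        streaming=streaming,
    )


if __name__ == "__main__":
    # Only builds the patterns; no download happens here.
    print(subset_pattern("ArXiv"))
    print(subset_pattern("Books3"))
```

To read every subset at once, `load_dataset(REPO_ID, "all", split="train", streaming=True)` should select the `all` config instead, though that has not been verified against the hosted files here.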



