nexaai-unreleased/restore_pile_mixed_2048k
Source: Hugging Face. Updated: 2024-08-22. Indexed: 2024-12-21.
Download link:
https://hf-mirror.com/datasets/nexaai-unreleased/restore_pile_mixed_2048k
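Resource description:
This dataset is processed from the Pile dataset and contains multiple subsets, each making up its own share of the data. The main subsets are Books3, ArXiv, Wikipedia (en), FreeLaw, OpenWebText2, PubMed Abstracts, StackExchange, PubMed Central, BookCorpus2, Enron Emails, Gutenberg (PG-19), HackerNews, NIH ExPorter, OpenSubtitles, PhilPapers, and YoutubeSubtitles. The dataset has two main features, text and context, and a single train split.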
This dataset is processed from the Pile dataset with the following subsets and ratios: Books3 (24.0%), ArXiv (8.83%), Wikipedia (en) (8.83%), FreeLaw (8.33%), OpenWebText2 (8.33%), PubMed Abstracts (8.33%), StackExchange (8.33%), PubMed Central (8.33%), BookCorpus2 (2.08%), Enron Emails (2.08%), Gutenberg (PG-19) (2.08%), HackerNews (2.08%), NIH ExPorter (2.08%), OpenSubtitles (2.08%), PhilPapers (2.08%), YoutubeSubtitles (2.08%).
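The listed ratios can be treated as mixture weights. The sketch below is illustrative only and is not the dataset authors' processing code: it copies the percentages from the description, normalizes them (as written they sum to about 99.95%, presumably because of rounding), and estimates how many examples each subset would contribute to a sample of a given size. The names `PILE_MIX_PERCENT`, `normalized_weights`, and `expected_counts` are hypothetical, and the description does not say whether the ratios are measured in documents, tokens, or bytes.

```python
# Mixture ratios (percent), copied from the dataset description.
PILE_MIX_PERCENT = {
    "Books3": 24.0,
    "ArXiv": 8.83,
    "Wikipedia (en)": 8.83,
    "FreeLaw": 8.33,
    "OpenWebText2": 8.33,
    "PubMed Abstracts": 8.33,
    "StackExchange": 8.33,
    "PubMed Central": 8.33,
    "BookCorpus2": 2.08,
    "Enron Emails": 2.08,
    "Gutenberg (PG-19)": 2.08,
    "HackerNews": 2.08,
    "NIH ExPorter": 2.08,
    "OpenSubtitles": 2.08,
    "PhilPapers": 2.08,
    "YoutubeSubtitles": 2.08,
}


def normalized_weights(percentages):
    """Rescale the listed percentages so they sum to exactly 1.0."""
    total = sum(percentages.values())
    return {name: pct / total for name, pct in percentages.items()}


def expected_counts(percentages, n_examples):
    """Approximate per-subset counts for a sample of n_examples items.

    Assumption: the ratios apply per example; the source does not say
    which unit they are measured in.
    """
    weights = normalized_weights(percentages)
    return {name: round(w * n_examples) for name, w in weights.items()}


if __name__ == "__main__":
    total = sum(PILE_MIX_PERCENT.values())
    print(f"listed ratios sum to {total:.2f}%")
    for name, count in expected_counts(PILE_MIX_PERCENT, 10_000).items():
        print(f"{name:20s} {count}")
```

Because each count is rounded independently, the per-subset counts may not add up to exactly the requested sample size.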
Provider:
nexaai-unreleased



