timaeus/pile-ubuntu_irc-broken
Source: Hugging Face. Updated: 2025-04-21. Indexed: 2025-07-05.
Download link:
https://hf-mirror.com/datasets/timaeus/pile-ubuntu_irc-broken
Description:
This dataset is a subset filtered from the monology/pile-uncopyrighted dataset. Each record contains text plus metadata, and the metadata includes the name of the source dataset. The training split holds approximately 15,345 examples with a total size of 9.05 GB. The dataset is suitable for language model training, but because of its size, loading it all at once may exhaust memory.
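One way to sidestep the memory concern is to stream records instead of loading the whole split. The sketch below assumes the Hugging Face `datasets` library is installed, that the split is named `train`, and that each record exposes a `text` field; none of these is confirmed by the listing. The helper names `take` and `open_stream` are hypothetical.

```python
from itertools import islice


def take(stream, n):
    """Return the first n records of an iterable without consuming the rest."""
    return list(islice(stream, n))


def open_stream(repo_id="timaeus/pile-ubuntu_irc-broken", split="train"):
    """Open the dataset lazily; records are fetched as they are iterated."""
    # Imported here so take() stays usable without the library installed.
    from datasets import load_dataset

    # streaming=True yields records one at a time instead of
    # materializing the full 9.05 GB split in memory.
    return load_dataset(repo_id, split=split, streaming=True)


if __name__ == "__main__":
    for record in take(open_stream(), 3):
        # "text" is an assumed field name; .get avoids a KeyError if it differs.
        print(str(record.get("text", ""))[:200])
```

A streamed dataset is iterable only, so random access by index is not available; inspect a handful of records with `take` first to confirm the actual field names.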
Provider:
timaeus



