haukur/babylm10
Source: Hugging Face. Updated 2024-12-02. Indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/haukur/babylm10
Description:
The dataset has three features: text (the text content, string), id_in_domain (the ID within the domain, int64), and domain (the domain, string). It is divided into a training set (train), a development set (dev), and a test set (test), containing 1,179,014, 1,168,153, and 1,110,000 samples respectively. The download size is 121,594,971 bytes, and the total dataset size is 248,710,755 bytes.
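As a rough illustration of the schema and splits described above, the following Python sketch checks a record against the three listed columns and loads one split with the Hugging Face `datasets` library. The helper names `matches_schema` and `load_split` are hypothetical, and the repository id and split names (`train`, `dev`, `test`) are assumed to match this listing; they have not been verified against the hosted dataset.

```python
from typing import Any, Dict

# Column names and types as stated in the description (int64 maps to Python int).
EXPECTED_SCHEMA = {"text": str, "id_in_domain": int, "domain": str}

# Split sizes as stated in the description.
SPLIT_SIZES = {"train": 1_179_014, "dev": 1_168_153, "test": 1_110_000}


def matches_schema(record: Dict[str, Any]) -> bool:
    """Return True if the record has exactly the expected columns and types."""
    if set(record) != set(EXPECTED_SCHEMA):
        return False
    return all(isinstance(record[key], typ) for key, typ in EXPECTED_SCHEMA.items())


def load_split(split: str = "train"):
    """Load one split from the Hugging Face Hub (needs network and `datasets`)."""
    # Imported lazily so the schema helpers work without the library installed.
    from datasets import load_dataset

    # Assumption: the repository id and split names match this listing.
    return load_dataset("haukur/babylm10", split=split)
```

With network access and `datasets` installed, `load_split("dev")` should return a dataset whose first row can be passed to `matches_schema`, and whose length can be compared against `SPLIT_SIZES["dev"]`.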
Provider:
haukur



