kanishka/babylm2-rewritten-clean-spacy_hierarchical-adj_211_age-origin_adj2-ablation
Source: Hugging Face. Updated: 2025-10-30. Indexed: 2025-11-15.
Download link:
https://hf-mirror.com/datasets/kanishka/babylm2-rewritten-clean-spacy_hierarchical-adj_211_age-origin_adj2-ablation
Description:
A text dataset split into training and validation sets, suitable for training and validating machine learning models. The training set contains approximately 12 million samples, and the validation set contains approximately 1.2 million samples.
Provider:
kanishka



