BabyLM-community/babylm-nld
Source: Hugging Face | Updated: 2025-06-18 | Indexed: 2025-07-05
Download link:
https://hf-mirror.com/datasets/BabyLM-community/babylm-nld
Description:
The BabyLM dataset is part of the BabyLM multilingual collection and covers Dutch content. It contains 251,224 documents totaling 29,346,372 tokens, grouped into categories such as children's books, children's news, educational material, and subtitles. Each entry includes fields for the text content, content type, data source, writing system (script), target age or age range, data license, and additional metadata.
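As a quick sanity check on the size figures, the average document length works out to roughly 117 tokens. The sketch below computes that average and shows how entries might be tallied by content category. The field names `text`, `category`, and `age_estimate` and the category labels are hypothetical placeholders, not the dataset's confirmed schema, and the sample records are invented for illustration only.

```python
from collections import Counter

# Totals as stated in the dataset description.
NUM_DOCUMENTS = 251_224
NUM_TOKENS = 29_346_372


def mean_tokens_per_document(tokens: int, documents: int) -> float:
    """Return the average document length in tokens."""
    return tokens / documents


# Hypothetical records: the keys "text", "category" and "age_estimate"
# (and the category labels) are illustrative assumptions, not the real schema.
records = [
    {"text": "Er was eens een kleine beer.", "category": "child-books", "age_estimate": "4-6"},
    {"text": "Vandaag in het nieuws: een nieuwe dierentuin.", "category": "child-news", "age_estimate": "8-12"},
    {"text": "Hoe werkt een vulkaan?", "category": "educational", "age_estimate": "10-12"},
    {"text": "Kom, we gaan!", "category": "subtitles", "age_estimate": "n/a"},
]


def count_by_category(rows):
    """Count how many records fall into each content category."""
    return Counter(row["category"] for row in rows)


if __name__ == "__main__":
    print(f"{mean_tokens_per_document(NUM_TOKENS, NUM_DOCUMENTS):.1f}")  # 116.8
    print(count_by_category(records))
```

With the Hugging Face `datasets` library installed, the real data could presumably be loaded with `load_dataset("BabyLM-community/babylm-nld")`; inspect the returned column names before adapting the hypothetical keys above.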
Provider:
BabyLM-community



