climb-mao/Bulgarian-BabyLM
Hugging Face, updated 2025-05-18, indexed 2025-11-01
Download link:
https://hf-mirror.com/datasets/climb-mao/Bulgarian-BabyLM
Description:
The Bulgarian BabyLM dataset is a sentence-level corpus of Bulgarian children's text curated by Mila Marcheva. It contains 28,467,275 tokens, not counting punctuation. Each entry includes the raw sentence text, the tokenized sentence as a list, the URL of the sentence's source, and the sentence's token count.
Provided by:
climb-mao
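
The entry layout in the description can be pictured with a minimal Python sketch. Everything named here is an assumption made for illustration: the field names (text, tokens, url, num_tokens), the punctuation rule based on Unicode categories, the sample sentence, and the example.com URL are not taken from the dataset itself. Its actual column names and tokenizer may differ.

```python
from dataclasses import dataclass
import unicodedata


@dataclass
class Entry:
    """Hypothetical record layout; the field names are assumptions."""
    text: str          # raw sentence text
    tokens: list[str]  # tokenized sentence
    url: str           # URL of the sentence's source
    num_tokens: int    # token count, punctuation excluded


def is_punctuation(token: str) -> bool:
    # True when the token is non-empty and every character belongs to a
    # Unicode punctuation category (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    return bool(token) and all(
        unicodedata.category(ch).startswith("P") for ch in token
    )


def count_tokens(tokens: list[str]) -> int:
    # Count tokens, skipping those made up only of punctuation.
    return sum(1 for tok in tokens if not is_punctuation(tok))


def make_entry(text: str, tokens: list[str], url: str) -> Entry:
    # Build an entry whose count is derived from its token list.
    return Entry(text=text, tokens=tokens, url=url, num_tokens=count_tokens(tokens))


# Made-up sample sentence ("The cat is sleeping."), not drawn from the corpus.
example = make_entry(
    text="Котката спи.",
    tokens=["Котката", "спи", "."],
    url="https://example.com/story",
)
print(example.num_tokens)  # prints 2: the final "." is not counted
```

Deriving num_tokens from the token list keeps the two fields consistent. Whether the published counts were produced this way is not stated in the description.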



