yasalma/tt-crawl
Source: Hugging Face. Updated 2024-05-10; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/yasalma/tt-crawl
Description:
---
annotations_creators:
- no-annotation
language:
- tt
license: apache-2.0
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
task_categories:
- text-generation
- fill-mask
task_ids:
- language-modeling
- masked-language-modeling
pretty_name: TatarCrawl
configs:
- config_name: default
  data_files:
  - split: news_noisy
    path: train/news_noisy_*
  - split: news_clean
    path: train/news_clean_*
tags:
- tt
- crawl
- news
---
### Dataset Summary
In an effort to democratize research on low-resource languages, we release the TatarCrawl dataset, a web news corpus drawn from nearly 15 unique sources in the Tatar language.
To load the dataset, run:
```python
from datasets import load_dataset
# Downloads the data and returns all configured splits
tt_crawl = load_dataset("neurotatarlar/tt-crawl")
```
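The card metadata declares two splits, `news_noisy` and `news_clean`, so it may be convenient to load just one of them. The following is a minimal sketch rather than an official loader: the helper name `load_tt_crawl_split` and the `SPLITS` constant are hypothetical, and the default repository ID is simply the one used in the snippet above (this page lists the dataset as `yasalma/tt-crawl`, so the ID may need adjusting).

```python
# Split names taken from the `configs` section of the card metadata.
SPLITS = ("news_noisy", "news_clean")


def load_tt_crawl_split(split, repo_id="neurotatarlar/tt-crawl"):
    """Load a single split of TatarCrawl (hypothetical helper).

    `repo_id` defaults to the ID from the card's own snippet; this is an
    assumption, so pass a different ID if the dataset lives elsewhere.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the split check works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split)


# Example (requires network access and the `datasets` package):
# clean = load_tt_crawl_split("news_clean")
# print(clean)     # summary of columns and row count
# print(clean[0])  # first record
```

Loading `news_clean` alone avoids downloading the noisier portion when only filtered text is needed.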
Provider:
yasalma



