crklih/turkish-cefr-phrases
Source: Hugging Face. Updated 2026-03-08; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/crklih/turkish-cefr-phrases
Description:
# Turkish CEFR Phrases
A dataset of Turkish phrases extracted from YouTube videos, labeled with CEFR proficiency levels (A1–C2).
## Dataset Details
- **Language:** Turkish
- **Size:** 174,526 phrases
- **Labels:** A1, A2, B1, B2, C1, C2
## Label Distribution
| Level | Count | % |
|-------|---------|-------|
| A2 | 70,626 | 40.5% |
| B1 | 57,758 | 33.1% |
| B2 | 34,728 | 19.9% |
| A1 | 7,016 | 4.0% |
| C1 | 4,304 | 2.5% |
| C2 | 94 | 0.05% |
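The distribution is heavily skewed toward A2 and B1, with very few C2 phrases. As a rough sketch of how the percentages follow from the counts, and of one common way to compensate for the imbalance, the snippet below recomputes the shares and derives inverse-frequency class weights. The weighting formula is an illustrative assumption, not something the dataset card prescribes.

```python
# Counts copied from the Label Distribution table.
COUNTS = {
    "A1": 7_016, "A2": 70_626, "B1": 57_758,
    "B2": 34_728, "C1": 4_304, "C2": 94,
}


def percentages(counts):
    """Share of each level as a percentage of all phrases."""
    total = sum(counts.values())
    return {level: 100 * n / total for level, n in counts.items()}


def inverse_frequency_weights(counts):
    """Illustrative class weights: total / (num_classes * count).

    This is a common "balanced" heuristic, assumed here for illustration;
    rare levels such as C2 receive much larger weights than A2.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {level: total / (num_classes * n) for level, n in counts.items()}


if __name__ == "__main__":
    weights = inverse_frequency_weights(COUNTS)
    for level, pct in sorted(percentages(COUNTS).items()):
        print(f"{level}: {pct:.2f}%  weight={weights[level]:.2f}")
```

The counts sum to 174,526, which matches the stated dataset size.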
## Data Fields
- `phrase_text` — Turkish phrase
- `cefr_level` — CEFR level label (A1 to C2)
- `cefr_confidence` — Model confidence score (avg: 0.895)
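Because the labels come from a classifier, it may be useful to keep only rows above a confidence threshold. The following is a minimal sketch using the three fields listed above. The sample rows, the `0.9` threshold, the helper names, and the `"train"` split name are all hypothetical; `load_rows` assumes the repository loads with the Hugging Face `datasets` library, which has not been verified here.

```python
from collections import Counter


def filter_confident(rows, min_confidence=0.9):
    """Keep rows whose cefr_confidence is at least min_confidence."""
    return [r for r in rows if r["cefr_confidence"] >= min_confidence]


def level_counts(rows):
    """Count rows per CEFR level."""
    return Counter(r["cefr_level"] for r in rows)


def load_rows(split="train"):
    """Load the real data (requires network access and `pip install datasets`).

    Assumption: the split is named "train"; check the dataset page.
    """
    from datasets import load_dataset
    return load_dataset("crklih/turkish-cefr-phrases", split=split)


# Hypothetical rows for illustration only; they are NOT taken from the dataset.
SAMPLE = [
    {"phrase_text": "Merhaba, nasılsın?", "cefr_level": "A1", "cefr_confidence": 0.97},
    {"phrase_text": "Bugün hava çok güzel.", "cefr_level": "A2", "cefr_confidence": 0.91},
    {"phrase_text": "Bunu daha önce konuşmuştuk.", "cefr_level": "B1", "cefr_confidence": 0.62},
]

if __name__ == "__main__":
    kept = filter_confident(SAMPLE, min_confidence=0.9)
    print(len(kept), dict(level_counts(kept)))
```

The same helpers should work on the rows returned by `load_rows()`, since iterating a `datasets` split yields one dictionary per row.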
## Labeling Model
Labels were generated using [crklih/turkish-cefr-classifier](https://huggingface.co/crklih/turkish-cefr-classifier).
## Source
Phrases are short excerpts from publicly available Turkish video content on the internet. This dataset is intended for linguistic research and language learning purposes only.
## License
[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
Provider:
crklih



