upb-nlp/lemi_lexical_lists
Source: Hugging Face. Updated 2026-03-30; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/upb-nlp/lemi_lexical_lists
Resource description:
---
license: mit
task_categories:
- text-classification
language:
- ro
pretty_name: Romanian Grade Word Lists
size_categories:
- 1K<n<10K
---
# Romanian Grade Word Lists
## Dataset Description
This dataset contains word lists grouped by grade level (1–4), derived from model-based difficulty predictions across multiple Romanian texts. Each word is associated with a mean grade score indicating its predicted difficulty level.
## Dataset Structure
Each file corresponds to a grade level and contains the following columns:
- `cuvant`: the Romanian word (token)
- `grad_mediu`: mean predicted grade level for that word
### Files
| File | Grade | Rows |
|---|---|---|
| clasa_1.csv | Grade 1 | 534 |
| clasa_2.csv | Grade 2 | 2,989 |
| clasa_3.csv | Grade 3 | 2,768 |
| clasa_4.csv | Grade 4 | 3,545 |
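
As a rough illustration of how the per-grade files could be combined, the sketch below reads locally downloaded copies into a single lookup table. It assumes that each CSV has a header row with the `cuvant` and `grad_mediu` columns described above and that `grad_mediu` parses as a decimal number. The helper name `load_grade_lists` and the rule for words that appear in more than one file are hypothetical choices, not part of the dataset.

```python
import csv
from pathlib import Path


def load_grade_lists(directory, grades=(1, 2, 3, 4)):
    """Read clasa_<grade>.csv files into {word: (file_grade, mean_grade)}."""
    lexicon = {}
    # Visit files from the lowest grade upward.
    for grade in sorted(grades):
        path = Path(directory) / f"clasa_{grade}.csv"
        with open(path, newline="", encoding="utf-8") as handle:
            # Assumption: the first row is a header naming the two columns.
            for row in csv.DictReader(handle):
                word = row["cuvant"].strip()
                # Assumption: if a word occurs in several files,
                # keep the entry from the lowest grade.
                if word not in lexicon:
                    lexicon[word] = (grade, float(row["grad_mediu"]))
    return lexicon
```

Calling `load_grade_lists("path/to/csv/folder")` would then return one dictionary covering all four grades, where the folder path is whatever location the files were saved to.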
## Data Usage
The dataset can be used for:
- lexical simplification
- educational content adaptation
- readability analysis in Romanian
- NLP tasks involving grade-level classification
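
To make the readability use case concrete, here is a minimal sketch that averages the `grad_mediu` values of the words found in a text. The function name `estimate_text_grade`, the simple regex tokenizer, and the toy scores are all assumptions for illustration. A real pipeline would likely need lemmatization, since the lists may not contain every inflected form.

```python
import re


def estimate_text_grade(text, mean_grades):
    """Return (mean grade of known words, share of tokens found in the list)."""
    # Naive tokenization: lowercase the text and split on non-word characters.
    tokens = re.findall(r"\w+", text.lower())
    known = [mean_grades[token] for token in tokens if token in mean_grades]
    if not known:
        return None, 0.0
    return sum(known) / len(known), len(known) / len(tokens)


# Toy lookup with made-up scores, standing in for the real word lists.
toy_grades = {"casa": 1.0, "este": 1.0, "frumoasă": 3.0}
mean_grade, coverage = estimate_text_grade("Casa este foarte frumoasă.", toy_grades)
print(mean_grade, coverage)
```

Reporting coverage next to the mean helps flag texts where most words are missing from the lists, in which case the estimate is unreliable.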
## Example
```python
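from datasets import load_dataset

# Load the word lists from the Hugging Face Hub by repository ID.
dataset = load_dataset("upb-nlp/lemi_lexical_lists")
print(dataset)  # shows the available splits and their columns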
```
Provider:
upb-nlp



