
upb-nlp/lemi_lexical_lists

Hugging Face | Updated 2026-03-30 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/upb-nlp/lemi_lexical_lists
Description:
---
license: mit
task_categories:
- text-classification
language:
- ro
pretty_name: Romanian Grade Word Lists
size_categories:
- 1K<n<10K
---

# Romanian Grade Word Lists

## Dataset Description

This dataset contains Romanian word lists grouped by school grade level (1–4). The lists are derived from model-based difficulty predictions across multiple Romanian texts. Each word carries a mean grade score indicating its predicted difficulty level.

## Dataset Structure

Each file corresponds to one grade level and contains the following columns:

- `cuvant`: the Romanian word (token)
- `grad_mediu`: mean predicted grade level for that word

### Files

| File | Grade | Rows |
|---|---|---|
| clasa_1.csv | Grade 1 | 534 |
| clasa_2.csv | Grade 2 | 2,989 |
| clasa_3.csv | Grade 3 | 2,768 |
| clasa_4.csv | Grade 4 | 3,545 |

## Data Usage

The dataset can be used for:

- lexical simplification
- educational content adaptation
- readability analysis in Romanian
- NLP tasks involving grade-level classification

## Example

```python
from datasets import load_dataset

# Repository ID as listed on this page (replaces the card's placeholder ID)
dataset = load_dataset("upb-nlp/lemi_lexical_lists")
print(dataset)
```
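
For the readability-analysis use case listed under Data Usage, a word-level lookup could be sketched as follows. This is a minimal sketch, not the dataset authors' method. It assumes only the two documented columns (`cuvant`, `grad_mediu`). The helper names `load_word_grades` and `mean_text_grade` are hypothetical, and the sample rows are invented for illustration rather than taken from the dataset.

```python
import csv
import io
import re

# Hypothetical rows in the card's two-column layout (cuvant, grad_mediu).
# The words and scores below are invented for illustration only.
SAMPLE_CSV = """cuvant,grad_mediu
mama,1.0
casa,1.5
carte,2.0
bibliotecă,3.5
"""


def load_word_grades(csv_file):
    """Build a {word: mean grade} lookup from a CSV with the card's columns."""
    reader = csv.DictReader(csv_file)
    return {row["cuvant"].lower(): float(row["grad_mediu"]) for row in reader}


def mean_text_grade(text, word_grades):
    """Average the grades of the known words; return None if none are covered."""
    tokens = re.findall(r"\w+", text.lower())
    known = [word_grades[t] for t in tokens if t in word_grades]
    return sum(known) / len(known) if known else None


word_grades = load_word_grades(io.StringIO(SAMPLE_CSV))
score = mean_text_grade("Mama are o carte.", word_grades)
print(score)
```

To run this against the real data, pass an open file handle for one of the grade files (for example `clasa_1.csv`) instead of the in-memory sample. Words missing from the lookup are simply skipped here. Treating them as harder than grade 4 would be an equally reasonable choice, depending on the application.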
Provider:
upb-nlp