nlpscu/Beyond-Flesch
Source: Hugging Face. Updated 2026-04-14; indexed 2026-05-10.
Download link: https://hf-mirror.com/datasets/nlpscu/Beyond-Flesch
---
license: apache-2.0
task_categories:
- text-classification
language:
- en
tags:
- education
- readability
- difficulty-classification
- scienceqa
pretty_name: 'Beyond-Flesch: ScienceQA Difficulty Classification'
size_categories:
- 1K<n<10K
configs:
- config_name: texts
  data_files:
  - split: train
    path: train.csv
  - split: test
    path: test.csv
- config_name: static_metrics
  data_files:
  - split: train
    path: train_static_metrics.csv
  - split: test
    path: test_static_metrics.csv
- config_name: prompt_metrics_gemma_7b
  data_files:
  - split: train
    path: train_prompt_metrics_gemma-7b.csv
  - split: test
    path: test_prompt_metrics_gemma-7b.csv
---
# Beyond-Flesch: ScienceQA Difficulty Classification with Static and Prompt-Based Metrics
A preprocessed subset of [ScienceQA](https://huggingface.co/datasets/derek-thomas/ScienceQA) for K-12 educational text difficulty classification, along with the static and LLM-derived prompt-based features we use to reproduce Rooein et al. (2024) *Beyond Flesch-Kincaid*.
This dataset accompanies our class research project (Option 1: reproducing a paper whose original code was not released).
## What's here
| File | Rows | Description |
|------|------|-------------|
| `train.csv` | 3,638 | Training split, balanced across the 3 grade buckets |
| `test.csv` | 910 | Test split, balanced across the 3 grade buckets |
| `train_static_metrics.csv` | 3,638 | 46 static readability features per Appendix C of Rooein et al. (2024) |
| `test_static_metrics.csv` | 910 | Same, on test split |
| `train_prompt_metrics_gemma-7b.csv` | 3,638 | 63 prompt-based features computed with Gemma-7B-IT |
| `test_prompt_metrics_gemma-7b.csv` | 910 | Same, on test split |

Mistral-7B prompt metrics are at ~50% completion and will be added in the next release. Llama-2-7B and Llama-2-13B are queued for the next submission.
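
The three configs declared in the card metadata can be loaded by name. The following is a minimal sketch, assuming the `datasets` library is installed and the Hub is reachable; `CONFIG_FILES`, `REPO_ID`, and `load_config` are illustrative names, not part of the repository.

```python
# Config name -> split -> CSV file, as declared in the card metadata.
CONFIG_FILES = {
    "texts": {"train": "train.csv", "test": "test.csv"},
    "static_metrics": {
        "train": "train_static_metrics.csv",
        "test": "test_static_metrics.csv",
    },
    "prompt_metrics_gemma_7b": {
        "train": "train_prompt_metrics_gemma-7b.csv",
        "test": "test_prompt_metrics_gemma-7b.csv",
    },
}

REPO_ID = "nlpscu/Beyond-Flesch"


def load_config(config_name, split="train"):
    """Load one split of one config from the Hub (needs network access)."""
    if config_name not in CONFIG_FILES:
        raise KeyError(f"unknown config: {config_name!r}")
    if split not in CONFIG_FILES[config_name]:
        raise KeyError(f"unknown split: {split!r}")
    # Imported lazily so the mapping above is usable without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)
```

For example, `load_config("texts", "train")` would be expected to return the 3,638 training rows listed in the table, and `load_config("static_metrics", "test")` the 910 test rows of static features.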
## How it was built
Following Section 4.1 of Rooein et al. (2024):
1. Loaded the full ScienceQA dataset (21,208 items) from `derek-thomas/ScienceQA`
2. Filtered out items with images
3. Collapsed the 12 K-12 grade levels into 3 buckets: elementary (1–5), middle (6–8), high (9–12)
4. Deduplicated on the combined `full_text` field
5. Sampled 1,516 items per bucket with random seed 42 → 4,548 balanced items
6. 80/20 stratified train/test split → 3,638 / 910
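
Steps 3 to 6 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's actual code (which is in the linked repository): items are assumed to be dicts with an integer `grade` and a `full_text` string, the function names are hypothetical, and the rounding rule for the test size (round up, then share evenly across buckets) is one choice that reproduces the 3,638 / 910 totals. Because the random number generator differs, seed 42 here will not select the same rows as the released files.

```python
import math
import random
from collections import defaultdict


def grade_to_bucket(grade):
    """Map a numeric grade (1-12) to the three buckets used in this card."""
    if 1 <= grade <= 5:
        return "elementary"
    if 6 <= grade <= 8:
        return "middle"
    if 9 <= grade <= 12:
        return "high"
    raise ValueError(f"grade out of range: {grade}")


def build_splits(items, per_bucket=1516, test_frac=0.2, seed=42):
    """Deduplicate, balance, and split items into (train, test) lists."""
    rng = random.Random(seed)

    # Step 4: keep only the first occurrence of each full_text.
    seen = set()
    by_bucket = defaultdict(list)
    for item in items:
        if item["full_text"] in seen:
            continue
        seen.add(item["full_text"])
        # Step 3: collapse the grade into one of three buckets.
        by_bucket[grade_to_bucket(item["grade"])].append(item)

    # Step 5: draw the same number of items from every bucket.
    # rng.sample raises ValueError if a bucket has fewer than per_bucket items.
    sampled = {
        bucket: rng.sample(rows, per_bucket)
        for bucket, rows in sorted(by_bucket.items())
    }

    # Step 6: stratified split. The total test size is rounded up,
    # then shared across the equally sized buckets as evenly as possible.
    total = sum(len(rows) for rows in sampled.values())
    n_test = math.ceil(test_frac * total)
    base, extra = divmod(n_test, len(sampled))
    train, test = [], []
    for i, rows in enumerate(sampled.values()):
        rng.shuffle(rows)
        k = base + (1 if i < extra else 0)
        test.extend(rows[:k])
        train.extend(rows[k:])
    return train, test
```

With 1,516 items in each of the three buckets, the test split holds 304 + 303 + 303 = 910 items and the training split the remaining 3,638, so the splits are balanced to within one item per bucket.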

Static metrics were computed with `textstat`, `nltk`, `spacy`, and WordNet (see Appendix C of Rooein et al. (2024) for the full list). Prompt-based metrics were computed by querying Gemma-7B-IT (8-bit quantized via `bitsandbytes`) with each of the 63 prompts from Appendix A.
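
To make "static metric" concrete, here is a self-contained sketch of the Flesch-Kincaid Grade Level, the readability formula named in the paper's title. It is an illustration only: the released feature files were computed with `textstat` and the other libraries listed above, and the vowel-group syllable counter below is a rough stand-in, so its values will not match those files exactly.

```python
import re


def count_syllables(word):
    """Rough syllable count: number of vowel groups, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    # Sentences are split on runs of terminal punctuation (a simplification).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )
```

Longer sentences and longer words both raise the score, which is why a metric like this serves as a coarse proxy for grade level.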
## Columns in `train.csv` / `test.csv`
`question`, `choices`, `solution`, `lecture`, `full_text`, `text_question`, `text_solution`, `text_lecture`, `education_level`, `grade`, `subject`, `topic`, `category`
The classification target is `education_level` (`elementary` / `middle` / `high`).
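
A quick sanity check on the target column is to count the rows per label. The sketch below uses only the standard library and a made-up in-memory sample in place of the real file; `LABELS` and `label_counts` are illustrative names.

```python
import csv
import io
from collections import Counter

LABELS = ("elementary", "middle", "high")


def label_counts(csv_file):
    """Count rows per education_level in an open CSV file object."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        label = row["education_level"]
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    return counts


# Made-up rows standing in for train.csv (only two of the columns).
sample = io.StringIO(
    "question,education_level\n"
    "Which object is a solid?,elementary\n"
    "What is the mass of the beaker?,middle\n"
    "Which force acts on the cart?,high\n"
    "Which animal is a mammal?,elementary\n"
)
print(label_counts(sample))
```

On the real files, the same function can be called inside `with open("train.csv", newline="", encoding="utf-8") as f:`. Since 3,638 is not divisible by 3, the three training counts cannot be exactly equal, so "balanced" there means equal to within a row or so.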
## Code
Full reproduction code: https://github.com/SCU-CSEN346/Beyond-Flesch
## Citation
If you use this dataset, please cite both the original ScienceQA paper and the Rooein paper we are reproducing:
```bibtex
@inproceedings{lu2022learn,
title={Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering},
author={Lu, Pan and Mishra, Swaroop and Xia, Tony and Qiu, Liang and Chang, Kai-Wei and Zhu, Song-Chun and Tafjord, Oyvind and Clark, Peter and Kalyan, Ashwin},
booktitle={NeurIPS},
year={2022}
}
@inproceedings{rooein2024beyond,
title={Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts},
author={Rooein, Donya and R{\"o}ttger, Paul and Shaitarova, Anastassia and Hovy, Dirk},
booktitle={Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA)},
year={2024}
}
```
Provided by: nlpscu



