
nlpscu/Beyond-Flesch

Source: Hugging Face. Updated 2026-04-14. Indexed 2026-05-10.
Download link:
https://hf-mirror.com/datasets/nlpscu/Beyond-Flesch
Description:
---
license: apache-2.0
task_categories:
- text-classification
language:
- en
tags:
- education
- readability
- difficulty-classification
- scienceqa
pretty_name: 'Beyond-Flesch: ScienceQA Difficulty Classification'
size_categories:
- 1K<n<10K
configs:
- config_name: texts
  data_files:
  - split: train
    path: train.csv
  - split: test
    path: test.csv
- config_name: static_metrics
  data_files:
  - split: train
    path: train_static_metrics.csv
  - split: test
    path: test_static_metrics.csv
- config_name: prompt_metrics_gemma_7b
  data_files:
  - split: train
    path: train_prompt_metrics_gemma-7b.csv
  - split: test
    path: test_prompt_metrics_gemma-7b.csv
---

# Beyond-Flesch: ScienceQA Difficulty Classification with Static and Prompt-Based Metrics

A preprocessed subset of [ScienceQA](https://huggingface.co/datasets/derek-thomas/ScienceQA) for K-12 educational text difficulty classification, along with the static and LLM-derived prompt-based features we use to reproduce Rooein et al. (2024) *Beyond Flesch-Kincaid*.

This dataset accompanies our class research project (Option 1: reproducing a paper whose original code was not released).

## What's here

| File | Rows | Description |
|------|------|-------------|
| `train.csv` | 3,638 | Training split, balanced across the 3 grade buckets |
| `test.csv` | 910 | Test split, balanced across the 3 grade buckets |
| `train_static_metrics.csv` | 3,638 | 46 static readability features per Appendix C of Rooein et al. (2024) |
| `test_static_metrics.csv` | 910 | Same, on test split |
| `train_prompt_metrics_gemma-7b.csv` | 3,638 | 63 prompt-based features computed with Gemma-7B-IT |
| `test_prompt_metrics_gemma-7b.csv` | 910 | Same, on test split |

Mistral-7B prompt metrics are at ~50% completion and will be added in the next release. Llama-2-7B and Llama-2-13B are queued for the next submission.

## How it was built

Following Section 4.1 of Rooein et al. (2024):

1. Loaded the full ScienceQA dataset (21,208 items) from `derek-thomas/ScienceQA`
2. Filtered out items with images
3. Collapsed the 12 K-12 grade levels into 3 buckets: elementary (1–5), middle (6–8), high (9–12)
4. Deduplicated on the combined `full_text` field
5. Sampled 1,516 items per bucket with random seed 42 → 4,548 balanced items
6. 80/20 stratified train/test split → 3,638 / 910

Static metrics computed with `textstat`, `nltk`, `spacy`, and WordNet (see Appendix C of the Rooein paper for the full list). Prompt-based metrics computed by querying Gemma-7B-IT (8-bit quantized via `bitsandbytes`) with each of the 63 prompts from Appendix A.

## Columns in `train.csv` / `test.csv`

`question`, `choices`, `solution`, `lecture`, `full_text`, `text_question`, `text_solution`, `text_lecture`, `education_level`, `grade`, `subject`, `topic`, `category`

The classification target is `education_level` (`elementary` / `middle` / `high`).

## Code

Full reproduction code: https://github.com/SCU-CSEN346/Beyond-Flesch

## Citation

If you use this dataset, please cite both the original ScienceQA paper and the Rooein paper we are reproducing:

```bibtex
@inproceedings{lu2022learn,
  title={Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering},
  author={Lu, Pan and Mishra, Swaroop and Xia, Tony and Qiu, Liang and Chang, Kai-Wei and Zhu, Song-Chun and Tafjord, Oyvind and Clark, Peter and Kalyan, Ashwin},
  booktitle={NeurIPS},
  year={2022}
}

@inproceedings{rooein2024beyond,
  title={Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts},
  author={Rooein, Donya and R{\"o}ttger, Paul and Shaitarova, Anastassia and Hovy, Dirk},
  booktitle={Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA)},
  year={2024}
}
```
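
For readers who want a feel for the build steps listed under "How it was built", the following is a minimal, standard-library sketch of steps 2 through 6 run on synthetic rows. It is not the project's actual code (that lives in the linked repository). The function names `grade_to_bucket` and `build_splits`, the integer `grade` values, and the `image` key are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def grade_to_bucket(grade: int) -> str:
    """Collapse a K-12 grade (1-12) into one of three buckets (step 3)."""
    if 1 <= grade <= 5:
        return "elementary"
    if 6 <= grade <= 8:
        return "middle"
    if 9 <= grade <= 12:
        return "high"
    raise ValueError(f"grade out of range: {grade}")


def build_splits(items, per_bucket, seed=42, test_fraction=0.2):
    """Hypothetical helper: filter, bucket, dedupe, sample, and split.

    `items` is an iterable of dicts with 'full_text', an integer 'grade',
    and an 'image' entry that is None for text-only items (assumed layout).
    """
    rng = random.Random(seed)
    seen = set()
    by_bucket = defaultdict(list)
    for item in items:
        if item.get("image") is not None:  # step 2: drop items with images
            continue
        bucket = grade_to_bucket(item["grade"])  # step 3: bucket the grade
        if item["full_text"] in seen:  # step 4: dedupe on full_text
            continue
        seen.add(item["full_text"])
        by_bucket[bucket].append(dict(item, education_level=bucket))

    train, test = [], []
    for bucket in ("elementary", "middle", "high"):
        # step 5: equal-sized seeded sample from each bucket
        sample = rng.sample(by_bucket[bucket], per_bucket)
        # step 6: hold out the same fraction of every bucket
        n_test = round(per_bucket * test_fraction)
        test.extend(sample[:n_test])
        train.extend(sample[n_test:])
    return train, test


# Synthetic rows: 10 text-only items per grade, plus one duplicate
# and one item that carries an image.
demo = [
    {"full_text": f"question {i}", "grade": (i % 12) + 1, "image": None}
    for i in range(120)
]
demo.append(dict(demo[0]))
demo.append({"full_text": "has image", "grade": 3, "image": "img.png"})

train, test = build_splits(demo, per_bucket=20)
print(len(train), len(test))
```

Because this sketch rounds the held-out count inside each bucket, applying it to 1,516 items per bucket would give 909 test items, not the 910 reported above. The reported counts are consistent with a split routine that rounds once over all 4,548 items.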
Provider:
nlpscu