
quantaRoche/nasa-smd-qa-benchmark-cleaned

Hugging Face | Updated 2026-03-27 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/quantaRoche/nasa-smd-qa-benchmark-cleaned
Resource description:
---
license: cc
task_categories:
- question-answering
language:
- en
tags:
- extractive-qa
- squad_v2
- cleaned
- span-aligned
- earth-science
pretty_name: NASA-QA (Cleaned SQuAD v2 Format)
---

# NASA-QA (Cleaned, SQuAD v2 Format)

This dataset is a cleaned and reformatted derivative of:
https://huggingface.co/datasets/nasa-impact/nasa-smd-qa-benchmark

The original dataset is an extractive question answering benchmark in the Earth science domain. This version modifies the data to ensure compatibility with SQuAD v2-style training and evaluation.

## Changes from Original

Compared to the original release, this version:

- converts the nested structure into one example per QA pair
- reformats the dataset into SQuAD v2-style schema
- removes `plausible_answers` from unanswerable examples
- adds explicit `answer_end` indices
- normalizes answer text (e.g., casing, punctuation, whitespace)
- realigns answer spans to match exact substrings in the context
- fixes incorrect or inconsistent answer offsets

All answerable examples are aligned such that:

> the answer text appears exactly in the context at the specified indices

## Dataset Structure

Each example contains:

- `id`: unique identifier
- `question`: question text
- `context`: paragraph context
- `answers`:
  - `text`: list of answer strings
  - `answer_start`: start indices
  - `answer_end`: end indices
- `is_impossible`: whether the question is unanswerable

Splits:

- `train`
- `validation`

## Notes

- This dataset is intended for extractive QA tasks using SQuAD v2-style evaluation.
- Unanswerable questions are preserved using empty answer lists.
- `plausible_answers` from the original dataset were removed, as they are not required for standard SQuAD v2 training or evaluation.

## Attribution

This dataset is derived from the original NASA-QA benchmark:

- NASA SMD & IBM Research
- Paper: https://arxiv.org/abs/2405.10725

Please cite the original work when using this dataset.

## License

This dataset inherits the license from the original dataset.
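The span-alignment guarantee described under "Changes from Original" can be spot-checked in a few lines. The sketch below is illustrative and not part of the dataset's own tooling. It assumes that `answer_end` sits next to `answer_start` inside `answers`, that `is_impossible` is a top-level field, and that `answer_end` is an exclusive index (Python slice style). None of these are confirmed by the README, so the `end_exclusive` flag lets you test the other convention. The function names `misaligned_spans` and `is_consistent` are hypothetical.

```python
from typing import Any, Dict, List


def misaligned_spans(example: Dict[str, Any], end_exclusive: bool = True) -> List[int]:
    """Return the positions of answers whose offsets do not reproduce the answer text.

    Assumption: `answers` holds parallel lists `text`, `answer_start`, `answer_end`.
    Assumption: `answer_end` is exclusive unless `end_exclusive` is False.
    """
    context = example["context"]
    answers = example["answers"]
    bad = []
    for i, (text, start, end) in enumerate(
        zip(answers["text"], answers["answer_start"], answers["answer_end"])
    ):
        stop = end if end_exclusive else end + 1
        if context[start:stop] != text:
            bad.append(i)
    return bad


def is_consistent(example: Dict[str, Any], end_exclusive: bool = True) -> bool:
    """Check one example against the invariants stated in the README."""
    texts = example["answers"]["text"]
    if example["is_impossible"]:
        # Unanswerable questions are stored with empty answer lists.
        return len(texts) == 0
    # Answerable questions need at least one answer, all exactly aligned.
    return len(texts) > 0 and not misaligned_spans(example, end_exclusive)


if __name__ == "__main__":
    # Synthetic record for illustration only; it is not taken from the dataset.
    context = "Aerosols scatter incoming sunlight and cool the surface."
    answer = "scatter incoming sunlight"
    start = context.index(answer)
    record = {
        "id": "demo-0",
        "question": "What do aerosols do to incoming sunlight?",
        "context": context,
        "answers": {
            "text": [answer],
            "answer_start": [start],
            "answer_end": [start + len(answer)],
        },
        "is_impossible": False,
    }
    print(is_consistent(record))
```

If the check fails on real records with `end_exclusive=True` but passes with `end_exclusive=False`, the dataset most likely stores inclusive end offsets.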
Provider:
quantaRoche