eve-esa/mcqa-multiple-answers

Name: eve-esa/mcqa-multiple-answers
Creator: eve-esa
Published: 2026-04-16 07:58:18
License: cc-by-4.0

Source: Hugging Face. Updated 2026-04-16. Indexed 2026-04-05.

Download link:

https://hf-mirror.com/datasets/eve-esa/mcqa-multiple-answers

Resource summary:

---
dataset_info:
  features:
  - name: Question
    dtype: string
  - name: Answers
    list: string
  - name: Choices
    struct:
    - name: label
      list: string
    - name: text
      list: string
  splits:
  - name: train
    num_bytes: 113892
    num_examples: 431
  download_size: 58809
  dataset_size: 113892
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: cc-by-4.0
task_categories:
- multiple-choice
language:
- en
tags:
- EVE
- EO
- Earth
pretty_name: EVE-mcqa-multiple-answers
size_categories:
- n<1K
---

# Dataset Summary

EVE-mcqa-multiple-answers is a Multiple-Choice Question Answering (MCQA) dataset designed to evaluate the performance of language models in the domain of Earth Observation (EO). The dataset consists of questions about EO concepts, technologies, and applications, each accompanied by multiple answer choices, of which one or more are correct.

# Dataset Structure

Each example in the dataset contains an arbitrary number of possible choices and one or more correct answers. The structure of each example is as follows:

- **Question**: A string containing the question related to Earth Observation.
- **Answers**: The list of labels of the correct answer(s).
- **Choices**: A dictionary of two parallel lists, one entry per answer choice:
  - **text**: The text of each answer choice.
  - **label**: The label of each answer choice (e.g., "A", "B", "C", etc.).

# Metrics

The metrics used to evaluate model performance on the EVE-mcqa-multiple-answers dataset are:

- **Exact Match (EM)**: Scores 1 when the predicted answer set is identical to the reference answer set and 0 otherwise. Averaged over the dataset, it gives the fraction of questions answered perfectly.
- **Intersection over Union (IoU)**: Measures the partial overlap between the predicted and reference answer sets, as the size of their intersection divided by the size of their union (between 0 and 1).

## Computing the Metrics

Here's how to compute both metrics in Python:

```python
from datasets import load_dataset


def exact_match(predicted_answers, reference_answers):
    """
    Compute Exact Match score.

    Args:
        predicted_answers: Set or list of predicted answer labels
        reference_answers: Set or list of reference answer labels

    Returns:
        1.0 if sets match exactly, 0.0 otherwise
    """
    return 1.0 if set(predicted_answers) == set(reference_answers) else 0.0


def intersection_over_union(predicted_answers, reference_answers):
    """
    Compute Intersection over Union (IoU) score.

    Args:
        predicted_answers: Set or list of predicted answer labels
        reference_answers: Set or list of reference answer labels

    Returns:
        IoU score between 0.0 and 1.0
    """
    pred_set = set(predicted_answers)
    ref_set = set(reference_answers)

    # Two empty sets agree perfectly; returning early also avoids dividing by zero.
    if not pred_set and not ref_set:
        return 1.0

    return len(pred_set & ref_set) / len(pred_set | ref_set)


def your_model_predict(question, choices):
    """Placeholder: replace with your model's actual prediction logic."""
    return ["A", "C"]


# Example usage with the dataset
# Load the dataset
dataset = load_dataset("eve-esa/mcqa-multiple-answers", split="train")

# Example: Get predictions for a single example
example = dataset[0]
reference_answers = example["Answers"]  # Ground truth answer(s)

# Simulate model predictions (replace with your model's actual predictions)
predicted_answers = ["A", "C"]  # Your model's predicted answer labels

# Compute metrics
em_score = exact_match(predicted_answers, reference_answers)
iou_score = intersection_over_union(predicted_answers, reference_answers)

print(f"Question: {example['Question']}")
print(f"Reference answers: {reference_answers}")
print(f"Predicted answers: {predicted_answers}")
print(f"Exact Match: {em_score}")
print(f"IoU: {iou_score:.4f}")

# Compute average metrics across the entire dataset
total_em = 0.0
total_iou = 0.0

for example in dataset:
    predicted = your_model_predict(example["Question"], example["Choices"])
    reference = example["Answers"]

    total_em += exact_match(predicted, reference)
    total_iou += intersection_over_union(predicted, reference)

avg_em = total_em / len(dataset)
avg_iou = total_iou / len(dataset)

print(f"\nAverage Exact Match: {avg_em:.4f}")
print(f"Average IoU: {avg_iou:.4f}")
```

# Citation

If you use this project in academic or research settings, please cite:

```
@misc{atrio2026evedomainspecificllmframework,
  title={{EVE}: A Domain-Specific {LLM} Framework for Earth Intelligence},
  author={Àlex R. Atrio and Antonio Lopez and Jino Rohit and Yassine El Ouahidi and Marcello Politi and Vijayasri Iyer and Umar Jamil and Sébastien Bratières and Nicolas Longépé},
  year={2026},
  eprint={2604.13071},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2604.13071},
}
```
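
The dataset card describes `Choices` as holding `label` and `text` entries, and the metrics expect a collection of predicted labels. As a minimal sketch of how the two might be connected, the helpers below turn one record into a prompt and pull labels back out of a free-text reply. The function names `format_prompt` and `parse_labels`, the prompt wording, and the sample record are hypothetical illustrations, not part of the dataset or of any official evaluation code. The parsing is deliberately naive and assumes the reply mentions labels as standalone tokens.

```python
import re


def format_prompt(question, choices):
    """Render a question and its Choices struct as a labeled prompt (hypothetical helper)."""
    lines = [question]
    # Choices is assumed to hold two parallel lists: "label" and "text".
    for label, text in zip(choices["label"], choices["text"]):
        lines.append(f"{label}. {text}")
    lines.append("Answer with the labels of all correct choices, separated by commas.")
    return "\n".join(lines)


def parse_labels(response, valid_labels):
    """Extract predicted labels from a free-text reply (naive, hypothetical helper)."""
    valid = set(valid_labels)
    # Keep only whole tokens that exactly match one of the record's labels.
    tokens = re.findall(r"\w+", response)
    return sorted({token for token in tokens if token in valid})


# Made-up record that mirrors the documented fields; it is not taken from the dataset.
record = {
    "Question": "Which of the following options are correct?",
    "Answers": ["A", "C"],
    "Choices": {
        "label": ["A", "B", "C"],
        "text": ["first option", "second option", "third option"],
    },
}

prompt = format_prompt(record["Question"], record["Choices"])
predicted = parse_labels("Answer: A, C", record["Choices"]["label"])
print(prompt)
print(predicted)
```

The resulting list of labels can be passed as the predicted answers to the Exact Match and IoU functions. A real evaluation would likely need stricter parsing, since a reply such as "A good answer is B" would be read as both "A" and "B" by this sketch.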

Provider:

eve-esa
