DUNE
Source: arXiv. Updated: 2023-11-28. Indexed: 2024-06-21.
Download link:
https://github.com/feyzaakyurek/dune
Description:
DUNE is a curated dataset for benchmarking unified editing tasks. Created by a research team at Boston University, it covers four domains: scientific reasoning, arithmetic reasoning, introducing new information, and debiasing. Each edit is expressed as free-form text intended to prompt a required change in model outputs. Beyond evaluating model editing techniques, DUNE is meant to cover a diverse range of output changes, such as correcting reasoning errors, correcting arithmetic errors, introducing new information, and reducing bias. The dataset was built with a combination of automated processes and human validation to ensure data quality, making it suitable for evaluating and improving the editing capabilities of language models.
Provided by:
Boston University
Created:
2023-11-28



