KQA-Pro+QA2HALL dataset
Source: Zenodo. Updated: 2025-04-28. Indexed: 2026-05-26.
Download link:
https://zenodo.org/doi/10.5281/zenodo.15278335
Description:
This dataset is designed for the task of detecting hallucinations in text generated by large language models (LLMs). It was created using the QA2HALL framework proposed in [1], which generates sentences containing synthetic hallucinations by combining questions from a Knowledge Graph Question Answering (KGQA) dataset with incorrect answers generated by LLMs. The KGQA dataset used as the basis is KQA Pro (https://huggingface.co/datasets/drt/kqa_pro).
For more details about the dataset and the framework, please refer to the paper and the framework's GitHub repository.
File list
kqapro_qa2hall.json
Hallucination detection dataset generated by applying QA2HALL to KQA Pro.
kqapro_filterd.json
Filtered version of KQA Pro (train and validation sets) used for the experiments in [1].
This filtered version was used to construct "kqapro_qa2hall.json".
README.md
Describes the format of the JSON files.
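As a starting point for working with the files listed above, the following is a minimal Python sketch that loads one of the JSON files and reports its top-level structure. It deliberately assumes nothing about the schema, which is documented in README.md. The helper name `summarize_json` is hypothetical, and the script assumes the file has already been downloaded into the working directory.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and describe its top-level structure.

    No schema is assumed here; README.md documents the actual format.
    Returns the parsed data together with a small summary dict.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)

    summary = {"type": type(data).__name__, "size": None, "sample_keys": None}
    if isinstance(data, (list, dict)):
        summary["size"] = len(data)
    if isinstance(data, dict):
        # Top-level object: show up to five of its keys.
        summary["sample_keys"] = list(data)[:5]
    elif isinstance(data, list) and data and isinstance(data[0], dict):
        # Top-level array of objects: show up to five keys of the first record.
        summary["sample_keys"] = list(data[0])[:5]
    return data, summary


if __name__ == "__main__":
    # Assumption: kqapro_qa2hall.json sits in the current working directory.
    target = Path("kqapro_qa2hall.json")
    if target.exists():
        _, info = summarize_json(target)
        print(info)
    else:
        print(f"{target} not found; download it from the Zenodo record first.")
```

Inspecting the structure this way before writing any field-specific code avoids guessing at key names; the same helper can be pointed at kqapro_filterd.json.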
[1] Nakamura, K., Hasegawa, R., Otomura, K., Ichise, R., Hato, J.: QA2HALL: A Framework for Generating Non-trivial Hallucination Detection Datasets from KGQA Datasets. In: Proceedings of the 30th International Conference on Natural Language & Information Systems (2025) (in preparation)
Provider:
Zenodo
Created:
2025-04-28



