ibm/Wikipedia_contradict_benchmark
Hugging Face | Updated 2024-07-11 | Indexed 2024-07-13
Download link:
https://hf-mirror.com/datasets/ibm/Wikipedia_contradict_benchmark
Resource description:
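Wikipedia contradict benchmark is a dataset of 253 high-quality, human-annotated instances, designed to evaluate how large language models (LLMs) perform when given retrieved passages that contain real-world knowledge conflicts. Each instance contains a question, a pair of contradictory passages extracted from Wikipedia, and two distinct answers. The dataset was created to address the limitations of LLMs in handling knowledge conflicts, particularly when the conflicting passages come from the same source and are equally credible. It was curated and shared by Yufang Hou and colleagues at IBM Research, is in English, and is released under the MIT license.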
Wikipedia contradict benchmark is a dataset consisting of 253 high-quality, human-annotated instances designed to assess LLM performance when augmented with retrieved passages containing real-world knowledge conflicts. Each instance consists of a question, a pair of contradictory passages extracted from Wikipedia, and two distinct answers, each derived from one of the passages. The dataset is curated by researchers from IBM Research and is intended for evaluating LLMs' ability to handle knowledge conflicts. It is provided in CSV format and includes detailed annotations on the type and source of each contradiction, as well as the validity of the tags used in the Wikipedia articles. The README also mentions the dataset's use in a published paper and provides instructions for loading and using the dataset in research.
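The instance structure described above (a question, two contradictory passages, and two distinct answers, stored as CSV) can be illustrated with a minimal parsing sketch. The column names `question`, `passage1`, `passage2`, `answer1`, and `answer2`, the `ConflictInstance` class, and the sample row are all assumptions made for illustration; they are not taken from the dataset, so check the real CSV header before adapting this.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ConflictInstance:
    """One benchmark instance: a question, two passages, two answers."""
    question: str
    passage1: str
    passage2: str
    answer1: str
    answer2: str


# Hypothetical column names; the real CSV header may differ.
REQUIRED_COLUMNS = ("question", "passage1", "passage2", "answer1", "answer2")


def read_instances(csv_text: str) -> list[ConflictInstance]:
    """Parse CSV text into instances and check the basic structure."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    instances = []
    for row in reader:
        inst = ConflictInstance(**{c: row[c].strip() for c in REQUIRED_COLUMNS})
        # Each instance is expected to carry two distinct answers.
        if inst.answer1 == inst.answer2:
            raise ValueError(f"answers are not distinct for: {inst.question!r}")
        instances.append(inst)
    return instances


# Made-up sample row for illustration only; it is not from the dataset.
SAMPLE_CSV = (
    "question,passage1,passage2,answer1,answer2\n"
    '"When was the bridge opened?","The bridge opened in 1901.",'
    '"The bridge was opened in 1903.","1901","1903"\n'
)

instances = read_instances(SAMPLE_CSV)
print(len(instances), instances[0].answer1, instances[0].answer2)
```

With the actual file, the same function could be applied to the file's contents once the column names are adjusted to match its header.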
Provider:
ibm



