kewu93/ClashEval

Name: kewu93/ClashEval
Creator: kewu93
Published: 2024-06-10 21:46:48
License: no description available

Source: Hugging Face. Updated 2024-06-10; indexed 2024-06-29.

Download link:

https://hf-mirror.com/datasets/kewu93/ClashEval


Resource description:

---
language:
- en
license: apache-2.0
size_categories:
- 1K<n<10K
task_categories:
- question-answering
pretty_name: Clash Eval v1.0
tags:
- medical
- webdataset
dataset_info:
  features:
  - name: question
    dtype: string
  - name: context_original
    dtype: string
  - name: context_mod
    dtype: string
  - name: answer_original
    dtype: string
  - name: answer_mod
    dtype: string
  - name: mod_degree
    dtype: string
  - name: dataset
    dtype: string
  splits:
  - name: test
    num_bytes: 147170428
    num_examples: 10179
  download_size: 16247486
  dataset_size: 147170428
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
---

<h2>ClashEval: Quantifying the tug-of-war between an LLM’s internal prior and external evidence</h2>

Please visit the [GitHub repo](https://github.com/kevinwu23/StanfordClashEval) for all the information about the project.

<img src="https://huggingface.co/datasets/kewu93/ClashEval/resolve/main/figure1.png" width="700" height="auto">

<h3>🤗Hugging Face🤗</h3>

<li>ClashEval Dataset</li>

```python
from datasets import load_dataset

# Download ClashEval from the Hugging Face Hub; the default config has a single "test" split.
dataset = load_dataset('kewu93/ClashEval', trust_remote_code=True)
```

## Dataset Description

- **Paper:** [arXiv](https://arxiv.org/pdf/2404.10198)
- **Homepage:** http://github.io/kevinwu23/StanfordClashEval
- **Point of Contact:** kevinywu@stanford.edu

### Dataset Summary

ClashEval is a framework for understanding the tradeoffs that LLMs make when deciding between their prior responses and the contextual information provided. This data card describes the ClashEval dataset, which consists of QA pairs accompanied by relevant contextual information. Each question is perturbed to varying degrees.
Additionally, the dataset contains questions from six domains:

- Drug dosages
- Olympic records
- Recent news
- Names
- Locations
- Dates

### Supported Tasks

Question answering, context-driven generation

### Languages

English

## Dataset Structure

### Data Fields

- `question`: A question that tests knowledge in one of the six domains listed above.
- `context_original`: The original, unmodified contextual information that can be used to answer the question.
- `context_mod`: A modified version of the context in which the original answer is replaced with the modified answer.
- `answer_original`: The original, unmodified answer to the question.
- `answer_mod`: The modified answer to the question.
- `mod_degree`: The degree to which the original answer has been modified. For the drugs, news, records, and years datasets, this is a continuous value corresponding to the numerical change. For names and locations, the values 1, 2, and 3 refer to increasing levels of perturbation, according to the prompts given in our paper.
- `dataset`: The domain (one of the six) from which the question and context are drawn.

### Licensing Information

CC BY 4.0

### Citation Information

```
@article{wu2024faithful,
  title={How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior},
  author={Wu, Kevin and Wu, Eric and Zou, James},
  journal={arXiv preprint arXiv:2404.10198},
  year={2024}
}
```
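As a rough illustration of how the `answer_original` and `answer_mod` fields can be used, the sketch below labels a model's response according to which of the two answers it contains, then reports how often the modified (context-supplied) answer wins. The matching rule (case-insensitive substring match) and the names `classify_response` and `modified_answer_rate` are hypothetical; this is not the scoring procedure from the paper, and `answer_original` is the reference answer, not necessarily the model's own prior.

```python
from collections import Counter
from typing import Mapping, Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.lower().split())


def classify_response(response: str, record: Mapping[str, str]) -> str:
    """Hypothetical rule: label a response 'modified', 'original', or 'other'.

    'modified' if only answer_mod appears in the response, 'original' if only
    answer_original appears, and 'other' if neither or both appear.
    """
    resp = normalize(response)
    has_mod = normalize(record["answer_mod"]) in resp
    has_orig = normalize(record["answer_original"]) in resp
    if has_mod and not has_orig:
        return "modified"
    if has_orig and not has_mod:
        return "original"
    return "other"  # ambiguous: neither answer, or both answers, present


def modified_answer_rate(responses: Sequence[str], records: Sequence[Mapping[str, str]]) -> float:
    """Fraction of unambiguous responses that adopt the modified answer."""
    if len(responses) != len(records):
        raise ValueError("responses and records must have the same length")
    labels = Counter(classify_response(r, rec) for r, rec in zip(responses, records))
    decided = labels["modified"] + labels["original"]
    return labels["modified"] / decided if decided else 0.0


# Toy record for illustration only; it is not taken from the dataset.
toy = {"answer_original": "Paris", "answer_mod": "Lyon"}
print(classify_response("The answer is Lyon.", toy))
print(modified_answer_rate(["Lyon", "Paris", "Lyon"], [toy] * 3))
```

Substring matching is crude for numeric answers: with `answer_original = "5"` and `answer_mod = "50"`, a response of `50` contains both strings and is labeled `other`, so a real evaluation would likely need domain-specific answer parsing.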

Provided by:

kewu93

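Summary of Original Information

ClashEval v1.0 Dataset Overview

Dataset Introduction

ClashEval is a framework for understanding the tradeoffs that large language models (LLMs) make when deciding between their prior responses and the contextual information provided. The dataset consists of QA pairs with relevant contextual information, and each question is perturbed to varying degrees.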

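Dataset Composition

Data Fields

question: A question testing knowledge in one of the six domains.
context_original: The original, unmodified contextual information used to answer the question.
context_mod: A modified version of the context in which the original answer is replaced with the modified answer.
answer_original: The original, unmodified answer to the question.
answer_mod: The modified answer to the question.
mod_degree: The degree to which the original answer has been modified. For the drugs, news, records, and years datasets, this is a continuous value corresponding to the numerical change. For names and locations, the values 1, 2, and 3 denote increasing levels of perturbation, following the prompts given in the paper.
dataset: The domain (one of six) from which the question and context are drawn.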
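The card's metadata declares `mod_degree` with `dtype: string`, so numeric analysis needs a conversion step that depends on the domain. A minimal sketch, assuming the `dataset` column uses the lowercase labels `names` and `locations` for the two categorical domains and that the other domains store plain numeric strings; `parse_mod_degree` and `CATEGORICAL_DOMAINS` are hypothetical names, and the real column values should be checked first.

```python
from typing import Union

# Assumption: these are the `dataset` labels of the two domains whose
# perturbation levels are categorical (1, 2, 3). Verify against the data.
CATEGORICAL_DOMAINS = {"names", "locations"}


def parse_mod_degree(domain: str, raw: str) -> Union[int, float]:
    """Convert the string-typed mod_degree into a number according to its domain.

    - names / locations: an integer perturbation level in {1, 2, 3}
    - other domains (drugs, news, records, years): a continuous numeric change
    """
    value = float(raw)  # assumes a plain numeric string such as "2" or "-0.5"
    if domain.strip().lower() in CATEGORICAL_DOMAINS:
        if not value.is_integer() or int(value) not in (1, 2, 3):
            raise ValueError(f"unexpected perturbation level {raw!r} for domain {domain!r}")
        return int(value)
    return value


print(parse_mod_degree("names", "2"))
print(parse_mod_degree("drugs", "-0.5"))
```

Rejecting non-integer or out-of-range levels for the categorical domains makes a wrong assumption about the column values fail loudly instead of silently mixing level codes with continuous changes.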

Data Splits

test: 10179 examples, 147170428 bytes in total.

Supported Tasks

Question answering
Context-driven generation
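For context-driven generation, one natural setup is to ask the same question twice: once with `context_original` and once with `context_mod`. The sketch below builds that prompt pair from a single record. The wording in `TEMPLATE` and the function `build_prompts` are hypothetical and are not the prompts used in the paper.

```python
from typing import Mapping, Tuple

# Hypothetical prompt template; not the wording used by the ClashEval authors.
TEMPLATE = (
    "Answer the question using the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompts(record: Mapping[str, str]) -> Tuple[str, str]:
    """Return (original_prompt, modified_prompt) for one record.

    Both prompts share the same question; only the context differs.
    """
    original = TEMPLATE.format(context=record["context_original"], question=record["question"])
    modified = TEMPLATE.format(context=record["context_mod"], question=record["question"])
    return original, modified


# Toy record for illustration only; it is not taken from the dataset.
toy = {"question": "What is X?", "context_original": "X is 5.", "context_mod": "X is 50."}
original_prompt, modified_prompt = build_prompts(toy)
print(modified_prompt)
```

Because the question is held fixed, any difference between a model's answers to the two prompts can be attributed to the perturbed context.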

Languages

English

License

CC BY 4.0

Citation Information

@article{wu2024faithful,
  title={How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior},
  author={Wu, Kevin and Wu, Eric and Zou, James},
  journal={arXiv preprint arXiv:2404.10198},
  year={2024}
}
