kewu93/ClashEval
Source: Hugging Face · Updated 2024-06-10 · Indexed 2024-06-29
Download link:
https://hf-mirror.com/datasets/kewu93/ClashEval
Resource description:
---
language:
- en
license: apache-2.0
size_categories:
- 1K<n<10K
task_categories:
- question-answering
pretty_name: Clash Eval v1.0
tags:
- medical
- webdataset
dataset_info:
  features:
  - name: question
    dtype: string
  - name: context_original
    dtype: string
  - name: context_mod
    dtype: string
  - name: answer_original
    dtype: string
  - name: answer_mod
    dtype: string
  - name: mod_degree
    dtype: string
  - name: dataset
    dtype: string
  splits:
  - name: test
    num_bytes: 147170428
    num_examples: 10179
  download_size: 16247486
  dataset_size: 147170428
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
---
<h2>ClashEval: Quantifying the tug-of-war between an LLM’s internal prior and external evidence</h2>
Please visit the [GitHub repo](https://github.com/kevinwu23/StanfordClashEval) for all the information about the project.
<img src="https://huggingface.co/datasets/kewu93/ClashEval/resolve/main/figure1.png" width="700" height="auto">
<h3>🤗Hugging Face🤗</h3>
<li>ClashEval Dataset</li>
```python
from datasets import load_dataset

# Download ClashEval from the Hugging Face Hub; the card defines a single `test` split.
dataset = load_dataset('kewu93/ClashEval', trust_remote_code=True)
```
## Dataset Description
- **Paper:** [arXiv](https://arxiv.org/pdf/2404.10198)
- **Homepage:** http://github.io/kevinwu23/StanfordClashEval
- **Point of Contact:** kevinywu@stanford.edu
### Dataset Summary
ClashEval is a framework for studying how LLMs arbitrate between their own prior answers and the contextual information they are given.
This data card describes the ClashEval dataset, which consists of question-answer pairs, each accompanied by relevant contextual information.
For each question, the answer stated in the context is perturbed to varying degrees.
The questions span six domains:
- Drug dosages
- Olympic records
- Recent news
- Names
- Locations
- Dates
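To see how the examples are distributed across these domains, one can tally the `dataset` field. The snippet below is a minimal sketch: `count_by_domain` is a hypothetical helper, and the rows are made up for illustration (the label strings `drugs` and `names` follow the field description in this card, but the exact values stored in the column are an assumption).

```python
from collections import Counter


def count_by_domain(rows):
    """Count rows per value of the `dataset` field (the domain label)."""
    return Counter(row["dataset"] for row in rows)


# Made-up rows for illustration only; real rows come from the `test` split.
example_rows = [
    {"dataset": "drugs"},
    {"dataset": "names"},
    {"dataset": "drugs"},
]

domain_counts = count_by_domain(example_rows)
print(domain_counts)
```

With the real data loaded as in the snippet above, `count_by_domain(dataset["test"])` should work the same way, since iterating over a split yields one dictionary per example.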
### Supported Tasks
Question-Answering, context-driven generation
### Languages
English
## Dataset Structure
### Data Fields
- `question`: A question that tests knowledge in one of the six domains.
- `context_original`: The original, unmodified context that can be used to answer the question.
- `context_mod`: The modified version of the context, in which the original answer is replaced by the modified answer.
- `answer_original`: The original, unmodified answer to the question.
- `answer_mod`: The modified answer to the question.
- `mod_degree`: The degree to which the original answer has been modified. For the drugs, news, records, and years datasets, this is a continuous value corresponding to the numerical change. For names and locations, the values 1, 2, and 3 denote increasing levels of perturbation, following the prompts given in our paper.
- `dataset`: The domain (one of the six) from which the question and context are drawn.
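The fields above can be combined to check whether a model's response sides with the modified context or with the original answer. The following is a rough sketch under stated assumptions, not the scoring procedure used in the paper: `parse_mod_degree` and `classify_response` are hypothetical helpers, the matching is a naive case-insensitive substring test, and the example record is invented.

```python
def parse_mod_degree(record):
    """Convert the string-typed `mod_degree` to a float when possible.

    Falls back to the raw string if the value is not numeric, since the
    exact formatting of this column is an assumption here.
    """
    raw = record["mod_degree"]
    try:
        return float(raw)
    except ValueError:
        return raw


def classify_response(response, record):
    """Label a response by which answer it contains (naive substring match)."""
    text = response.strip().lower()
    has_mod = record["answer_mod"].strip().lower() in text
    has_orig = record["answer_original"].strip().lower() in text
    if has_mod and not has_orig:
        return "context"    # response follows the modified context
    if has_orig and not has_mod:
        return "prior"      # response keeps the original answer
    if has_mod and has_orig:
        return "ambiguous"  # both answers appear in the response
    return "neither"


# Invented record for illustration only; it is not taken from the dataset.
example = {
    "answer_original": "Paris",
    "answer_mod": "Lyon",
    "mod_degree": "2",
    "dataset": "locations",
}

print(classify_response("The answer is Lyon.", example))
print(parse_mod_degree(example))
```

Substring matching is deliberately simple and can misfire, for example when one numeric answer is a prefix of the other, so a stricter comparison may be needed for the numeric domains.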
### Licensing Information
CC BY 4.0
### Citation Information
```
@article{wu2024faithful,
title={How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior},
author={Wu, Kevin and Wu, Eric and Zou, James},
journal={arXiv preprint arXiv:2404.10198},
year={2024}
}
```
Provided by:
kewu93
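Summary of original information
ClashEval v1.0 dataset overview
Dataset introduction
ClashEval is a framework for understanding how large language models (LLMs) trade off their prior responses against the contextual information provided. The dataset contains QA pairs with their relevant contextual information, and each question is perturbed to varying degrees.
Dataset composition
Data fields
- question: A question testing knowledge in one of the six domains.
- context_original: The original, unmodified context used to answer the question.
- context_mod: The modified version of the context, in which the original answer is replaced by the modified answer.
- answer_original: The original, unmodified answer to the question.
- answer_mod: The modified answer to the question.
- mod_degree: The degree to which the original answer has been modified. For the drugs, news, records, and years datasets, this is a continuous value corresponding to the numerical change. For names and locations, the values 1, 2, and 3 denote increasing levels of perturbation, following the prompts given in the paper.
- dataset: The domain (one of the six) from which the question and context are drawn.
Data splits
test: 10179 examples, 147170428 bytes in total.
Supported tasks
- Question answering
- Context-driven generation
Language
English
License
CC BY 4.0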



