princeton-nlp/QuRating-GPT3.5-Judgments-Test
Source: Hugging Face. Updated 2024-03-29; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/princeton-nlp/QuRating-GPT3.5-Judgments-Test
Resource description:
---
pretty_name: QuRating-GPT3.5-Judgments-Test
---
*7140 pairwise judgments across 4 criteria and 6 domains obtained by prompting GPT-3.5-turbo-0613 for evaluating QuRater models.*
From the paper: [QuRating: Selecting High-Quality Data for Training Language Models](https://arxiv.org/abs/2402.09739)
**_Guidance on Responsible Use_**
In the paper, we document various types of bias that are present in the quality ratings/QuRater model (biases related to domains, topics, social roles, regions and languages - see Section 6 of the paper),
which are likely reflected in the LLM judgments.
Hence, be aware that data selection with QuRating could have unintended and harmful effects on the language model that is being trained.
We strongly recommend a comprehensive evaluation of the language model for these and other types of bias, particularly before real-world deployment.
We hope that releasing the data/models can facilitate future research aimed at uncovering and mitigating such biases.
#### Dataset columns
* `texts`: A list of two text snippets
* For each criterion (`writing_style`, `facts_and_trivia`, `educational_value`, `required_expertise`) there are three fields:
  * `{criteria}_votes_b`: Vote matrix where the value at indices *(a,b)* denotes the number of votes for the text at index *b*
  * `{criteria}_votes_a`: Vote matrix where the value at indices *(a,b)* denotes the number of votes for the text at index *a*
  * `{criteria}_average`: Averaged votes matrix where the value at indices *(a,b)* corresponds to *p(`text_b` > `text_a`)*. We normalize the matrix such that the sum with its transpose is equal to 1.0. Values of -100 appear along the diagonal and wherever we didn't receive enough votes due to Azure content filters.
  * For practical purposes:
```
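from datasets import load_dataset
# Each split is one source domain; "ArXiv" is used here as an example
dataset = load_dataset("princeton-nlp/QuRating-GPT3.5-Judgments-Test", split="ArXiv")
criteria = "educational_value" # for example
index = 0 # any row index in the split
text_a, text_b = dataset[index]["texts"]
probability_b_over_a = dataset[index][f"{criteria}_average"][0][1]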
```
* `source_domains`: A list of the original RedPajama sets of the text snippets
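
Because `{criteria}_average` uses -100 as a sentinel, code that consumes the judgments needs to skip those entries before comparing against a model. The following is a minimal sketch of one way to do this, not the authors' evaluation code. The helper names `preference` and `pairwise_accuracy`, the `score_fn` callable (standing in for any quality scorer), and the rows in `example_rows` are all hypothetical and exist only for illustration.

```python
MISSING = -100  # sentinel described in the column documentation above


def preference(row, criterion):
    """Return p(text_b > text_a) for one row, or None if no valid judgment exists."""
    p = row[f"{criterion}_average"][0][1]
    return None if p == MISSING else p


def pairwise_accuracy(rows, criterion, score_fn):
    """Fraction of decided pairs on which score_fn agrees with the judgments.

    score_fn is a hypothetical callable mapping a text to a numeric quality score.
    Rows with the -100 sentinel or an exact tie (p == 0.5) are skipped.
    """
    correct = total = 0
    for row in rows:
        p = preference(row, criterion)
        if p is None or p == 0.5:
            continue
        text_a, text_b = row["texts"]
        predicted_b = score_fn(text_b) > score_fn(text_a)
        correct += int(predicted_b == (p > 0.5))
        total += 1
    return correct / total if total else float("nan")


# Synthetic rows that mimic the documented layout (not real dataset content)
example_rows = [
    {"texts": ["short", "a much longer text"],
     "educational_value_average": [[-100, 0.9], [0.1, -100]]},
    {"texts": ["long text here", "x"],
     "educational_value_average": [[-100, 0.8], [0.2, -100]]},
    {"texts": ["a", "b"],
     "educational_value_average": [[-100, -100], [-100, -100]]},
]

# Text length is used as a toy scorer purely to exercise the function
accuracy = pairwise_accuracy(example_rows, "educational_value", len)
```

The same functions should accept rows from a loaded split, since each row exposes the same keys as the synthetic dictionaries.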
<!--
---
dataset_info:
  features:
  - name: texts
    sequence: string
  - name: educational_value_votes_a
    sequence:
      sequence: int64
  - name: educational_value_votes_b
    sequence:
      sequence: int64
  - name: educational_value_average
    sequence:
      sequence: float64
  - name: facts_and_trivia_votes_a
    sequence:
      sequence: int64
  - name: facts_and_trivia_votes_b
    sequence:
      sequence: int64
  - name: facts_and_trivia_average
    sequence:
      sequence: float64
  - name: required_expertise_votes_a
    sequence:
      sequence: int64
  - name: required_expertise_votes_b
    sequence:
      sequence: int64
  - name: required_expertise_average
    sequence:
      sequence: float64
  - name: writing_style_votes_a
    sequence:
      sequence: int64
  - name: writing_style_votes_b
    sequence:
      sequence: int64
  - name: writing_style_average
    sequence:
      sequence: float64
  - name: source_domains
    sequence: string
  splits:
  - name: ArXiv
    num_bytes: 4703676
    num_examples: 1428
  - name: Book
    num_bytes: 5499142
    num_examples: 1428
  - name: C4
    num_bytes: 5990895
    num_examples: 1428
  - name: Github
    num_bytes: 4865439
    num_examples: 1428
  - name: Wikipedia_en
    num_bytes: 5407511
    num_examples: 1428
  - name: StackExchange
    num_bytes: 4999697
    num_examples: 1428
  download_size: 16357595
  dataset_size: 31466360
configs:
- config_name: default
  data_files:
  - split: ArXiv
    path: data/ArXiv-*
  - split: Book
    path: data/Book-*
  - split: C4
    path: data/C4-*
  - split: Github
    path: data/Github-*
  - split: Wikipedia_en
    path: data/Wikipedia_en-*
  - split: StackExchange
    path: data/StackExchange-*
---
-->
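Provider:
princeton-nlp
Summary of original information
#### Dataset overview
- Name: QuRating-GPT3.5-Judgments-Test
- Description: 7140 pairwise judgments across 4 criteria and 6 domains, obtained by prompting GPT-3.5-turbo-0613, for evaluating QuRater models.
#### Dataset column information
- Text column: `texts`, a list of two text snippets.
- Criteria: `writing_style`, `facts_and_trivia`, `educational_value`, `required_expertise`. Each criterion has the following fields:
  - `{criteria}_votes_b`: Vote matrix; the value at indices (a,b) is the number of votes for the text at index b.
  - `{criteria}_votes_a`: Vote matrix; the value at indices (a,b) is the number of votes for the text at index a.
  - `{criteria}_average`: Averaged votes matrix; the value at indices (a,b) corresponds to p(text_b > text_a). The matrix is normalized so that its sum with its transpose equals 1.0. Entries on the diagonal, and entries that did not receive enough votes because of Azure content filters, are set to -100.
- Source domains: `source_domains`, a list of the original RedPajama sets of the text snippets.
#### Dataset structure
- Features:
  - `texts`: sequence of strings
  - `educational_value_votes_a`, `facts_and_trivia_votes_a`, `required_expertise_votes_a`, `writing_style_votes_a`: sequence of sequences of integers
  - `educational_value_votes_b`, `facts_and_trivia_votes_b`, `required_expertise_votes_b`, `writing_style_votes_b`: sequence of sequences of integers
  - `educational_value_average`, `facts_and_trivia_average`, `required_expertise_average`, `writing_style_average`: sequence of sequences of floats
  - `source_domains`: sequence of strings
- Splits: `ArXiv`, `Book`, `C4`, `Github`, `Wikipedia_en`, `StackExchange`. Each split records its size in bytes and its number of examples.
#### Dataset size
- Download size: 16357595 bytes
- Dataset size: 31466360 bytes
#### Configuration
- Default configuration: data file paths are named after the splits.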
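
Since each domain is stored as its own split, loading the full test set means requesting the six split names one by one. The sketch below assumes the Hugging Face `datasets` library and the repository id shown in the page title. The helper `load_all_splits` and its `loader` parameter are hypothetical conveniences, added so the function can be exercised without network access.

```python
REPO_ID = "princeton-nlp/QuRating-GPT3.5-Judgments-Test"
SPLITS = ["ArXiv", "Book", "C4", "Github", "Wikipedia_en", "StackExchange"]


def load_all_splits(loader=None):
    """Return a dict mapping each split name to its loaded data.

    loader is a hypothetical injection point; by default it is
    datasets.load_dataset, which downloads the data from the Hub.
    """
    if loader is None:
        from datasets import load_dataset  # imported lazily; needs `pip install datasets`
        loader = load_dataset
    return {name: loader(REPO_ID, split=name) for name in SPLITS}
```

Calling `load_all_splits()` with no arguments would fetch all six splits; passing a stub as `loader` keeps the function usable in offline tests.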



