princeton-nlp/QuRating-GPT3.5-Judgments-Test

Name: princeton-nlp/QuRating-GPT3.5-Judgments-Test
Creator: princeton-nlp
Published: 2024-03-29 07:07:32
License: No description available

Hugging Face · Updated 2024-03-29 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/princeton-nlp/QuRating-GPT3.5-Judgments-Test

Resource summary:

*7140 pairwise judgments across 4 criteria and 6 domains, obtained by prompting GPT-3.5-turbo-0613, for evaluating QuRater models.*

From the paper: [QuRating: Selecting High-Quality Data for Training Language Models](https://arxiv.org/abs/2402.09739)

**_Guidance on Responsible Use_**

In the paper, we document various types of bias present in the quality ratings and the QuRater model (biases related to domains, topics, social roles, regions and languages; see Section 6 of the paper), which are likely reflected in the LLM judgments. Hence, be aware that data selection with QuRating could have unintended and harmful effects on the language model being trained. We strongly recommend a comprehensive evaluation of the language model for these and other types of bias, particularly before real-world deployment. We hope that releasing the data and models can facilitate future research aimed at uncovering and mitigating such biases.

#### Dataset columns

* `texts`: A list of two text snippets.
* For each criterion (`writing_style`, `facts_and_trivia`, `educational_value`, `required_expertise`), there are three fields:
  * `{criteria}_votes_b`: Vote matrix where the value at indices *(a,b)* denotes the number of votes for the text at index *b*.
  * `{criteria}_votes_a`: Vote matrix where the value at indices *(a,b)* denotes the number of votes for the text at index *a*.
  * `{criteria}_average`: Averaged vote matrix where the value at indices *(a,b)* corresponds to *p(`text_b` > `text_a`)*. The matrix is normalized so that its sum with its transpose equals 1.0. Values of -100 appear along the diagonal and wherever not enough votes were received due to Azure content filters.
* For practical purposes, the preference for `text_b` over `text_a` in one example can be read as follows:

  ```python
  from datasets import load_dataset
  dataset = load_dataset("princeton-nlp/QuRating-GPT3.5-Judgments-Test", split="C4")  # or ArXiv, Book, Github, Wikipedia_en, StackExchange
  criteria = "educational_value"  # for example
  text_a, text_b = dataset[0]["texts"]
  probability_b_over_a = dataset[0][f"{criteria}_average"][0][1]  # p(text_b > text_a)
  ```

* `source_domains`: A list of the original RedPajama subsets that the text snippets come from.
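Because entries of `{criteria}_average` can be the -100 sentinel, it helps to guard reads against it. Below is a minimal sketch, assuming a row is a plain dict shaped like the columns above; `preference_probability` and the example row are hypothetical, with invented values. The helper returns *p(`text_b` > `text_a`)* for one criterion, returns `None` when the entry is -100, and checks the stated normalization that an entry and its transpose sum to 1.0.

```python
import math
from typing import Optional

CRITERIA = ["writing_style", "facts_and_trivia", "educational_value", "required_expertise"]
MISSING = -100  # sentinel on the diagonal and for pairs blocked by content filters


def preference_probability(row: dict, criteria: str) -> Optional[float]:
    """Return p(text_b > text_a) for `criteria`, or None if no valid judgment exists."""
    matrix = row[f"{criteria}_average"]
    p_b_over_a = matrix[0][1]
    if p_b_over_a == MISSING:
        return None
    # The card states that the matrix plus its transpose equals 1.0 off the diagonal.
    if not math.isclose(p_b_over_a + matrix[1][0], 1.0, abs_tol=1e-6):
        raise ValueError(f"{criteria}_average is not normalized: {matrix}")
    return p_b_over_a


# Hypothetical row with invented values, for illustration only.
row = {
    "texts": ["snippet A", "snippet B"],
    "writing_style_average": [[-100.0, -100.0], [-100.0, -100.0]],  # e.g. filtered pair
    "facts_and_trivia_average": [[-100.0, 0.4], [0.6, -100.0]],
    "educational_value_average": [[-100.0, 0.75], [0.25, -100.0]],
    "required_expertise_average": [[-100.0, 0.5], [0.5, -100.0]],
}

probs = {c: preference_probability(row, c) for c in CRITERIA}
print(probs)
```

Returning `None` rather than raising for the sentinel keeps filtered pairs easy to drop in a later aggregation step, while a real normalization violation still fails loudly.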

Provider:

princeton-nlp

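Original information summary

Dataset overview

Name: QuRating-GPT3.5-Judgments-Test
Description: Contains 7140 pairwise judgments across 4 criteria and 6 domains, obtained by prompting the GPT-3.5-turbo-0613 model, for evaluating QuRater models.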

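Dataset columns

Text column: texts, a list of two text snippets.
Evaluation criteria:
- writing_style, facts_and_trivia, educational_value, required_expertise
- Each criterion has the following fields:
  - {criteria}_votes_b: vote matrix; the value at indices (a,b) is the number of votes for the text at index b.
  - {criteria}_votes_a: vote matrix; the value at indices (a,b) is the number of votes for the text at index a.
  - {criteria}_average: averaged vote matrix; the value at indices (a,b) corresponds to p(text_b > text_a). The matrix is normalized so that its sum with its transpose equals 1.0. Entries on the diagonal, and entries where not enough votes were received due to Azure content filters, are set to -100.
Source domains: source_domains, a list of the original RedPajama subsets that the text snippets come from.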
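To make the two vote matrices concrete, here is a sketch with a hypothetical helper `tally_votes` and made-up counts. For the pair at indices (a, b), `votes_a` holds the votes for the text at index a and `votes_b` the votes for the text at index b. The card does not say exactly how `{criteria}_average` is derived from these counts, so the sketch only reports the raw tally and does not try to reproduce the average.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def tally_votes(votes_a: Matrix, votes_b: Matrix, a: int = 0, b: int = 1) -> Tuple[int, int, str]:
    """Return (votes for text a, votes for text b, winner) for the pair at (a, b).

    The winner is "a" or "b" (the text at that index) or "tie".
    """
    for_a = votes_a[a][b]
    for_b = votes_b[a][b]
    if for_a > for_b:
        winner = "a"
    elif for_b > for_a:
        winner = "b"
    else:
        winner = "tie"
    return for_a, for_b, winner


# Invented counts for illustration only; the transposed entries mirror each other.
votes_a = [[0, 7], [13, 0]]
votes_b = [[0, 13], [7, 0]]
print(tally_votes(votes_a, votes_b))  # (7, 13, 'b')
```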

Dataset structure

Features:
- texts: sequence of strings
- educational_value_votes_a, facts_and_trivia_votes_a, required_expertise_votes_a, writing_style_votes_a: sequence of sequences of integers
- educational_value_votes_b, facts_and_trivia_votes_b, required_expertise_votes_b, writing_style_votes_b: sequence of sequences of integers
- educational_value_average, facts_and_trivia_average, required_expertise_average, writing_style_average: sequence of sequences of floats
- source_domains: sequence of strings
Splits:
- ArXiv, Book, C4, Github, Wikipedia_en, StackExchange
- Each split lists its size in bytes and its number of examples.

Dataset size

Download size: 16357595 bytes
Dataset size: 31466360 bytes
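The raw byte counts above are easier to compare in mebibytes (2^20 bytes). This plain-arithmetic sketch converts them and shows that the download is roughly half the size of the prepared dataset; `to_mib` is a hypothetical helper name.

```python
DOWNLOAD_BYTES = 16_357_595
DATASET_BYTES = 31_466_360


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 2**20 bytes)."""
    return n_bytes / 2**20


print(f"download: {to_mib(DOWNLOAD_BYTES):.2f} MiB")    # download: 15.60 MiB
print(f"dataset:  {to_mib(DATASET_BYTES):.2f} MiB")     # dataset:  30.01 MiB
print(f"ratio:    {DATASET_BYTES / DOWNLOAD_BYTES:.2f}")  # ratio:    1.92
```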

Configuration

Default configuration:
- Data file paths are named after the splits.
