
princeton-nlp/QuRating-GPT3.5-Judgments-Test

Source: Hugging Face. Updated 2024-03-29. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/princeton-nlp/QuRating-GPT3.5-Judgments-Test
Resource description:
---
pretty_name: QuRating-GPT3.5-Judgments-Test
---

*7140 pairwise judgments across 4 criteria and 6 domains obtained by prompting GPT-3.5-turbo-0613 for evaluating QuRater models.*

From the paper: [QuRating: Selecting High-Quality Data for Training Language Models](https://arxiv.org/abs/2402.09739)

**_Guidance on Responsible Use_**

In the paper, we document various types of bias that are present in the quality ratings/QuRater model (biases related to domains, topics, social roles, regions and languages - see Section 6 of the paper), which are likely reflected in the LLM judgments. Hence, be aware that data selection with QuRating could have unintended and harmful effects on the language model that is being trained. We strongly recommend a comprehensive evaluation of the language model for these and other types of bias, particularly before real-world deployment. We hope that releasing the data/models can facilitate future research aimed at uncovering and mitigating such biases.

#### Dataset columns

* `texts`: A list of two text snippets
* For each criterion (`writing_style`, `facts_and_trivia`, `educational_value`, `required_expertise`) we have three fields:
    * `{criteria}_votes_b`: Vote matrix where the value at indices *(a,b)* denotes the number of votes for the text at index *b*
    * `{criteria}_votes_a`: Vote matrix where the value at indices *(a,b)* denotes the number of votes for the text at index *a*
    * `{criteria}_average`: Averaged votes matrix where the value at indices *(a,b)* corresponds to *p(`text_b` > `text_a`)*. We normalize the matrix such that the sum with its transpose is equal to 1.0. Values of -100 appear along the diagonal and wherever we didn't receive enough votes due to Azure content filters.
    * For practical purposes:
      ```
      criteria = "educational_value"  # for example
      text_a, text_b = dataset[index]["texts"]
      probability_b_over_a = dataset[index][f"{criteria}_average"][0][1]
      ```
* `source_domains`: A list of the original RedPajama sets of the text snippets

<!--
---
dataset_info:
  features:
  - name: texts
    sequence: string
  - name: educational_value_votes_a
    sequence:
      sequence: int64
  - name: educational_value_votes_b
    sequence:
      sequence: int64
  - name: educational_value_average
    sequence:
      sequence: float64
  - name: facts_and_trivia_votes_a
    sequence:
      sequence: int64
  - name: facts_and_trivia_votes_b
    sequence:
      sequence: int64
  - name: facts_and_trivia_average
    sequence:
      sequence: float64
  - name: required_expertise_votes_a
    sequence:
      sequence: int64
  - name: required_expertise_votes_b
    sequence:
      sequence: int64
  - name: required_expertise_average
    sequence:
      sequence: float64
  - name: writing_style_votes_a
    sequence:
      sequence: int64
  - name: writing_style_votes_b
    sequence:
      sequence: int64
  - name: writing_style_average
    sequence:
      sequence: float64
  - name: source_domains
    sequence: string
  splits:
  - name: ArXiv
    num_bytes: 4703676
    num_examples: 1428
  - name: Book
    num_bytes: 5499142
    num_examples: 1428
  - name: C4
    num_bytes: 5990895
    num_examples: 1428
  - name: Github
    num_bytes: 4865439
    num_examples: 1428
  - name: Wikipedia_en
    num_bytes: 5407511
    num_examples: 1428
  - name: StackExchange
    num_bytes: 4999697
    num_examples: 1428
  download_size: 16357595
  dataset_size: 31466360
configs:
- config_name: default
  data_files:
  - split: ArXiv
    path: data/ArXiv-*
  - split: Book
    path: data/Book-*
  - split: C4
    path: data/C4-*
  - split: Github
    path: data/Github-*
  - split: Wikipedia_en
    path: data/Wikipedia_en-*
  - split: StackExchange
    path: data/StackExchange-*
---
-->
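The -100 sentinel needs handling before an entry of `{criteria}_average` is treated as a probability. The following is a minimal sketch, assuming each record is a plain dict shaped like the columns described above. The names `MISSING`, `CRITERIA`, and `preference_probability` are hypothetical helpers introduced here for illustration, and the `example` record is synthetic, not taken from the dataset.

```python
from typing import Optional

MISSING = -100  # sentinel value described in the dataset card

CRITERIA = (
    "writing_style",
    "facts_and_trivia",
    "educational_value",
    "required_expertise",
)


def preference_probability(record: dict, criterion: str) -> Optional[float]:
    """Return p(text_b > text_a) for one record, or None if it is marked missing."""
    if criterion not in CRITERIA:
        raise ValueError(f"unknown criterion: {criterion!r}")
    value = record[f"{criterion}_average"][0][1]
    return None if value == MISSING else float(value)


# Synthetic record for illustration only (not real data from the dataset).
example = {
    "texts": ["first snippet", "second snippet"],
    "educational_value_average": [[-100, 0.75], [0.25, -100]],
    "writing_style_average": [[-100, -100], [-100, -100]],
}

print(preference_probability(example, "educational_value"))  # 0.75
print(preference_probability(example, "writing_style"))      # None
```

Returning `None` rather than the raw sentinel keeps a missing judgment from being averaged into downstream statistics by accident.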
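Provider:
princeton-nlp
Summary of original information

Dataset overview

  • Name: QuRating-GPT3.5-Judgments-Test
  • Description: 7140 pairwise judgments across 4 criteria and 6 domains, obtained by prompting GPT-3.5-turbo-0613, for evaluating QuRater models.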

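Dataset column information

  • Text column: texts, a list of two text snippets.
  • Evaluation criteria:
    • writing_style, facts_and_trivia, educational_value, required_expertise
    • Each criterion has the following fields:
      • {criteria}_votes_b: vote matrix; the value at indices (a,b) is the number of votes for the text at index b.
      • {criteria}_votes_a: vote matrix; the value at indices (a,b) is the number of votes for the text at index a.
      • {criteria}_average: averaged vote matrix; the value at indices (a,b) corresponds to p(text_b > text_a). The matrix is normalized so that its sum with its transpose equals 1.0. The value is -100 along the diagonal and wherever too few votes were received because of Azure content filters.
  • Source domains: source_domains, a list of the original RedPajama sets of the text snippets.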
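The documented properties of the matrices can be checked mechanically. The sketch below assumes matrices arrive as nested Python lists. `check_average_matrix` and `vote_share_b` are hypothetical helper names, and the inputs in the test are synthetic. The raw vote share computed here need not equal the published `{criteria}_average` entry, since the card describes an additional normalization whose exact procedure is not given in this description.

```python
import math
from typing import List, Optional

MISSING = -100  # sentinel on the diagonal and for entries with too few votes


def check_average_matrix(average: List[List[float]], tol: float = 1e-9) -> bool:
    """Check the documented layout: -100 on the diagonal, and
    average[a][b] + average[b][a] == 1.0 for every non-missing pair."""
    n = len(average)
    for a in range(n):
        if average[a][a] != MISSING:
            return False
        for b in range(a + 1, n):
            p_ab, p_ba = average[a][b], average[b][a]
            if p_ab == MISSING or p_ba == MISSING:
                continue  # not enough votes; nothing to verify for this pair
            if not math.isclose(p_ab + p_ba, 1.0, abs_tol=tol):
                return False
    return True


def vote_share_b(votes_a, votes_b, a: int, b: int) -> Optional[float]:
    """Fraction of votes at (a, b) that favored the text at index b.
    Returns None when no votes were recorded for that pair."""
    total = votes_a[a][b] + votes_b[a][b]
    return None if total == 0 else votes_b[a][b] / total
```

A quick sanity pass of this kind over a split can surface records that should be skipped before any agreement metric is computed.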

Dataset structure

  • Features:
    • texts: sequence of strings
    • educational_value_votes_a, facts_and_trivia_votes_a, required_expertise_votes_a, writing_style_votes_a: sequence of sequences of int64
    • educational_value_votes_b, facts_and_trivia_votes_b, required_expertise_votes_b, writing_style_votes_b: sequence of sequences of int64
    • educational_value_average, facts_and_trivia_average, required_expertise_average, writing_style_average: sequence of sequences of float64
    • source_domains: sequence of strings
  • Splits:
    • ArXiv, Book, C4, Github, Wikipedia_en, StackExchange
    • Each split records its size in bytes and its number of examples (1428 per split).
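The feature and split names above follow a regular pattern, which the sketch below uses to enumerate the expected columns and load a single split. It assumes the Hugging Face `datasets` package is installed and that the Hub is reachable. `expected_columns` and `load_split` are hypothetical helper names, and the import is deferred so that the name helpers work without the package.

```python
SPLITS = ("ArXiv", "Book", "C4", "Github", "Wikipedia_en", "StackExchange")
CRITERIA = (
    "educational_value",
    "facts_and_trivia",
    "required_expertise",
    "writing_style",
)
SUFFIXES = ("votes_a", "votes_b", "average")


def expected_columns():
    """Build the 14 feature names listed in the dataset structure."""
    columns = ["texts"]
    for criterion in CRITERIA:
        for suffix in SUFFIXES:
            columns.append(f"{criterion}_{suffix}")
    columns.append("source_domains")
    return columns


def load_split(split: str):
    """Load one split from the Hugging Face Hub (needs network access)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    from datasets import load_dataset  # deferred: only needed when loading

    return load_dataset("princeton-nlp/QuRating-GPT3.5-Judgments-Test", split=split)
```

For example, `load_split("ArXiv")` would return the ArXiv split, whose column names can then be compared against `expected_columns()`.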

Dataset size

  • Download size: 16357595 bytes
  • Dataset size: 31466360 bytes

Configuration

  • Default configuration:
    • Data file paths are named after the splits.