0-hero/prompt-perfect

Source: Hugging Face. Updated 2024-03-10. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/0-hero/prompt-perfect
Summary:
This dataset contains the results of scoring multiple popular datasets with GPT-3.5 and GPT-4 models. The scored datasets include airoboros-2.1, alpaca-gpt4, and dolphin, among others. Scoring uses the prompts from the paper *Self-Alignment with Instruction Backtranslation*. Each dataset has two additional columns, `score` and `extracted_score`, which hold the model's generated response and the score extracted from it, respectively. The scoring models are gpt-3.5-turbo-16k, gpt-3.5-turbo-1106, and gpt-3.5-turbo-0125. The scoring scale has 5 levels, ranging from 1 (incomplete, vague, off-topic, controversial, or not what the user asked for) to 5 (a perfect AI assistant response that is clear, logical, easy to follow, engaging, and insightful).
Provider:
0-hero
Original information summary

Dataset overview

Basic information

  • Language: English
  • Size: 1M<n<10M
  • Tags: synthetic, distillation, GPT-4, GPT-3.5

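Dataset description

The dataset contains 35 large scored datasets (more than 6 billion tokens), scored with GPT-3.5-series models. Each dataset has two additional columns: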

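  • score: the model's response, including the chain of thought (CoT) if provided
  • extracted_score: the rating extracted from the score column, stored as an integer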
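
As a minimal sketch of how `extracted_score` could be derived from `score`, the snippet below looks for a `Score: <rating>` line at the end of a response, which is the format the scoring prompts ask for. The function name `extract_score` and the choice to return `None` when no rating is found are assumptions made for illustration; this is not the dataset author's extraction code.

```python
import re
from typing import Optional

# Matches a line such as "Score: 4"; the scoring prompts ask for this
# on the last line of the response.
_SCORE_LINE = re.compile(r"^\s*Score:\s*([1-5])\s*$")


def extract_score(response: str) -> Optional[int]:
    """Return the integer rating from the last non-empty line, or None.

    Hypothetical helper: the real extraction logic is not documented here.
    """
    lines = [line for line in response.splitlines() if line.strip()]
    if not lines:
        return None
    match = _SCORE_LINE.match(lines[-1])
    return int(match.group(1)) if match else None


example = "The answer is complete and well organized.\nScore: 4"
print(extract_score(example))
```

Restricting the match to the last non-empty line avoids picking up a rating that the chain of thought mentions in passing.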

Scoring models

  • gpt-3.5-turbo-16k
  • gpt-3.5-turbo-1106
  • gpt-3.5-turbo-0125

Scored datasets

Original scoring prompt (from the paper)

  • airoboros-2.1
  • alpaca-gpt4
  • dolphin
  • open-platypus
  • orca_mini_v1
  • SlimOrca-Dedup
  • Synthia-v1.3
  • wizard_alpaca_dolly_orca

Conversation scoring prompt (modified)

  • Capybara
  • ultrachat
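
The pairing of datasets and prompts in the two lists above can be captured as a small lookup table. This is only an illustrative sketch: the labels `"original"` and `"conversation"` and the helper `prompt_variant` are hypothetical names, not identifiers used by the dataset.

```python
# Dataset names are copied from the two lists above. The variant labels
# "original" and "conversation" are hypothetical identifiers.
PROMPT_VARIANT = {
    "airoboros-2.1": "original",
    "alpaca-gpt4": "original",
    "dolphin": "original",
    "open-platypus": "original",
    "orca_mini_v1": "original",
    "SlimOrca-Dedup": "original",
    "Synthia-v1.3": "original",  # spelled as in the score distribution
    "wizard_alpaca_dolly_orca": "original",
    "Capybara": "conversation",
    "ultrachat": "conversation",
}


def prompt_variant(dataset: str) -> str:
    """Return which scoring prompt a listed dataset was scored with."""
    try:
        return PROMPT_VARIANT[dataset]
    except KeyError:
        raise KeyError(f"no scoring prompt recorded for {dataset!r}") from None


print(prompt_variant("Capybara"))
print(prompt_variant("dolphin"))
```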

Score distribution

| Dataset | Score 5 (%) | Score 4 (%) | Score 3 (%) | Score 2 (%) | Score 1 (%) | Score 0 (%) |
|---|---|---|---|---|---|---|
| dolphin | 80.232373 | 10.841314 | 2.217159 | 3.075088 | 3.63371 | 0.000356 |
| open-platypus | 76.390115 | 10.779909 | 3.093156 | 3.558533 | 6.178288 | 0 |
| Capybara | 73.57241 | 12.851431 | 3.005123 | 4.117206 | 6.435087 | 0.018743 |
| airoboros-2.1 | 69.869994 | 26.695312 | 1.322096 | 1.076957 | 1.035641 | 0 |
| alpaca-gpt4 | 65.421891 | 31.797554 | 1.301823 | 0.824937 | 0.653796 | 0 |
| wizard_alpaca_dolly_orca | 63.898674 | 32.68317 | 1.752752 | 0.894614 | 0.769829 | 0.00096 |
| ultrachat | 50.213948 | 40.684169 | 5.741387 | 2.880979 | 0.478934 | 0.000582 |
| orca_mini_v1 | 46.351518 | 49.313846 | 1.568606 | 1.898745 | 0.867284 | 0 |
| Synthia-v1.3 | 39.262214 | 52.335033 | 2.627859 | 3.38096 | 2.392252 | 0.001683 |
| SlimOrca-Dedup | 29.987262 | 55.132314 | 7.122872 | 2.998424 | 4.759127 | 0 |
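
To make the distribution easier to compare across datasets, the sketch below computes the share of rows scored 4 or 5 and the mean score for three of the datasets. It assumes the figures are percentages of rows (each row of figures sums to about 100); the helper names `share_at_least` and `mean_score` are illustrative.

```python
# Percentages copied from the score distribution above,
# ordered as scores 5, 4, 3, 2, 1, 0.
SCORES = [5, 4, 3, 2, 1, 0]
DISTRIBUTION = {
    "dolphin": [80.232373, 10.841314, 2.217159, 3.075088, 3.63371, 0.000356],
    "ultrachat": [50.213948, 40.684169, 5.741387, 2.880979, 0.478934, 0.000582],
    "SlimOrca-Dedup": [29.987262, 55.132314, 7.122872, 2.998424, 4.759127, 0.0],
}


def share_at_least(percentages, threshold):
    """Sum the percentages of all score levels >= threshold."""
    return sum(p for s, p in zip(SCORES, percentages) if s >= threshold)


def mean_score(percentages):
    """Weighted mean score, normalized by the row total."""
    total = sum(percentages)
    return sum(s * p for s, p in zip(SCORES, percentages)) / total


for name, row in DISTRIBUTION.items():
    print(f"{name}: {share_at_least(row, 4):.2f}% scored 4 or 5, "
          f"mean score {mean_score(row):.2f}")
```

Normalizing by the row total rather than by 100 keeps the mean correct even if the published percentages carry small rounding error.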

Scoring prompts

Original scoring prompt (from the paper)

Below is an instruction from an user and a candidate answer. Evaluate whether or not the answer is a good example of how AI Assistant should respond to the user’s instruction. Please assign a score using the following 5-point scale:
1: It means the answer is incomplete, vague, off-topic, controversial, or not exactly what the user asked for. For example, some content seems missing, numbered list does not start from the beginning, the opening sentence repeats user’s question. Or the response is from another person’s perspective with their personal experience (e.g. taken from blog posts), or looks like an answer from a forum. Or it contains promotional text, navigation text, or other irrelevant information.
2: It means the answer addresses most of the asks from the user. It does not directly address the user’s question. For example, it only provides a high-level methodology instead of the exact solution to user’s question.
3: It means the answer is helpful but not written by an AI Assistant. It addresses all the basic asks from the user. It is complete and self contained with the drawback that the response is not written from an AI assistant’s perspective, but from other people’s perspective. The content looks like an excerpt from a blog post, web page, or web search results. For example, it contains personal experience or opinion, mentions comments section, or share on social media, etc.
4: It means the answer is written from an AI assistant’s perspective with a clear focus of addressing the instruction. It provide a complete, clear, and comprehensive response to user’s question or instruction without missing or irrelevant information. It is well organized, self-contained, and written in a helpful tone. It has minor room for improvement, e.g. more concise and focused.
5: It means it is a perfect answer from an AI Assistant. It has a clear focus on being a helpful AI Assistant, where the response looks like intentionally written to address the user’s question or instruction without any irrelevant sentences. The answer provides high quality content, demonstrating expert knowledge in the area, is very well written, logical, easy-to-follow, engaging and insightful.
Please first provide a chain of thought brief reasoning you used to derive the rating score, and then write "Score: <rating>" in the last line.

Conversation scoring prompt (modified)

Below are a series of user instructions and corresponding candidate answers in a multi-turn conversation. Evaluate whether or not each answer is a good example of how the AI Assistant should respond to the user’s instructions in the context of an ongoing dialogue. Please assign a score using the following 5-point scale:
1: The answer is incomplete, vague, off-topic, controversial, or fails to build upon previous turns in the conversation. It might ignore context provided earlier, repeat information unnecessarily, or deviate from the conversational flow. Examples include missing content that should logically follow from earlier turns, responses that reset the conversation without acknowledging past interactions, or introducing irrelevant or promotional information.
2: The answer addresses the users concerns but misses key elements of context or nuance from previous turns. It might provide a generally correct direction but fails to leverage the multi-turn nature of the conversation, such as not recalling information provided earlier or not sufficiently building upon it.
3: The answer is helpful and acknowledges the multi-turn context but reads more like a series of standalone responses rather than a cohesive conversation. It covers the basic asks from the user across multiple turns but might lack a seamless integration of conversation history or a sense of ongoing dialogue.
4: The answer is well-tailored to a multi-turn conversation, showing awareness of previous interactions and building upon them effectively. It is clear, comprehensive, and maintains a conversational flow, with only minor room for improvement, such as refining the integration of past and current turns or enhancing conversational fluidity.
5: The answer exemplifies perfect handling of a multi-turn conversation by an AI Assistant. It seamlessly integrates information from previous turns, providing high-quality, context-aware responses that demonstrate expert knowledge and maintain a logical, engaging, and insightful dialogue flow throughout.
Please first provide a brief chain of thought reasoning you used to derive the rating score, considering how well the AI Assistant maintains and builds upon the conversational context. Then write "Score: <rating>" in the last line.
