argilla/comparison-dataset-dolly-curated-falcon
Source: Hugging Face. Updated: 2023-06-02. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/argilla/comparison-dataset-dolly-curated-falcon
Resource description:
---
{}
---
## Guidelines
These guidelines are based on the paper [Training Language Models to Follow Instructions with Human Feedback](https://cdn.openai.com/papers/Training_language_models_to_follow_instructions_with_human_feedback.pdf).
**You are given a text-based description of a task, submitted by a user.**
This task description may be in the form of an explicit instruction (e.g. "Write a story about a wise frog."). The task may also be specified indirectly, for example by using several examples of the desired behavior (e.g. given a sequence of movie reviews followed by their sentiment, followed by one movie review without a sentiment, you can assume the task is to predict the sentiment of the final review), or by generating the start of a desired output (e.g. given "There once was a wise frog named Julius," you can assume the task is to continue the story).
**You will also be given several text outputs, intended to help the user with their task.** Your job is to evaluate these outputs to ensure that they are helpful, truthful, and harmless. For most tasks, being truthful and harmless is more important than being helpful.
By **helpful**, we mean that the output should follow the user's intention and help the user solve their task. Some examples of helpful behavior:
- Writing in clear language.
- Answering the question they meant to ask, even if they mis-asked it.
- Being sensitive to internationality (e.g., "football" shouldn't mean American football, and "the president" doesn't necessarily mean the US president).
- If an instruction is too confusing, asking for clarification and explaining why the instruction is confusing.
- Not giving overly long or rambling answers, or repeating information from the question.
- Not assuming extraneous context beyond what's given (besides things like facts about the world), unless that's an implied part of the task. For example, if asked to "write a polite email response to this email: email body," the output shouldn't assume "I can't make it this time, but am free next weekend." But if asked to "Write an email to Socrates," then you can safely assume the output can make up a context.
By **truthful**, we mean that the output contains accurate information and doesn't mislead the user. Some examples of truthful behavior:
- On tasks like summarization, where the output should only use information from input, not making up details that are not part of the input description.
- Not producing clearly false information about the world (e.g., making up facts or promoting conspiracies). For example, the output should not state that Hillary Clinton has served time in prison.
- Not generating misleading information or information of questionable authenticity. For example, if the input asks "Why did Hillary Clinton go to jail?", the output should not say "It's not totally clear," but rather should refute the premise of the question.
By **harmless**, we mean that the output should not cause physical, psychological, or social harm to people; damage to or loss of equipment or property; damage to the environment; or harm to institutions or resources necessary to human well-being. Some examples of harmless behavior:
- Treating other humans with kindness, respect, and consideration; not denigrating members of certain groups or using biased language against a particular group.
- Not generating abusive, threatening, or offensive language or promoting violence.
- Not writing sexual or violent content if it's not asked for.
- Not giving bad real-world advice or promoting illegal activity.
Evaluating model outputs may involve making trade-offs between these criteria. These trade-offs will depend on the task.
## Fields
* `prompt` is of type `str`
* `response-1` is of type `str`
* `response-2` is of type `str`
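
As a rough illustration of the record shape listed above, the following sketch models one record as a `TypedDict` and checks that the three string fields are present. The names `ComparisonRecord`, `FIELD_NAMES`, and `invalid_fields`, and the example values, are hypothetical and not part of the dataset or of any library.

```python
from typing import Any, Dict, List, TypedDict

# The three fields named in this dataset card.
FIELD_NAMES = ("prompt", "response-1", "response-2")

# Functional TypedDict syntax is needed because the keys contain hyphens.
# ComparisonRecord is a hypothetical name used only for this sketch.
ComparisonRecord = TypedDict(
    "ComparisonRecord",
    {"prompt": str, "response-1": str, "response-2": str},
)


def invalid_fields(record: Dict[str, Any]) -> List[str]:
    """Return the expected field names that are missing or not strings."""
    return [name for name in FIELD_NAMES if not isinstance(record.get(name), str)]


# Made-up example values, not taken from the dataset.
example: ComparisonRecord = {
    "prompt": "Write a story about a wise frog.",
    "response-1": "There once was a wise frog named Julius.",
    "response-2": "Frogs are amphibians.",
}
```

Calling `invalid_fields(example)` returns an empty list when all three fields are strings.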
## Questions
* `response_ranking`: Select response 1 or 2. Select 3 if neither response is suitable.
If you select 3, provide a response using the field below, or discard the record.
Helpful: the output follows the user's intention.
Truthful: the output contains accurate information and doesn't mislead the user.
Harmless: the output should not cause physical, psychological, or social harm to people, property, the environment, or institutions.
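
The ranking rule above can be sketched as a small helper that turns an annotator's answer into the preferred text. This is only an illustration: it assumes the answer is available as the integer 1, 2, or 3, which may not match how the value is actually stored, and the names `preferred_response` and `correction` are hypothetical.

```python
from typing import Dict, Optional


def preferred_response(
    record: Dict[str, str],
    ranking: int,
    correction: Optional[str] = None,
) -> Optional[str]:
    """Resolve a `response_ranking` answer to the preferred text.

    Assumes `ranking` is 1, 2, or 3 (an assumption of this sketch).
    Returns None when 3 is selected without a correction, which this
    sketch treats as "discard the record".
    """
    if ranking == 1:
        return record["response-1"]
    if ranking == 2:
        return record["response-2"]
    if ranking == 3:
        # Neither response is suitable: use the annotator's own response, if any.
        return correction
    raise ValueError(f"ranking must be 1, 2, or 3, got {ranking!r}")


# Made-up example values, not taken from the dataset.
record = {
    "prompt": "Write a story about a wise frog.",
    "response-1": "There once was a wise frog named Julius.",
    "response-2": "Frogs are amphibians.",
}
```

For example, `preferred_response(record, 3, correction="A better story.")` returns the annotator's text, while `preferred_response(record, 3)` returns `None`.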
## Load with Argilla
To load this dataset with Argilla, install Argilla with `pip install argilla --upgrade` and then run the following code. Note that `FeedbackDataset` is part of the Argilla 1.x API.
```python
import argilla as rg
ds = rg.FeedbackDataset.from_huggingface('argilla/comparison-dataset-dolly-curated-falcon')
```
## Load with Datasets
To load this dataset with Datasets, install Datasets with `pip install datasets --upgrade` and then run the following code:
```python
from datasets import load_dataset
ds = load_dataset('argilla/comparison-dataset-dolly-curated-falcon')
```
Provider:
argilla
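Original information summary
Dataset overview
Dataset contents
- Task description: The dataset contains text descriptions of tasks submitted by users. A description may be an explicit instruction or may specify the task indirectly through examples.
- Output evaluation: The dataset provides several text outputs intended to help the user complete the task. These outputs are evaluated on whether they are helpful, truthful, and harmless.
Evaluation criteria
- Helpful: The output should follow the user's intention and help the user solve the task, which includes using clear language, correctly understanding the question, being sensitive to internationality, and avoiding overly long answers.
- Truthful: The output should contain accurate information and not mislead the user, which includes not making up details beyond the input description and not spreading false information.
- Harmless: The output should not cause physical, psychological, or social harm, damage equipment or the environment, or harm necessary social institutions or resources, which includes respecting others, not using offensive language, and not providing inappropriate content.
Dataset structure
- Fields:
  - `prompt`: the task description text.
  - `response-1`: the first output text.
  - `response-2`: the second output text.
Evaluation question
- `response_ranking`: Select response-1 or response-2. If neither is suitable, select 3 and either provide a new response or discard the record.
Loading
- With Argilla: install Argilla with `pip install argilla --upgrade`, then load the dataset with the provided Python code.
- With Datasets: install Datasets with `pip install datasets --upgrade`, then load the dataset with the provided Python code.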



