argilla/llama-2-banking-preference
Source: Hugging Face. Updated 2023-08-03; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/argilla/llama-2-banking-preference
Resource description:
---
size_categories: 1K<n<10K
tags:
- rlfh
- argilla
- human-feedback
---
# Dataset Card for llama-2-banking-preference
This dataset has been created with [Argilla](https://docs.argilla.io).
This dataset can be loaded into Argilla, as explained in [Load with Argilla](#load-with-argilla), or used directly with the `datasets` library, as explained in [Load with `datasets`](#load-with-datasets).
## Dataset Description
- **Homepage:** https://argilla.io
- **Repository:** https://github.com/argilla-io/argilla
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**
### Dataset Summary
This dataset contains:
* A dataset configuration file conforming to the Argilla dataset format named `argilla.yaml`. This configuration file will be used to configure the dataset when using the `FeedbackDataset.from_huggingface` method in Argilla.
* Dataset records in a format compatible with HuggingFace `datasets`. These records will be loaded automatically when using `FeedbackDataset.from_huggingface` and can be loaded independently using the `datasets` library via `load_dataset`.
* The [annotation guidelines](#annotation-guidelines) that have been used for building and curating the dataset, if they've been defined in Argilla.
### Load with Argilla
To load with Argilla, install Argilla with `pip install argilla --upgrade` and then use the following code:
```python
import argilla as rg
ds = rg.FeedbackDataset.from_huggingface("argilla/llama-2-banking-preference")
```
### Load with `datasets`
To load this dataset with `datasets`, install `datasets` with `pip install datasets --upgrade` and then use the following code:
```python
from datasets import load_dataset
ds = load_dataset("argilla/llama-2-banking-preference")
```
### Supported Tasks and Leaderboards
This dataset can contain [multiple fields, questions and responses](https://docs.argilla.io/en/latest/guides/llms/conceptual_guides/data_model.html) so it can be used for different NLP tasks, depending on the configuration. The dataset structure is described in the [Dataset Structure section](#dataset-structure).
There are no leaderboards associated with this dataset.
### Languages
[More Information Needed]
## Dataset Structure
### Data in Argilla
The dataset is created in Argilla with: **fields**, **questions**, **suggestions**, and **guidelines**.
The **fields** are the dataset records themselves; for the moment, only text fields are supported. These are the ones that will be used to provide responses to the questions.
| Field Name | Title | Type | Required | Markdown |
| ---------- | ----- | ---- | -------- | -------- |
| request | User request | TextField | True | False |
| response-1 | Assistant response 1 | TextField | True | True |
| response-2 | Assistant response 2 | TextField | True | True |
The **questions** are the questions that will be asked to the annotators. They can be of different types, such as rating, text, single choice, or multiple choice.
| Question Name | Title | Type | Required | Description | Values/Labels |
| ------------- | ----- | ---- | -------- | ----------- | ------------- |
| preference | Order response based on your preference | RankingQuestion | True | 1 = most accurate and helpful, 2 = worst. Ties are allowed. | ['response-1', 'response-2'] |
| correct-response | If both responses are innacurate, complete the response: | TextQuestion | True | N/A | N/A |
**✨ NEW** Additionally, there are **suggestions**, which are linked to the existing questions. They are named by appending "-suggestion" and "-suggestion-metadata" to the question name, and contain the value(s) of the suggestion and its metadata, respectively. The possible values are therefore the same as in the table above.
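The naming convention for suggestions can be sketched in a few lines of Python. This is only an illustration: `suggestion_columns` is a hypothetical helper, not part of Argilla, and the question names are the ones listed in the table above.

```python
# Hypothetical helper (not an Argilla API): derive the names of the
# suggestion columns that accompany each question.
QUESTION_NAMES = ["preference", "correct-response"]


def suggestion_columns(question_name: str) -> tuple:
    """Return the (suggestion, suggestion-metadata) names for a question."""
    return (
        f"{question_name}-suggestion",
        f"{question_name}-suggestion-metadata",
    )


for name in QUESTION_NAMES:
    value_col, metadata_col = suggestion_columns(name)
    print(f"{name}: {value_col}, {metadata_col}")
```

The resulting names match the keys that appear in the `datasets` record shown under Data Instances.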
Finally, the **guidelines** are just a plain string that can be used to provide instructions to the annotators. Find those in the [annotation guidelines](#annotation-guidelines) section.
### Data Instances
An example of a dataset instance in Argilla looks as follows:
```json
{
"external_id": null,
"fields": {
"request": "Do I get charged if i am getting money?",
"response-1": " Of course! Thank you for reaching out to us at the Bank of Tillamook. We\u0027re here to help you with any questions or concerns you may have about your bank account. When it comes to fees associated with withdrawals, we do charge a small fee for certain types of transactions. However, please note that there is no charge for accessing your own funds in our bank through one of our ATMs or by using your debit card for purchases. If you could provide me with more details on what kind of transaction you were trying to make, I would be happy to assist you further and explain any potential fees associated with those services. Is there anything else I can help you with today?",
"response-2": " Of course! Thank you for choosing the Bank of Tillamook for your financial needs. We appreciate your business and value your trust in us. To answer your question, no, you will not be charged for withdrawing money from your own account at the Bank of Tillamook. Our bank is committed to providing convenient and fee-free services to our customers, so you can access your funds without worrying about additional charges. If you have any further questions or concerns, please don\u0027t hesitate to ask. Is there anything else I can help you with today?"
},
"id": null,
"metadata": {},
"responses": [],
"suggestions": []
}
```
While the same record in HuggingFace `datasets` looks as follows:
```json
{
"correct-response": null,
"correct-response-suggestion": null,
"correct-response-suggestion-metadata": {
"agent": null,
"score": null,
"type": null
},
"external_id": null,
"metadata": null,
"preference": null,
"preference-suggestion": null,
"preference-suggestion-metadata": {
"agent": null,
"score": null,
"type": null
},
"request": "Do I get charged if i am getting money?",
"response-1": " Of course! Thank you for reaching out to us at the Bank of Tillamook. We\u0027re here to help you with any questions or concerns you may have about your bank account. When it comes to fees associated with withdrawals, we do charge a small fee for certain types of transactions. However, please note that there is no charge for accessing your own funds in our bank through one of our ATMs or by using your debit card for purchases. If you could provide me with more details on what kind of transaction you were trying to make, I would be happy to assist you further and explain any potential fees associated with those services. Is there anything else I can help you with today?",
"response-2": " Of course! Thank you for choosing the Bank of Tillamook for your financial needs. We appreciate your business and value your trust in us. To answer your question, no, you will not be charged for withdrawing money from your own account at the Bank of Tillamook. Our bank is committed to providing convenient and fee-free services to our customers, so you can access your funds without worrying about additional charges. If you have any further questions or concerns, please don\u0027t hesitate to ask. Is there anything else I can help you with today?"
}
```
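The two layouts above carry the same content: the `datasets` record is flat, while the Argilla record nests the text fields under `fields`. A minimal sketch of that regrouping follows, assuming the field names from this card. `to_argilla_layout` is a hypothetical helper, not an Argilla function, and the response texts are replaced by short placeholders.

```python
# Field names taken from the Fields table of this card.
FIELD_NAMES = ("request", "response-1", "response-2")


def to_argilla_layout(flat_record: dict) -> dict:
    """Regroup a flat `datasets`-style record into the nested layout (illustrative only)."""
    return {
        "external_id": flat_record.get("external_id"),
        # The text fields move under a single "fields" mapping.
        "fields": {name: flat_record[name] for name in FIELD_NAMES},
        # The flat record stores missing metadata as None; the nested one uses {}.
        "metadata": flat_record.get("metadata") or {},
    }


flat = {
    "external_id": None,
    "metadata": None,
    "request": "Do I get charged if i am getting money?",
    "response-1": "placeholder text for response 1",  # shortened placeholder
    "response-2": "placeholder text for response 2",  # shortened placeholder
}
nested = to_argilla_layout(flat)
print(sorted(nested["fields"]))
```

The sketch deliberately ignores `responses` and `suggestions`, which are empty in the example record.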
### Data Fields
Among the dataset fields, we differentiate between the following:
* **Fields:** These are the dataset records themselves; for the moment, only text fields are supported. These are the ones that will be used to provide responses to the questions.
    * **request** is of type `TextField`.
    * **response-1** is of type `TextField`.
    * **response-2** is of type `TextField`.
* **Questions:** These are the questions that will be asked to the annotators. They can be of different types, such as `RatingQuestion`, `TextQuestion`, `LabelQuestion`, `MultiLabelQuestion`, and `RankingQuestion`.
    * **preference** is of type `RankingQuestion` with the following allowed values ['response-1', 'response-2'], and description "1 = most accurate and helpful, 2 = worst. Ties are allowed.".
    * (optional) **correct-response** is of type `TextQuestion`.
* **✨ NEW** **Suggestions:** As of Argilla 1.13.0, suggestions are included to assist the annotators during the annotation process. Suggestions are linked to the existing questions, are always optional, and contain not just the suggestion itself, but also the metadata linked to it, if applicable.
    * (optional) **preference-suggestion** is of type `ranking` with the following allowed values ['response-1', 'response-2'].
    * (optional) **correct-response-suggestion** is of type `text`.
Additionally, there is one more optional field:
* **external_id:** This is an optional field that can be used to provide an external ID for the dataset record. This can be useful if you want to link the dataset record to an external resource, such as a database or a file.
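As an illustration of how the `preference` ranking could be consumed, the sketch below turns one ranked record into a prompt/chosen/rejected triple and skips ties. It rests on several assumptions: the ranking is taken to be already extracted into a plain mapping from response name to rank (the exact response layout is not shown in this card), the output keys `prompt`, `chosen`, and `rejected` are hypothetical, and `preference_pair` is not an Argilla function.

```python
from typing import Optional

RESPONSE_FIELDS = ("response-1", "response-2")


def preference_pair(record: dict, ranks: dict) -> Optional[dict]:
    """Build a prompt/chosen/rejected triple from one ranked record, or None on a tie."""
    first, second = RESPONSE_FIELDS
    if ranks[first] == ranks[second]:
        # Ties are allowed by the question, but they carry no preference signal.
        return None
    # Rank 1 is the most accurate and helpful response, so the lower rank wins.
    chosen, rejected = (first, second) if ranks[first] < ranks[second] else (second, first)
    return {
        "prompt": record["request"],
        "chosen": record[chosen],
        "rejected": record[rejected],
    }


record = {
    "request": "Do I get charged if i am getting money?",
    "response-1": "placeholder text for response 1",  # shortened placeholder
    "response-2": "placeholder text for response 2",  # shortened placeholder
}
print(preference_pair(record, {"response-1": 2, "response-2": 1}))
print(preference_pair(record, {"response-1": 1, "response-2": 1}))
```

Dropping ties is one possible design choice; depending on the use case, tied records could instead be kept for other purposes, such as reviewing the `correct-response` text.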
### Data Splits
The dataset contains a single split, which is `train`.
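Because there is only one split, it can be requested directly by name. A minimal sketch, assuming network access to the Hub and an installed `datasets` package; the wrapper name `load_train_split` is hypothetical, and the import is done lazily so the function can be defined without loading anything.

```python
def load_train_split():
    """Load the single `train` split of this dataset (requires network access)."""
    # Imported inside the function so that defining it does not need `datasets`.
    from datasets import load_dataset

    # Passing `split` returns that split directly instead of a dict of splits.
    return load_dataset("argilla/llama-2-banking-preference", split="train")
```

Calling `load_train_split()` should give a single `Dataset` object; without the `split` argument, `load_dataset` returns a `DatasetDict` keyed by split name, so the records would be reached through `ds["train"]`.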
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation guidelines
Please, read the question carefully and try to answer it as accurately as possible.
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
[More Information Needed]
### Contributions
[More Information Needed]
Provider:
argilla
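## Summary of Original Information
### Dataset Overview
#### Dataset Name
- Name: llama-2-banking-preference
#### Dataset Size
- Size: 1K<n<10K
#### Tags
- Tags: rlfh, argilla, human-feedback
#### Dataset Description
- Configuration file: includes a configuration file named `argilla.yaml` that conforms to the Argilla dataset format.
- Record format: the dataset records are compatible with HuggingFace `datasets` and can be loaded via `FeedbackDataset.from_huggingface` or the `datasets` library.
- Annotation guidelines: included if they have been defined in Argilla.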
#### Loading
- Load with Argilla: install Argilla, run `import argilla as rg`, and load the dataset with `ds = rg.FeedbackDataset.from_huggingface("argilla/llama-2-banking-preference")`.
- Load with `datasets`: install `datasets`, run `from datasets import load_dataset`, and load the dataset with `ds = load_dataset("argilla/llama-2-banking-preference")`.
#### Dataset Structure
- Fields:
    - request: user request, of type `TextField`.
    - response-1: assistant response 1, of type `TextField`.
    - response-2: assistant response 2, of type `TextField`.
- Questions:
    - preference: preference ranking question, of type `RankingQuestion`, with allowed values [response-1, response-2].
    - correct-response: text question, of type `TextQuestion`.
- Suggestions:
    - preference-suggestion: of type `ranking`, with allowed values [response-1, response-2].
    - correct-response-suggestion: of type `text`.
- External ID: optional field used to provide an external ID for the dataset record.
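#### Dataset Creation
- Annotation guidelines: Please, read the question carefully and try to answer it as accurately as possible.
- Data splits: the dataset contains a single split, `train`.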



