achinthani/test-2
收藏Hugging Face2024-03-17 更新2024-06-11 收录
下载链接:
https://hf-mirror.com/datasets/achinthani/test-2
下载链接
链接失效反馈官方服务:
资源简介:
---
size_categories: n<1K
tags:
- rlfh
- argilla
- human-feedback
---
# Dataset Card for test-2
This dataset has been created with [Argilla](https://docs.argilla.io).
As shown in the sections below, this dataset can be loaded into Argilla as explained in [Load with Argilla](#load-with-argilla), or used directly with the `datasets` library in [Load with `datasets`](#load-with-datasets).
## Dataset Description
- **Homepage:** https://argilla.io
- **Repository:** https://github.com/argilla-io/argilla
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**
### Dataset Summary
This dataset contains:
* A dataset configuration file conforming to the Argilla dataset format named `argilla.yaml`. This configuration file will be used to configure the dataset when using the `FeedbackDataset.from_huggingface` method in Argilla.
* Dataset records in a format compatible with HuggingFace `datasets`. These records will be loaded automatically when using `FeedbackDataset.from_huggingface` and can be loaded independently using the `datasets` library via `load_dataset`.
* The [annotation guidelines](#annotation-guidelines) that have been used for building and curating the dataset, if they've been defined in Argilla.
### Load with Argilla
To load with Argilla, you'll just need to install Argilla as `pip install argilla --upgrade` and then use the following code:
```python
import argilla as rg
ds = rg.FeedbackDataset.from_huggingface("achinthani/test-2")
```
### Load with `datasets`
To load this dataset with `datasets`, you'll just need to install `datasets` as `pip install datasets --upgrade` and then use the following code:
```python
from datasets import load_dataset
ds = load_dataset("achinthani/test-2")
```
### Supported Tasks and Leaderboards
This dataset can contain [multiple fields, questions and responses](https://docs.argilla.io/en/latest/conceptual_guides/data_model.html#feedback-dataset) so it can be used for different NLP tasks, depending on the configuration. The dataset structure is described in the [Dataset Structure section](#dataset-structure).
There are no leaderboards associated with this dataset.
### Languages
[More Information Needed]
## Dataset Structure
### Data in Argilla
The dataset is created in Argilla with: **fields**, **questions**, **suggestions**, **metadata**, **vectors**, and **guidelines**.
The **fields** are the dataset records themselves, for the moment just text fields are supported. These are the ones that will be used to provide responses to the questions.
| Field Name | Title | Type | Required | Markdown |
| ---------- | ----- | ---- | -------- | -------- |
| text | Text | text | True | False |
The **questions** are the questions that will be asked to the annotators. They can be of different types, such as rating, text, label_selection, multi_label_selection, or ranking.
| Question Name | Title | Type | Required | Description | Values/Labels |
| ------------- | ----- | ---- | -------- | ----------- | ------------- |
| sentiment | Sentiment | label_selection | True | N/A | ['positive', 'neutral', 'negative'] |
| mixed-emotion | Mixed-emotion | multi_label_selection | True | N/A | ['joy', 'anger', 'sadness', 'fear', 'surprise', 'love'] |
| ranking | Ranking | ranking | True | N/A | ['1', '2', '3', '4', '5'] |
| rating | Rating | rating | True | N/A | [1, 2, 3, 4, 5] |
| text-annotation | Feedback | text | True | N/A | N/A |
The **suggestions** are human or machine generated recommendations for each question to assist the annotator during the annotation process, so those are always linked to the existing questions, and named appending "-suggestion" and "-suggestion-metadata" to those, containing the value/s of the suggestion and its metadata, respectively. So on, the possible values are the same as in the table above, but the column name is appended with "-suggestion" and the metadata is appended with "-suggestion-metadata".
The **metadata** is a dictionary that can be used to provide additional information about the dataset record. This can be useful to provide additional context to the annotators, or to provide additional information about the dataset record itself. For example, you can use this to provide a link to the original source of the dataset record, or to provide additional information about the dataset record itself, such as the author, the date, or the source. The metadata is always optional, and can be potentially linked to the `metadata_properties` defined in the dataset configuration file in `argilla.yaml`.
| Metadata Name | Title | Type | Values | Visible for Annotators |
| ------------- | ----- | ---- | ------ | ---------------------- |
The **guidelines**, are optional as well, and are just a plain string that can be used to provide instructions to the annotators. Find those in the [annotation guidelines](#annotation-guidelines) section.
### Data Instances
An example of a dataset instance in Argilla looks as follows:
```json
{
"external_id": null,
"fields": {
"text": "Absolutely infuriated by the lack of accountability in our government. It\u0027s time for real change!"
},
"metadata": {},
"responses": [],
"suggestions": [],
"vectors": {}
}
```
While the same record in HuggingFace `datasets` looks as follows:
```json
{
"external_id": null,
"metadata": "{}",
"mixed-emotion": [],
"mixed-emotion-suggestion": null,
"mixed-emotion-suggestion-metadata": {
"agent": null,
"score": null,
"type": null
},
"ranking": [],
"ranking-suggestion": null,
"ranking-suggestion-metadata": {
"agent": null,
"score": null,
"type": null
},
"rating": [],
"rating-suggestion": null,
"rating-suggestion-metadata": {
"agent": null,
"score": null,
"type": null
},
"sentiment": [],
"sentiment-suggestion": null,
"sentiment-suggestion-metadata": {
"agent": null,
"score": null,
"type": null
},
"text": "Absolutely infuriated by the lack of accountability in our government. It\u0027s time for real change!",
"text-annotation": [],
"text-annotation-suggestion": null,
"text-annotation-suggestion-metadata": {
"agent": null,
"score": null,
"type": null
}
}
```
### Data Fields
Among the dataset fields, we differentiate between the following:
* **Fields:** These are the dataset records themselves, for the moment just text fields are supported. These are the ones that will be used to provide responses to the questions.
* **text** is of type `text`.
* **Questions:** These are the questions that will be asked to the annotators. They can be of different types, such as `RatingQuestion`, `TextQuestion`, `LabelQuestion`, `MultiLabelQuestion`, and `RankingQuestion`.
* **sentiment** is of type `label_selection` with the following allowed values ['positive', 'neutral', 'negative'].
* **mixed-emotion** is of type `multi_label_selection` with the following allowed values ['joy', 'anger', 'sadness', 'fear', 'surprise', 'love'].
* **ranking** is of type `ranking` with the following allowed values ['1', '2', '3', '4', '5'].
* **rating** is of type `rating` with the following allowed values [1, 2, 3, 4, 5].
* **text-annotation** is of type `text`.
* **Suggestions:** As of Argilla 1.13.0, the suggestions have been included to provide the annotators with suggestions to ease or assist during the annotation process. Suggestions are linked to the existing questions, are always optional, and contain not just the suggestion itself, but also the metadata linked to it, if applicable.
* (optional) **sentiment-suggestion** is of type `label_selection` with the following allowed values ['positive', 'neutral', 'negative'].
* (optional) **mixed-emotion-suggestion** is of type `multi_label_selection` with the following allowed values ['joy', 'anger', 'sadness', 'fear', 'surprise', 'love'].
* (optional) **ranking-suggestion** is of type `ranking` with the following allowed values ['1', '2', '3', '4', '5'].
* (optional) **rating-suggestion** is of type `rating` with the following allowed values [1, 2, 3, 4, 5].
* (optional) **text-annotation-suggestion** is of type `text`.
Additionally, we also have two more fields that are optional and are the following:
* **metadata:** This is an optional field that can be used to provide additional information about the dataset record. This can be useful to provide additional context to the annotators, or to provide additional information about the dataset record itself. For example, you can use this to provide a link to the original source of the dataset record, or to provide additional information about the dataset record itself, such as the author, the date, or the source. The metadata is always optional, and can be potentially linked to the `metadata_properties` defined in the dataset configuration file in `argilla.yaml`.
* **external_id:** This is an optional field that can be used to provide an external ID for the dataset record. This can be useful if you want to link the dataset record to an external resource, such as a database or a file.
### Data Splits
The dataset contains a single split, which is `train`.
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation guidelines
Emotion is a dataset of English Twitter messages with six basic emotions: anger, fear, joy, love, sadness, and surprise.
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
[More Information Needed]
### Contributions
[More Information Needed]
提供机构:
achinthani
原始信息汇总
数据集概述
数据集名称
- 名称: test-2
数据集大小
- 大小分类: n<1K
标签
- 标签: rlfh, argilla, human-feedback
数据集描述
- 配置文件: 包含一个符合Argilla数据集格式的配置文件
argilla.yaml。 - 记录格式: 与HuggingFace
datasets兼容的记录格式。 - 加载方式: 支持通过Argilla和
datasets库加载。
数据集结构
- 字段: 支持文本字段。
- 问题: 包括情感、混合情感、排名、评分和文本注释等类型的问题。
- 建议: 为每个问题提供人或机器生成的建议。
- 元数据: 提供额外信息,如数据来源或作者信息。
- 指南: 提供注释指南。
数据实例
- 示例: 包含文本字段和相关问题的响应。
数据字段
- 文本字段: 仅支持文本类型。
- 问题字段: 包括标签选择、多标签选择、排名和文本类型。
- 建议字段: 与问题关联,提供注释辅助。
- 元数据字段: 提供额外信息。
- 外部ID字段: 提供外部资源链接。
数据分割
- 分割: 仅包含训练集。
数据集创建
- 注释指南: 描述了六种基本情感的标注。
语言
- 语言: 待补充信息。



