
vegeta/testargilla

Hugging Face | Updated 2023-06-25 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/vegeta/testargilla
Resource description:
---
size_categories: n<1K
tags:
- rlfh
- argilla
- human-feedback
dataset_info:
  features:
  - name: metadata
    dtype: string
  - name: text
    dtype: string
    id: field
  - name: label
    dtype: string
    id: field
  - name: question-1
    sequence:
    - name: user_id
      dtype: string
    - name: value
      dtype: string
    - name: status
      dtype: string
    id: question
  - name: question-2
    sequence:
    - name: user_id
      dtype: string
    - name: value
      dtype: int32
    - name: status
      dtype: string
    id: question
  - name: external_id
    dtype: string
    id: external_id
  splits:
  - name: train
    num_bytes: 148
    num_examples: 1
  download_size: 0
  dataset_size: 148
---

# Dataset Card for testargilla

This dataset has been created with [Argilla](https://docs.argilla.io).

As shown in the sections below, this dataset can be loaded into Argilla as explained in [Load with Argilla](#load-with-argilla), or used directly with the `datasets` library in [Load with `datasets`](#load-with-datasets).

## Dataset Description

- **Homepage:** https://argilla.io
- **Repository:** https://github.com/argilla-io/argilla
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

This dataset contains:

* A dataset configuration file conforming to the Argilla dataset format named `argilla.cfg`. This configuration file will be used to configure the dataset when using the `FeedbackDataset.from_huggingface` method in Argilla.

* Dataset records in a format compatible with HuggingFace `datasets`. These records will be loaded automatically when using `FeedbackDataset.from_huggingface` and can be loaded independently using the `datasets` library via `load_dataset`.

* The [annotation guidelines](#annotation-guidelines) that have been used for building and curating the dataset, if they've been defined in Argilla.
### Load with Argilla

To load with Argilla, install Argilla with `pip install argilla --upgrade` and then use the following code:

```python
import argilla as rg

ds = rg.FeedbackDataset.from_huggingface("vegeta/testargilla")
```

### Load with `datasets`

To load this dataset with `datasets`, install `datasets` with `pip install datasets --upgrade` and then use the following code:

```python
from datasets import load_dataset

ds = load_dataset("vegeta/testargilla")
```

### Supported Tasks and Leaderboards

This dataset can contain [multiple fields, questions and responses](https://docs.argilla.io/en/latest/guides/llms/conceptual_guides/data_model.html), so it can be used for different NLP tasks, depending on the configuration. The dataset structure is described in the [Dataset Structure section](#dataset-structure).

There are no leaderboards associated with this dataset.

### Languages

[More Information Needed]

## Dataset Structure

### Data in Argilla

The dataset is created in Argilla with: **fields**, **questions**, and **guidelines**.

The **fields** are the dataset records themselves; for the moment, only text fields are supported. These are the ones that will be used to provide responses to the questions.

| Field Name | Title | Type | Required | Markdown |
| ---------- | ----- | ---- | -------- | -------- |
| text | Text | TextField | True | False |
| label | Label | TextField | True | False |

The **questions** are the questions that will be asked to the annotators. They can be of different types, such as rating, text, single choice, or multiple choice.

| Question Name | Title | Type | Required | Description | Values/Labels |
| ------------- | ----- | ---- | -------- | ----------- | ------------- |
| question-1 | Question-1 | TextQuestion | True | This is the first question | N/A |
| question-2 | Question-2 | RatingQuestion | True | This is the second question | [1, 2, 3, 4, 5] |

Finally, the **guidelines** are just a plain string that can be used to provide instructions to the annotators. Find those in the [annotation guidelines](#annotation-guidelines) section.

### Data Instances

An example of a dataset instance in Argilla looks as follows:

```json
{
    "external_id": "entry-1",
    "fields": {
        "label": "positive",
        "text": "This is the first record"
    },
    "metadata": null,
    "responses": [
        {
            "status": "submitted",
            "user_id": null,
            "values": {
                "question-1": {
                    "value": "This is the first answer"
                },
                "question-2": {
                    "value": 5
                }
            }
        }
    ]
}
```

While the same record in HuggingFace `datasets` looks as follows:

```json
{
    "external_id": "entry-1",
    "label": "positive",
    "metadata": null,
    "question-1": {
        "status": [
            "submitted"
        ],
        "user_id": [
            null
        ],
        "value": [
            "This is the first answer"
        ]
    },
    "question-2": {
        "status": [
            "submitted"
        ],
        "user_id": [
            null
        ],
        "value": [
            5
        ]
    },
    "text": "This is the first record"
}
```

### Data Fields

Among the dataset fields, we differentiate between the following:

* **Fields:** These are the dataset records themselves; for the moment, only text fields are supported. These are the ones that will be used to provide responses to the questions.

    * **text** is of type `TextField`.
    * **label** is of type `TextField`.

* **Questions:** These are the questions that will be asked to the annotators. They can be of different types, such as rating, text, single choice, or multiple choice.

    * **question-1** is of type `TextQuestion`, and description "This is the first question".
    * **question-2** is of type `RatingQuestion` with the following allowed values [1, 2, 3, 4, 5], and description "This is the second question".

Additionally, there is one more field, which is optional:

* **external_id:** This is an optional field that can be used to provide an external ID for the dataset record. This can be useful if you want to link the dataset record to an external resource, such as a database or a file.

### Data Splits

The dataset contains a single split, which is `train`.

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation guidelines

These are the annotation guidelines.

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

[More Information Needed]
Provider:
vegeta
Summary of Original Information

Dataset Overview

Dataset Information

  • Size category: n<1K
  • Tags:
    • rlfh
    • argilla
    • human-feedback

Dataset Features

  • metadata: string
  • text: string, id: field
  • label: string, id: field
  • question-1: sequence, id: question, with the following sub-fields:
    • user_id: string
    • value: string
    • status: string
  • question-2: sequence, id: question, with the following sub-fields:
    • user_id: string
    • value: int32
    • status: string
  • external_id: string, id: external_id
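
The feature list above can be turned into a small type check. The following is a minimal sketch in plain Python and is not part of the dataset card: the names `SCHEMA` and `check_row` are hypothetical, and mapping `string` and `int32` to Python's `str` and `int` is an assumption made for illustration. The sample row is the single record shown in the card, in `datasets` layout.

```python
# Hypothetical schema mirroring the feature list above.
# Scalar features map to a Python type; sequence features map to a dict of
# sub-field types, where each sub-field holds a list of values.
SCHEMA = {
    "metadata": str,
    "text": str,
    "label": str,
    "question-1": {"user_id": str, "value": str, "status": str},
    "question-2": {"user_id": str, "value": int, "status": str},
    "external_id": str,
}


def check_row(row):
    """Return a list of problems; an empty list means the row fits SCHEMA."""
    problems = []
    for name, expected in SCHEMA.items():
        if name not in row:
            problems.append(f"missing feature: {name}")
            continue
        value = row[name]
        if isinstance(expected, dict):
            # Sequence feature: each sub-field is a list of typed items (None allowed).
            for sub, sub_type in expected.items():
                items = value.get(sub) if isinstance(value, dict) else None
                if not isinstance(items, list):
                    problems.append(f"{name}.{sub} should be a list")
                elif any(i is not None and not isinstance(i, sub_type) for i in items):
                    problems.append(f"{name}.{sub} has an item that is not {sub_type.__name__}")
        elif value is not None and not isinstance(value, expected):
            problems.append(f"{name} should be {expected.__name__} or None")
    return problems


# The record from the dataset card, in `datasets` layout.
row = {
    "external_id": "entry-1",
    "label": "positive",
    "metadata": None,
    "question-1": {
        "status": ["submitted"],
        "user_id": [None],
        "value": ["This is the first answer"],
    },
    "question-2": {"status": ["submitted"], "user_id": [None], "value": [5]},
    "text": "This is the first record",
}

print(check_row(row))
```

An empty result means every feature is present with the expected type; `None` is tolerated because the card's example has `metadata` and `user_id` set to `null`.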

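Dataset Splits

  • train: 148 bytes, 1 example

Dataset Loading

  • Can be loaded with either Argilla or the datasets library

Dataset Structure

  • Fields:
    • text: text field, required
    • label: text field, required
  • Questions:
    • question-1: text question, required
    • question-2: rating question, required, allowed values [1, 2, 3, 4, 5]
  • External ID: optional field used to provide an external ID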
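
To make the "required" and "allowed values" constraints above concrete, here is a minimal sketch of how one annotator response could be checked. It is plain Python written for illustration, not Argilla's own validation logic; `QUESTIONS` and `validate_response` are hypothetical names, and the response shape (question name mapped to `{"value": ...}`) follows the Argilla-format example in the dataset card.

```python
# Hypothetical description of the two questions listed above.
QUESTIONS = {
    "question-1": {"type": "text", "required": True},
    "question-2": {"type": "rating", "required": True, "values": [1, 2, 3, 4, 5]},
}


def validate_response(values):
    """Check one annotator response (question name -> {"value": ...}).

    Returns a list of error strings; an empty list means the response is valid.
    """
    errors = []
    for name, spec in QUESTIONS.items():
        answer = values.get(name)
        if answer is None:
            if spec["required"]:
                errors.append(f"{name}: required question is unanswered")
            continue
        value = answer.get("value")
        if spec["type"] == "text" and not isinstance(value, str):
            errors.append(f"{name}: expected a text answer")
        elif spec["type"] == "rating" and value not in spec["values"]:
            errors.append(f"{name}: rating {value!r} is not one of {spec['values']}")
    return errors


# The response from the card's example record, and a deliberately invalid one.
ok = {"question-1": {"value": "This is the first answer"}, "question-2": {"value": 5}}
bad = {"question-2": {"value": 7}}

print(validate_response(ok))
print(validate_response(bad))
```

The first call reports no errors; the second reports both the missing required text answer and the out-of-range rating.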

Data Instances

  • The example includes the external ID, the fields (label and text), the question responses, and so on
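
The dataset card shows the same record in two layouts: the Argilla layout, with a `responses` list, and the `datasets` layout, with one column per question holding parallel lists. The sketch below illustrates how the first could be flattened into the second. It is reconstructed from the two example records only and is not Argilla's conversion code; the function name `to_datasets_row` and its `question_names` parameter are hypothetical.

```python
def to_datasets_row(record, question_names):
    """Flatten an Argilla-style record into the column layout used by `datasets`."""
    row = {
        "external_id": record.get("external_id"),
        "metadata": record.get("metadata"),
    }
    # Each field becomes a top-level column.
    row.update(record["fields"])
    # Each question becomes a column of parallel lists, one entry per response.
    for name in question_names:
        column = {"user_id": [], "value": [], "status": []}
        for response in record.get("responses", []):
            answer = response["values"].get(name)
            if answer is None:
                continue  # this annotator did not answer the question
            column["user_id"].append(response.get("user_id"))
            column["value"].append(answer["value"])
            column["status"].append(response.get("status"))
        row[name] = column
    return row


# The Argilla-format record from the dataset card.
record = {
    "external_id": "entry-1",
    "fields": {"label": "positive", "text": "This is the first record"},
    "metadata": None,
    "responses": [
        {
            "status": "submitted",
            "user_id": None,
            "values": {
                "question-1": {"value": "This is the first answer"},
                "question-2": {"value": 5},
            },
        }
    ],
}

row = to_datasets_row(record, ["question-1", "question-2"])
print(row["question-2"])
```

With several annotators, each response would append one entry to every list, so the lists in a question column stay aligned by position.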

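Data Fields

  • Fields: text fields, used to provide responses to the questions
  • Questions: questions of different types, such as text and rating
  • External ID: optional, used to link a record to an external resource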