test-argilla-dataset

Name: test-argilla-dataset
Creator: maas
Published: 2025-10-09 16:28:59
License: No description available

ModelScope community; updated 2025-10-09, indexed 2025-04-12

Download link:

https://modelscope.cn/datasets/burtenshaw/test-argilla-dataset

Resource description:

# Dataset Card for test-argilla-dataset

This dataset has been created with [Argilla](https://github.com/argilla-io/argilla). As shown in the sections below, it can be loaded into your Argilla server as explained in [Using this dataset with Argilla](#using-this-dataset-with-argilla), or used directly with the `datasets` library as explained in [Using this dataset with `datasets`](#using-this-dataset-with-datasets).

## Using this dataset with Argilla

To load the dataset with Argilla, install Argilla with `pip install argilla --pre --upgrade` and then run the following code:

```python
import argilla as rg

# Pull the settings and records from the Hub repository.
ds = rg.Dataset.from_hub("burtenshaw/test-argilla-dataset")
```

This loads the settings and records from the dataset repository and pushes them to your Argilla server for exploration and annotation.

## Using this dataset with `datasets`

To load the records of this dataset with `datasets`, install the library with `pip install datasets --upgrade` and then run the following code:

```python
from datasets import load_dataset

# Load only the records; the Argilla settings are not included.
ds = load_dataset("burtenshaw/test-argilla-dataset")
```

This loads only the records of the dataset, not the Argilla settings.

## Dataset Structure

This dataset repo contains:

* Dataset records in a format compatible with Hugging Face `datasets`. These records are loaded automatically when using `rg.Dataset.from_hub` and can be loaded independently using the `datasets` library via `load_dataset`.
* The [annotation guidelines](#annotation-guidelines) that have been used for building and curating the dataset, if they've been defined in Argilla.
* A dataset configuration folder conforming to the Argilla dataset format in `.argilla`.

The dataset is created in Argilla with: **fields**, **questions**, **suggestions**, **metadata**, **vectors**, and **guidelines**.

### Fields

The **fields** are the features or text of a dataset's records. For example, the 'text' column of a text classification dataset or the 'prompt' column of an instruction following dataset.

| Field Name | Title | Type | Required | Markdown |
| ---------- | ----- | ---- | -------- | -------- |
| text | text | text | True | False |

### Questions

The **questions** are the questions that will be asked to the annotators. They can be of different types, such as rating, text, label_selection, multi_label_selection, or ranking.

| Question Name | Title | Type | Required | Description | Values/Labels |
| ------------- | ----- | ---- | -------- | ----------- | ------------- |
| label | label | label_selection | True | N/A | ['positive', 'negative'] |
| rating | rating | rating | True | N/A | [1, 2, 3, 4, 5] |
| ranking | ranking | ranking | True | N/A | ['label1', 'label2', 'label3'] |
| comment | comment | text | True | N/A | N/A |
| topics | topics | multi_label_selection | True | N/A | ['topic1', 'topic2', 'topic3'] |
| span | span | span | True | N/A | N/A |

### Metadata

The **metadata** is a dictionary that can be used to provide additional information about the dataset record.

| Metadata Name | Title | Type | Values | Visible for Annotators |
| ------------- | ----- | ---- | ------ | ---------------------- |
| comment_score | comment_score | | None - None | True |

### Vectors

The **vectors** contain a vector representation of the record that can be used in search.

| Vector Name | Title | Dimensions |
| ----------- | ----- | ---------- |
| vector | vector | [1, 3] |

### Data Instances

An example of a dataset instance in Argilla looks as follows:

```json
{
    "_server_id": "8aaf57d2-cb8e-4673-a7ce-2f684b60adf5",
    "fields": {
        "text": "Hello World, how are you?"
    },
    "id": "4f56e32b-9582-47de-a2b1-b230732bb07b",
    "metadata": {},
    "responses": {
        "label": [
            {
                "user_id": "06f7d4c0-e048-43d2-ab3f-06f147616ac6",
                "value": "positive"
            }
        ]
    },
    "suggestions": {
        "label": {
            "agent": null,
            "score": null,
            "value": "positive"
        },
        "topics": {
            "agent": null,
            "score": [0.9, 0.8],
            "value": ["topic1", "topic2"]
        }
    },
    "vectors": {}
}
```

The same record in Hugging Face `datasets` looks as follows:

```json
{
    "_server_id": "8aaf57d2-cb8e-4673-a7ce-2f684b60adf5",
    "comment.suggestion": null,
    "comment.suggestion.agent": null,
    "comment.suggestion.score": null,
    "comment_score": null,
    "id": "4f56e32b-9582-47de-a2b1-b230732bb07b",
    "label.responses": ["positive"],
    "label.responses.status": ["draft"],
    "label.responses.users": ["06f7d4c0-e048-43d2-ab3f-06f147616ac6"],
    "label.suggestion": "positive",
    "label.suggestion.agent": null,
    "label.suggestion.score": null,
    "ranking.suggestion": null,
    "ranking.suggestion.agent": null,
    "ranking.suggestion.score": null,
    "rating.suggestion": null,
    "rating.suggestion.agent": null,
    "rating.suggestion.score": null,
    "span.suggestion": null,
    "span.suggestion.agent": null,
    "span.suggestion.score": null,
    "text": "Hello World, how are you?",
    "topics.suggestion": ["topic1", "topic2"],
    "topics.suggestion.agent": null,
    "topics.suggestion.score": [0.9, 0.8],
    "vector": null
}
```

### Data Splits

The dataset contains a single split, which is `train`.

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation guidelines

[More Information Needed]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

[More Information Needed]
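
The two record layouts shown under Data Instances differ mainly in naming: the flat `datasets` row uses dotted column names such as `label.suggestion.score`. The sketch below illustrates that mapping in plain Python. It is inferred only from the single example record in this card and is not Argilla's own export code; `flatten_record` and its parameters are hypothetical names. The `label.responses.status` column is left out because the nested example does not show where the status value comes from.

```python
from typing import Any


def flatten_record(record: dict, question_names: list, metadata_names: list,
                   vector_names: list) -> dict:
    """Hypothetical helper: flatten a nested Argilla-style record into dotted columns."""
    flat: dict = {"id": record["id"], "_server_id": record["_server_id"]}

    # Each field becomes a top-level column (here: "text").
    flat.update(record["fields"])

    # Every question gets three suggestion columns, null when no suggestion exists.
    for name in question_names:
        suggestion = record["suggestions"].get(name, {})
        flat[f"{name}.suggestion"] = suggestion.get("value")
        flat[f"{name}.suggestion.agent"] = suggestion.get("agent")
        flat[f"{name}.suggestion.score"] = suggestion.get("score")

    # Response columns appear only for questions that were actually answered.
    for name, responses in record["responses"].items():
        flat[f"{name}.responses"] = [r["value"] for r in responses]
        flat[f"{name}.responses.users"] = [r["user_id"] for r in responses]

    # Metadata properties and vectors become plain columns, null when absent.
    for name in metadata_names:
        flat[name] = record["metadata"].get(name)
    for name in vector_names:
        flat[name] = record["vectors"].get(name)
    return flat


# The example record from this card.
record: dict = {
    "_server_id": "8aaf57d2-cb8e-4673-a7ce-2f684b60adf5",
    "fields": {"text": "Hello World, how are you?"},
    "id": "4f56e32b-9582-47de-a2b1-b230732bb07b",
    "metadata": {},
    "responses": {
        "label": [
            {"user_id": "06f7d4c0-e048-43d2-ab3f-06f147616ac6", "value": "positive"}
        ]
    },
    "suggestions": {
        "label": {"agent": None, "score": None, "value": "positive"},
        "topics": {"agent": None, "score": [0.9, 0.8], "value": ["topic1", "topic2"]},
    },
    "vectors": {},
}

flat: dict = flatten_record(
    record,
    question_names=["label", "rating", "ranking", "comment", "topics", "span"],
    metadata_names=["comment_score"],
    vector_names=["vector"],
)
for column in sorted(flat):
    print(column, "=", flat[column])
```

Passing the question, metadata, and vector names explicitly keeps the sketch independent of any Argilla installation; in practice those names would come from the dataset settings.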

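The Values/Labels column of the Questions table can also serve as a quick sanity check on rows loaded with `datasets`. The following is a minimal sketch, assuming that multi-valued answers (`ranking`, `topics`) are lists of listed options without repeats; `is_valid` is a hypothetical helper and does not reproduce Argilla's own validation.

```python
# Allowed values copied from the Questions table of this card.
ALLOWED = {
    "label": ["positive", "negative"],
    "rating": [1, 2, 3, 4, 5],
    "ranking": ["label1", "label2", "label3"],
    "topics": ["topic1", "topic2", "topic3"],
}

# Questions whose answers are lists of several values.
MULTI_VALUED = {"ranking", "topics"}


def is_valid(question: str, value) -> bool:
    """Hypothetical check of one answer against the card's allowed values."""
    allowed = ALLOWED.get(question)
    if allowed is None:
        # "comment" (text) and "span" list no fixed values, so nothing to check.
        return True
    if question in MULTI_VALUED:
        # Assumption: each item is a listed option and appears at most once.
        return (
            isinstance(value, list)
            and len(set(value)) == len(value)
            and all(item in allowed for item in value)
        )
    return value in allowed


print(is_valid("label", "positive"))
print(is_valid("rating", 6))
print(is_valid("topics", ["topic1", "topic2"]))
print(is_valid("ranking", ["label1", "label1"]))
```
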

Provider:

maas

Created:

2025-04-07
