dvilasuero/prompt-collective-backup

Name: dvilasuero/prompt-collective-backup
Creator: dvilasuero
Published: 2024-02-19 12:13:01
License: No description available

Hugging Face, updated 2024-02-19, indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/dvilasuero/prompt-collective-backup

Resource overview:

This dataset contains a configuration file, `argilla.yaml`, that follows the Argilla dataset format, alongside records compatible with the Hugging Face `datasets` library. It is designed for evaluating prompt quality, which involves scoring prompts on clarity, complexity, and interest. The dataset can be loaded with either Argilla or the Hugging Face `datasets` library. It includes a text field named `prompt` and a label-selection question named `quality` for scoring prompt quality. Optional metadata and external ID fields are also included.

Provider:

dvilasuero

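Summary of original information

Dataset Card for prompt-collective-backup

Dataset Description

Dataset Summary

This dataset contains:

A configuration file, argilla.yaml, that follows the Argilla dataset format.
Dataset records in a format compatible with Hugging Face datasets.
The annotation guidelines used to build and curate the dataset, if they have been defined in Argilla.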

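Loading

Load with Argilla

    import argilla as rg

    # Pull the dataset from the Hugging Face Hub as an Argilla FeedbackDataset.
    ds = rg.FeedbackDataset.from_huggingface("dvilasuero/prompt-collective-backup")

Load with `datasets`

    from datasets import load_dataset

    # Download the same records through the Hugging Face datasets library.
    ds = load_dataset("dvilasuero/prompt-collective-backup")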

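Supported Tasks and Leaderboards

This dataset can be used for different NLP tasks, depending on its configuration. The dataset structure is described in the Dataset Structure section.

There are no leaderboards associated with this dataset.

Languages

[More Information Needed]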

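Dataset Structure

Data in Argilla

In Argilla, the dataset consists of the following:

fields: the dataset records themselves; for now only text fields are supported.
questions: the questions asked of annotators, which can be of different types such as rating, text, label selection, multi-label selection, or ranking.
suggestions: human- or machine-generated recommendations that assist annotators during the annotation process.
metadata: a dictionary of additional information about each record.
vectors: [More Information Needed]
guidelines: optional instructions for annotators.

Data Instances

An example of a dataset instance in Argilla looks as follows: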

    {
        "external_id": null,
        "fields": {
            "prompt": "Provide step-by-step instructions on how to make a safe and effective homemade all-purpose cleaner from common household ingredients. The guide should include measurements, tips for storing the cleaner, and additional variations or scents that can be added. Additionally, the guide should be written in clear and concise language, with helpful visuals or photographs to aid in the process."
        },
        "metadata": {
            "evolved_from": null,
            "kind": "synthetic",
            "source": "ultrachat"
        },
        "responses": [],
        "suggestions": [],
        "vectors": {}
    }

The same record in Hugging Face datasets looks as follows:

    {
        "external_id": null,
        "metadata": "{\"source\": \"ultrachat\", \"kind\": \"synthetic\", \"evolved_from\": null}",
        "prompt": "Provide step-by-step instructions on how to make a safe and effective homemade all-purpose cleaner from common household ingredients. The guide should include measurements, tips for storing the cleaner, and additional variations or scents that can be added. Additionally, the guide should be written in clear and concise language, with helpful visuals or photographs to aid in the process.",
        "quality": [],
        "quality-suggestion": null,
        "quality-suggestion-metadata": {
            "agent": null,
            "score": null,
            "type": null
        }
    }
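In the `datasets` view of the record, `metadata` is a JSON-encoded string rather than a nested object, so it has to be decoded before its keys can be read. The following is a minimal sketch, assuming the record shape shown above. The `record` literal is a trimmed copy of that example, and `decode_metadata` is a hypothetical helper name that belongs to neither library.

```python
import json

# Trimmed copy of the example record above (prompt text shortened for brevity).
record = {
    "external_id": None,
    "metadata": '{"source": "ultrachat", "kind": "synthetic", "evolved_from": null}',
    "prompt": "Provide step-by-step instructions on how to make a safe and effective homemade all-purpose cleaner ...",
    "quality": [],
    "quality-suggestion": None,
}


def decode_metadata(rec):
    """Return the record's metadata as a dict (empty if the field is missing or null)."""
    raw = rec.get("metadata")
    return json.loads(raw) if raw else {}


meta = decode_metadata(record)
print(meta["source"], meta["kind"], meta["evolved_from"])  # prints: ultrachat synthetic None
```

Returning an empty dict for a missing value keeps downstream lookups uniform, since the metadata field is described as optional.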

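Data Fields

The dataset fields are:

Fields: the dataset records themselves; for now only text fields are supported.
- prompt: of type text.
Questions: the questions asked of annotators, which can be of different types.
- quality: of type label_selection, with allowed values [0, 1, 2, 3, 4].
Suggestions: recommendations offered to annotators. They are linked to the existing questions and are always optional.
- (optional) quality-suggestion: of type label_selection, with allowed values [0, 1, 2, 3, 4].

In addition, there are two optional fields:

metadata: a dictionary of additional information about each record.
external_id: an external ID for the record.

Data Splits

The dataset contains a single split, train.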
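The field list above can be turned into a small sanity check for records in the `datasets` layout. This is only a sketch under stated assumptions: `check_record` and `ALLOWED_QUALITY` are hypothetical names, and the allowed values are compared as strings because the source does not say whether labels are stored as strings or integers.

```python
# Allowed labels for `quality` and `quality-suggestion`, per the field list above.
# Compared as strings; the stored type is an assumption.
ALLOWED_QUALITY = {"0", "1", "2", "3", "4"}


def check_record(rec):
    """Return a list of problems found in one record (an empty list means none found)."""
    problems = []

    # `prompt` is the only required text field.
    prompt = rec.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt must be a non-empty string")

    # Suggestions are always optional, so None is accepted.
    suggestion = rec.get("quality-suggestion")
    if suggestion is not None and str(suggestion) not in ALLOWED_QUALITY:
        problems.append(f"quality-suggestion {suggestion!r} is not an allowed value")

    # `metadata` and `external_id` are optional, so their absence is not reported.
    return problems


print(check_record({"prompt": "Explain recursion.", "quality-suggestion": None}))
```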

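Dataset Creation

Annotation guidelines

Our goal is to identify effective prompts and to understand the interaction between AI-generated and human-generated prompts. The focus is on evaluating prompts that are clear, interesting, and complex, for use in fine-tuning open-source large language models.

A good prompt should:

Make the user's intent clear.
Be challenging or interesting for the assistant, for example by involving complex problem solving or reasoning.

Rating

1. Very bad:

The prompt does not communicate its purpose, is nonsensical, or is written in a language other than English.

2. Bad:

Suggests a goal but lacks clarity and coherence.

3. OK:

The intent is understandable, but information needed to complete the task is missing.

4. Good:

Presents a clear goal and the necessary information, effectively directing the AI, but the prompt could be more specific.

5. Very good:

Comprehensive and explicit, leaving no ambiguity. Perfectly guides the AI and includes details.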
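The rating scale above can be kept as a lookup table when inspecting annotations. Note that the guidelines number ratings 1 to 5 while the `quality` question lists allowed labels 0 to 4, and the source does not state how the two correspond. The sketch below assumes a simple shift of one (label 0 means rating 1), exposed as an `offset` parameter so it can be changed; `RATING_SCALE` and `describe_label` are hypothetical names.

```python
# Rating names taken from the annotation guidelines above.
RATING_SCALE = {
    1: "Very bad",
    2: "Bad",
    3: "OK",
    4: "Good",
    5: "Very good",
}


def describe_label(label, offset=1):
    """Map a stored label to a rating description.

    ASSUMPTION: stored labels 0-4 map to ratings 1-5 (offset=1). The source
    does not confirm this; pass offset=0 if labels already use the 1-5 scale.
    """
    rating = int(label) + offset
    if rating not in RATING_SCALE:
        raise ValueError(f"label {label!r} falls outside the 1-5 rating scale")
    return f"{rating} ({RATING_SCALE[rating]})"


print(describe_label(0))    # prints: 1 (Very bad)
print(describe_label("4"))  # prints: 5 (Very good)
```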
