wyzard-ai/Shefali1
Source: Hugging Face | Updated: 2024-11-25 | Indexed: 2025-04-26
Download link:
https://hf-mirror.com/datasets/wyzard-ai/Shefali1
Resource description:
---
size_categories: n<1K
tags:
- rlfh
- argilla
- human-feedback
---
# Dataset Card for Shefali1
This dataset was created with [Argilla](https://github.com/argilla-io/argilla). It can be loaded into your Argilla server as explained in [Using this dataset with Argilla](#using-this-dataset-with-argilla), or used directly with the `datasets` library as explained in [Using this dataset with `datasets`](#using-this-dataset-with-datasets).
## Using this dataset with Argilla
To load with Argilla, you'll just need to install Argilla as `pip install argilla --upgrade` and then use the following code:
```python
import argilla as rg
ds = rg.Dataset.from_hub("wyzard-ai/Shefali1")
```
This will load the settings and records from the dataset repository and push them to your Argilla server for exploration and annotation.
## Using this dataset with `datasets`
To load the records of this dataset with `datasets`, you'll just need to install `datasets` as `pip install datasets --upgrade` and then use the following code:
```python
from datasets import load_dataset
ds = load_dataset("wyzard-ai/Shefali1")
```
This will only load the records of the dataset, but not the Argilla settings.
## Dataset Structure
This dataset repo contains:
* Dataset records in a format compatible with HuggingFace `datasets`. These records will be loaded automatically when using `rg.Dataset.from_hub` and can be loaded independently using the `datasets` library via `load_dataset`.
* The [annotation guidelines](#annotation-guidelines) that have been used for building and curating the dataset, if they've been defined in Argilla.
* A dataset configuration folder conforming to the Argilla dataset format in `.argilla`.
The dataset is created in Argilla with: **fields**, **questions**, **suggestions**, **metadata**, **vectors**, and **guidelines**.
### Fields
The **fields** are the features or text of a dataset's records. For example, the 'text' column of a text classification dataset or the 'prompt' column of an instruction-following dataset.
| Field Name | Title | Type | Required | Markdown |
| ---------- | ----- | ---- | -------- | -------- |
| instruction | User instruction | text | True | True |
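In the sample record under Data Instances, the `instruction` field holds a whole conversation as Markdown, with each turn introduced by a `**user**:` or `**assistant**:` marker at the start of a line. The following is a minimal sketch of splitting such a string into turns. It assumes every record follows that marker convention, which the card does not state, and `split_turns` is a hypothetical helper, not part of Argilla or `datasets`.

```python
import re

# Speaker markers as they appear in the sample record: "**user**:" / "**assistant**:"
# at the start of a line. Bold bullets such as "* **Communication:**" do not match.
_TURN_MARKER = re.compile(r"^\*\*(user|assistant)\*\*:[ \t]*", re.MULTILINE)


def split_turns(instruction: str) -> list[dict]:
    """Split an `instruction` string into a list of {"role", "content"} dicts."""
    matches = list(_TURN_MARKER.finditer(instruction))
    turns = []
    for i, match in enumerate(matches):
        # A turn runs from the end of its marker to the start of the next marker.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(instruction)
        turns.append(
            {"role": match.group(1), "content": instruction[match.end():end].strip()}
        )
    return turns


# Shortened excerpt of the sample record's `instruction` value.
sample = (
    "**user**: Hi\n"
    "**assistant**: Hello Shefali! How can I assist you today?\n"
    "**user**: Hey tell me about Collaboration tools \n"
)
turns = split_turns(sample)
```

Each element of `turns` can then be shown or scored separately, for example to display the conversation one message at a time.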
### Questions
The **questions** are the questions that will be asked to the annotators. They can be of different types, such as rating, text, label_selection, multi_label_selection, or ranking.
| Question Name | Title | Type | Required | Description | Values/Labels |
| ------------- | ----- | ---- | -------- | ----------- | ------------- |
| relevance_score | How Relevant is the conversation based upon persona | rating | True | N/A | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] |
| accuracy_score | How accurate is the conversation based upon persona | rating | True | If the model answered something unrelated to Argilla SDK | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] |
| clarity_score | How accurate is the conversation based upon persona | rating | True | This field should be used to report any feedback that can be useful | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] |
| actionable_score | How actionable is the conversation based upon persona | rating | True | This field should be used to report any feedback that can be useful | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] |
| engagement_score | How engaging is the conversation based upon persona | rating | True | This field should be used to report any feedback that can be useful | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] |
| completeness_score | is the conversation complete based upon persona | rating | True | This field should be used to report any feedback that can be useful | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] |
| feedback | feedback | text | True | What do you think can be improved in the given conversation. How good was the conversation as per you? | N/A |
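The table above implies a simple contract for a complete annotation: six required ratings, each an integer from 1 to 10, plus a required free-text `feedback`. The sketch below checks a plain dictionary against that contract. `validate_response` and the dictionary layout are hypothetical illustrations; Argilla enforces its own validation on the server, and this is not its implementation.

```python
# Rating question names copied from the Questions table.
RATING_QUESTIONS = [
    "relevance_score",
    "accuracy_score",
    "clarity_score",
    "actionable_score",
    "engagement_score",
    "completeness_score",
]
ALLOWED_RATINGS = range(1, 11)  # 1..10 inclusive, as listed under Values/Labels


def validate_response(response: dict) -> list[str]:
    """Return a list of problems; an empty list means the response is complete."""
    problems = []
    for name in RATING_QUESTIONS:
        value = response.get(name)
        if value is None:
            problems.append(f"{name}: missing (required)")
        elif (
            isinstance(value, bool)
            or not isinstance(value, int)
            or value not in ALLOWED_RATINGS
        ):
            problems.append(f"{name}: {value!r} is not an integer from 1 to 10")
    feedback = response.get("feedback")
    if not isinstance(feedback, str) or not feedback.strip():
        problems.append("feedback: missing (required)")
    return problems


# Hypothetical annotation used only to exercise the check.
example = {name: 7 for name in RATING_QUESTIONS}
example["feedback"] = "Clear overview, but it could ask fewer follow-up questions."
problems = validate_response(example)
```

Returning a list of problems rather than raising on the first one lets a caller report every missing or out-of-range answer at once.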
### Metadata
The **metadata** is a dictionary that can be used to provide additional information about the dataset record.
| Metadata Name | Title | Type | Values | Visible for Annotators |
| ------------- | ----- | ---- | ------ | ---------------------- |
| conv_id | Conversation ID | | - | True |
| turn | Conversation Turn | | 0 - 100 | True |
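A common use of these two properties is to reassemble conversations from individual records. The sketch below groups rows by `conv_id` and orders them by `turn`, rejecting values outside the 0 - 100 range given in the table. It assumes rows are plain dictionaries with `conv_id` and `turn` as top-level keys (as in the `datasets` layout shown under Data Instances) and that several records may share a `conv_id`; neither the helper name nor that assumption comes from the card.

```python
from collections import defaultdict


def group_by_conversation(rows):
    """Map each conv_id to its rows, sorted by ascending turn."""
    grouped = defaultdict(list)
    for row in rows:
        turn = row["turn"]
        # The Metadata table lists 0 - 100 as the valid range for `turn`.
        if not 0 <= turn <= 100:
            raise ValueError(f"turn {turn} is outside the documented range 0 - 100")
        grouped[row["conv_id"]].append(row)
    return {
        conv_id: sorted(items, key=lambda r: r["turn"])
        for conv_id, items in grouped.items()
    }


# Hypothetical rows; real conv_id values are UUID strings as in the sample record.
rows = [
    {"conv_id": "conv-a", "turn": 4},
    {"conv_id": "conv-b", "turn": 2},
    {"conv_id": "conv-a", "turn": 2},
]
conversations = group_by_conversation(rows)
```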
### Data Instances
An example of a dataset instance in Argilla looks as follows:
```json
{
"_server_id": "da78dcef-543c-4f6c-85ea-ec502df4e978",
"fields": {
"instruction": "**user**: Hi\n**assistant**: Hello Shefali! How can I assist you today? If you have any questions about software or need recommendations, feel free to ask.\n**user**: Hey tell me about Collaboration tools \n**assistant**: Collaboration tools are essential for streamlining teamwork and boosting productivity. They help teams stay organized, communicate effectively, and manage projects efficiently, ultimately minimizing the risk of miscommunication and delays. There are various types of collaboration tools, each catering to specific needs:\n\n* **Communication:** Tools like Slack and Microsoft Teams facilitate instant messaging, file sharing, and video conferencing, enabling seamless communication within teams.\n* **Project Management:** Asana, Trello, and Monday.com help manage tasks, track progress, and ensure projects stay on schedule.\n* **Document Collaboration:** Google Workspace and Microsoft 365 allow multiple users to work on documents, spreadsheets, and presentations simultaneously.\n* **File Sharing:** Dropbox and Google Drive provide centralized storage and easy access to files for all team members.\n\nTo recommend the best tools for *you*, could you tell me a bit more about your specific needs and what challenges you\u0027re looking to address? What kind of work does your team do, and what are your primary goals for using collaboration tools? Knowing this will help me narrow down the options and suggest the most suitable solutions.\n"
},
"id": "eb627248-0ee7-4926-a201-4c2af83e1d65",
"metadata": {
"conv_id": "7c70f4b0-8c01-488d-9c19-52650910b120",
"turn": 4
},
"responses": {},
"status": "pending",
"suggestions": {},
"vectors": {}
}
```
The same record in HuggingFace `datasets` looks as follows:
```json
{
"_server_id": "da78dcef-543c-4f6c-85ea-ec502df4e978",
"accuracy_score.responses": null,
"accuracy_score.responses.status": null,
"accuracy_score.responses.users": null,
"actionable_score.responses": null,
"actionable_score.responses.status": null,
"actionable_score.responses.users": null,
"clarity_score.responses": null,
"clarity_score.responses.status": null,
"clarity_score.responses.users": null,
"completeness_score.responses": null,
"completeness_score.responses.status": null,
"completeness_score.responses.users": null,
"conv_id": "7c70f4b0-8c01-488d-9c19-52650910b120",
"engagement_score.responses": null,
"engagement_score.responses.status": null,
"engagement_score.responses.users": null,
"feedback.responses": null,
"feedback.responses.status": null,
"feedback.responses.users": null,
"id": "eb627248-0ee7-4926-a201-4c2af83e1d65",
"instruction": "**user**: Hi\n**assistant**: Hello Shefali! How can I assist you today? If you have any questions about software or need recommendations, feel free to ask.\n**user**: Hey tell me about Collaboration tools \n**assistant**: Collaboration tools are essential for streamlining teamwork and boosting productivity. They help teams stay organized, communicate effectively, and manage projects efficiently, ultimately minimizing the risk of miscommunication and delays. There are various types of collaboration tools, each catering to specific needs:\n\n* **Communication:** Tools like Slack and Microsoft Teams facilitate instant messaging, file sharing, and video conferencing, enabling seamless communication within teams.\n* **Project Management:** Asana, Trello, and Monday.com help manage tasks, track progress, and ensure projects stay on schedule.\n* **Document Collaboration:** Google Workspace and Microsoft 365 allow multiple users to work on documents, spreadsheets, and presentations simultaneously.\n* **File Sharing:** Dropbox and Google Drive provide centralized storage and easy access to files for all team members.\n\nTo recommend the best tools for *you*, could you tell me a bit more about your specific needs and what challenges you\u0027re looking to address? What kind of work does your team do, and what are your primary goals for using collaboration tools? Knowing this will help me narrow down the options and suggest the most suitable solutions.\n",
"relevance_score.responses": null,
"relevance_score.responses.status": null,
"relevance_score.responses.users": null,
"status": "pending",
"turn": 4
}
```
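Comparing the two examples shows how the layouts relate: `fields` and `metadata` entries become top-level columns, and each question contributes three columns (`<name>.responses`, `<name>.responses.status`, `<name>.responses.users`) that are `null` while the record is unanswered. The sketch below reproduces that mapping for an unanswered record only. It is an illustration inferred from the two examples, not Argilla's export code, and `flatten_record` is a hypothetical name; the layout of answered records is not shown in this card, so the sketch refuses to guess it.

```python
# Question names copied from the Questions table.
QUESTION_NAMES = [
    "relevance_score",
    "accuracy_score",
    "clarity_score",
    "actionable_score",
    "engagement_score",
    "completeness_score",
    "feedback",
]


def flatten_record(record: dict, question_names=QUESTION_NAMES) -> dict:
    """Flatten an unanswered Argilla-style record into a `datasets`-style row."""
    if record["responses"]:
        # The card only shows unanswered records, so answered ones are out of scope.
        raise NotImplementedError("layout of answered records is not shown in the card")
    row = {
        "_server_id": record["_server_id"],
        "id": record["id"],
        "status": record["status"],
    }
    row.update(record["fields"])    # e.g. "instruction"
    row.update(record["metadata"])  # e.g. "conv_id", "turn"
    for name in question_names:
        row[f"{name}.responses"] = None
        row[f"{name}.responses.status"] = None
        row[f"{name}.responses.users"] = None
    # Empty "suggestions" and "vectors" produce no columns in the example above.
    return row


# The sample record from this card, with the instruction text shortened.
argilla_record = {
    "_server_id": "da78dcef-543c-4f6c-85ea-ec502df4e978",
    "fields": {"instruction": "**user**: Hi\n**assistant**: Hello Shefali!"},
    "id": "eb627248-0ee7-4926-a201-4c2af83e1d65",
    "metadata": {"conv_id": "7c70f4b0-8c01-488d-9c19-52650910b120", "turn": 4},
    "responses": {},
    "status": "pending",
    "suggestions": {},
    "vectors": {},
}
row = flatten_record(argilla_record)
```

The resulting `row` has the same 27 keys as the `datasets` example above.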
### Data Splits
The dataset contains a single split, which is `train`.
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation guidelines
Review the user interactions with the chatbot.
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
[More Information Needed]
### Contributions
[More Information Needed]
Provider:
wyzard-ai



