MPEP_RUSSIAN
Hosted on the ModelScope community (updated 2025-12-05, indexed 2025-07-12).
Download link: https://modelscope.cn/datasets/data-is-better-together/MPEP_RUSSIAN
# Dataset Card for MPEP_RUSSIAN
This dataset has been created with [Argilla](https://docs.argilla.io).
As shown in the sections below, this dataset can be loaded into Argilla as explained in [Load with Argilla](#load-with-argilla), or used directly with the `datasets` library in [Load with `datasets`](#load-with-datasets).
## Dataset Description
- **Homepage:** https://huggingface.co/DIBT-Russian
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:** https://huggingface.co/spaces/ZennyKenny/kghamilton
### Dataset Summary
This dataset contains:
* A dataset configuration file conforming to the Argilla dataset format named `argilla.yaml`. This configuration file will be used to configure the dataset when using the `FeedbackDataset.from_huggingface` method in Argilla.
* Dataset records in a format compatible with HuggingFace `datasets`. These records will be loaded automatically when using `FeedbackDataset.from_huggingface` and can be loaded independently using the `datasets` library via `load_dataset`.
* The [annotation guidelines](#annotation-guidelines) that have been used for building and curating the dataset, if they've been defined in Argilla.
### Load with Argilla
To load with Argilla, install Argilla with `pip install argilla --upgrade` and then run the code below. Note that `FeedbackDataset` belongs to the Argilla 1.x API, so an Argilla 1.x release may be required:
```python
import argilla as rg
ds = rg.FeedbackDataset.from_huggingface("DIBT/MPEP_RUSSIAN")
```
### Load with `datasets`
To load this dataset with `datasets`, install the library with `pip install datasets --upgrade` and then run the following code:
```python
from datasets import load_dataset
ds = load_dataset("DIBT/MPEP_RUSSIAN")
```
### Supported Tasks and Leaderboards
This dataset can contain [multiple fields, questions and responses](https://docs.argilla.io/en/latest/conceptual_guides/data_model.html#feedback-dataset) so it can be used for different NLP tasks, depending on the configuration. The dataset structure is described in the [Dataset Structure section](#dataset-structure).
There are no leaderboards associated with this dataset.
### Languages
The source prompts are in English and the target translations are in Russian.
## Dataset Structure
### Data in Argilla
The dataset is created in Argilla with: **fields**, **questions**, **suggestions**, **metadata**, **vectors**, and **guidelines**.
The **fields** are the dataset records themselves. For the moment, only text fields are supported. These are the fields that annotators use to provide responses to the questions.
| Field Name | Title | Type | Required | Markdown |
| ---------- | ----- | ---- | -------- | -------- |
| source | Source | text | True | True |
The **questions** are the questions that will be asked to the annotators. They can be of different types, such as rating, text, label_selection, multi_label_selection, or ranking.
| Question Name | Title | Type | Required | Description | Values/Labels |
| ------------- | ----- | ---- | -------- | ----------- | ------------- |
| target | Target | text | True | Translate the text. | N/A |
The **suggestions** are human- or machine-generated recommendations for each question, meant to assist the annotator during the annotation process. They are always linked to an existing question and are named by appending "-suggestion" and "-suggestion-metadata" to the question name; these columns contain the value(s) of the suggestion and its metadata, respectively. The possible values are therefore the same as in the table above.
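The naming rule for suggestion columns can be sketched in a few lines of Python. The helper `suggestion_columns` is hypothetical and not part of Argilla; it only illustrates how the two column names are derived from a question name.

```python
from typing import Tuple


def suggestion_columns(question_name: str) -> Tuple[str, str]:
    """Return the (value, metadata) column names for a question's suggestion.

    Hypothetical helper: it mirrors the naming convention described above
    and is not an Argilla function.
    """
    return (
        f"{question_name}-suggestion",
        f"{question_name}-suggestion-metadata",
    )


# The only question in this dataset is named "target".
value_column, metadata_column = suggestion_columns("target")
print(value_column, metadata_column)
```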
The **metadata** is a dictionary that holds additional information about a dataset record, such as a link to its original source, the author, or the date. It can also be used to give annotators extra context. The metadata is always optional and can be linked to the `metadata_properties` defined in the dataset configuration file `argilla.yaml`.
| Metadata Name | Title | Type | Values | Visible for Annotators |
| ------------- | ----- | ---- | ------ | ---------------------- |
The **guidelines** are optional as well. They are a plain string that provides instructions to the annotators. Find them in the [annotation guidelines](#annotation-guidelines) section.
### Data Instances
An example of a dataset instance in Argilla looks as follows:
```json
{
"external_id": "165",
"fields": {
"source": "Given the text: An experienced and enthusiastic innovator...you want on your team.\nMargaret Hines is the founder and Principal Consultant of Inspire Marketing, LLC, investing in local businesses, serving the community with business brokerage and marketing consulting. She has an undergraduate degree from Washington University in St. Louis, MO, and an MBA from the University of Wisconsin-Milwaukee.\nMargaret offers consulting in marketing, business sales and turnarounds and franchising. She is also an investor in local businesses.\nPrior to founding Inspire Marketing in 2003, Margaret gained her business acumen, sales and marketing expertise while working at respected Fortune 1000 companies.\nSummarize the background and expertise of Margaret Hines, the founder of Inspire Marketing."
},
"metadata": {
"evolved_from": null,
"kind": "synthetic",
"source": "ultrachat"
},
"responses": [
{
"status": "discarded",
"user_id": "633168bb-7483-4d46-b1a2-f9a3eef38a7c",
"values": {
"target": {
"value": "\u0423\u0447\u0438\u0442\u044b\u0432\u0430\u044f \u0442\u0435\u043a\u0441\u0442: \u041e\u043f\u044b\u0442\u043d\u044b\u0439 \u0438 \u044d\u043d\u0442\u0443\u0437\u0438\u0430\u0441\u0442\u0438\u0447\u043d\u044b\u0439 \u043d\u043e\u0432\u0430\u0442\u043e\u0440... \u0432\u044b \u0445\u043e\u0442\u0438\u0442\u0435 \u0432 \u0441\u0432\u043e\u0435\u0439 \u043a\u043e\u043c\u0430\u043d\u0434\u0435. \u041c\u0430\u0440\u0433\u0430\u0440\u0435\u0442 \u0425\u0430\u0439\u043d\u0441 \u044f\u0432\u043b\u044f\u0435\u0442\u0441\u044f \u043e\u0441\u043d\u043e\u0432\u0430\u0442\u0435\u043b\u0435\u043c \u0438 \u0433\u043b\u0430\u0432\u043d\u044b\u043c \u043a\u043e\u043d\u0441\u0443\u043b\u044c\u0442\u0430\u043d\u0442\u043e\u043c Inspire Marketing, LLC, \u0438\u043d\u0432\u0435\u0441\u0442\u0438\u0440\u0443\u044e\u0449\u0435\u0439 \u0432 \u043c\u0435\u0441\u0442\u043d\u044b\u0435 \u043f\u0440\u0435\u0434\u043f\u0440\u0438\u044f\u0442\u0438\u044f, \u043e\u0431\u0441\u043b\u0443\u0436\u0438\u0432\u0430\u044e\u0449\u0435\u0439 \u0441\u043e\u043e\u0431\u0449\u0435\u0441\u0442\u0432\u043e \u0431\u0438\u0437\u043d\u0435\u0441-\u0431\u0440\u043e\u043a\u0435\u0440\u043e\u043c \u0438 \u043c\u0430\u0440\u043a\u0435\u0442\u0438\u043d\u0433\u043e\u0432\u044b\u043c \u043a\u043e\u043d\u0441\u0443\u043b\u044c\u0442\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u0435\u043c. 
\u041e\u043d\u0430 \u0438\u043c\u0435\u0435\u0442 \u0441\u0442\u0435\u043f\u0435\u043d\u044c \u0431\u0430\u043a\u0430\u043b\u0430\u0432\u0440\u0430 \u0432 \u0412\u0430\u0448\u0438\u043d\u0433\u0442\u043e\u043d\u0441\u043a\u043e\u043c \u0443\u043d\u0438\u0432\u0435\u0440\u0441\u0438\u0442\u0435\u0442\u0435 \u0432 \u0421\u0435\u043d\u0442-\u041b\u0443\u0438\u0441\u0435, \u0448\u0442\u0430\u0442 \u041c\u043e\u0441\u043a\u0432\u0430, \u0438 MBA \u0438\u0437 \u0423\u043d\u0438\u0432\u0435\u0440\u0441\u0438\u0442\u0435\u0442\u0430 \u0412\u0438\u0441\u043a\u043e\u043d\u0441\u0438\u043d\u043a\u0430-\u041c\u0438\u043b\u0432\u0430\u043a\u0438. \u041c\u0430\u0440\u0433\u0430\u0440\u0435\u0442 \u043f\u0440\u0435\u0434\u043b\u0430\u0433\u0430\u0435\u0442 \u043a\u043e\u043d\u0441\u0443\u043b\u044c\u0442\u0430\u0446\u0438\u0438 \u0432 \u043e\u0431\u043b\u0430\u0441\u0442\u0438 \u043c\u0430\u0440\u043a\u0435\u0442\u0438\u043d\u0433\u0430, \u043f\u0440\u043e\u0434\u0430\u0436 \u0431\u0438\u0437\u043d\u0435\u0441\u0430 \u0438 \u043f\u043e\u0432\u043e\u0440\u043e\u0442\u043e\u0432 \u0438 \u0444\u0440\u0430\u043d\u0447\u0430\u0439\u0437\u0438\u043d\u0433\u0430."
}
}
},
{
"status": "submitted",
"user_id": "0982e39f-758c-4022-863c-7831af244eba",
"values": {
"target": {
"value": "\u0412 \u0442\u0435\u043a\u0441\u0442\u0435 \u0443\u043a\u0430\u0437\u0430\u043d\u043e: \u041e\u043f\u044b\u0442\u043d\u044b\u0439 \u0438 \u0443\u0432\u043b\u0435\u0447\u0435\u043d\u043d\u044b\u0439 \u043d\u043e\u0432\u0430\u0442\u043e\u0440...\u0432\u044b \u0445\u043e\u0442\u0438\u0442\u0435, \u0447\u0442\u043e\u0431\u044b \u043e\u043d \u0431\u044b\u043b \u0432 \u0432\u0430\u0448\u0435\u0439 \u043a\u043e\u043c\u0430\u043d\u0434\u0435.\n\n\u041c\u0430\u0440\u0433\u0430\u0440\u0435\u0442 \u0425\u0430\u0439\u043d\u0441 - \u043e\u0441\u043d\u043e\u0432\u0430\u0442\u0435\u043b\u044c \u0438 \u0433\u043b\u0430\u0432\u043d\u044b\u0439 \u043a\u043e\u043d\u0441\u0443\u043b\u044c\u0442\u0430\u043d\u0442 Inspire Marketing, LLC, \u0438\u043d\u0432\u0435\u0441\u0442\u0438\u0440\u0443\u044e\u0449\u0430\u044f \u0432 \u043c\u0435\u0441\u0442\u043d\u044b\u0435 \u043f\u0440\u0435\u0434\u043f\u0440\u0438\u044f\u0442\u0438\u044f, \u043f\u0440\u0435\u0434\u043e\u0441\u0442\u0430\u0432\u043b\u044f\u044e\u0449\u0430\u044f \u043e\u0431\u0449\u0435\u0441\u0442\u0432\u0443 \u0443\u0441\u043b\u0443\u0433\u0438 \u0431\u0438\u0437\u043d\u0435\u0441-\u0431\u0440\u043e\u043a\u0435\u0440\u0430 \u0438 \u043c\u0430\u0440\u043a\u0435\u0442\u0438\u043d\u0433\u043e\u0432\u043e\u0433\u043e \u043a\u043e\u043d\u0441\u0430\u043b\u0442\u0438\u043d\u0433\u0430. 
\u041e\u043d\u0430 \u043f\u043e\u043b\u0443\u0447\u0438\u043b\u0430 \u0441\u0442\u0435\u043f\u0435\u043d\u044c \u0431\u0430\u043a\u0430\u043b\u0430\u0432\u0440\u0430 \u0432 \u0412\u0430\u0448\u0438\u043d\u0433\u0442\u043e\u043d\u0441\u043a\u043e\u043c \u0443\u043d\u0438\u0432\u0435\u0440\u0441\u0438\u0442\u0435\u0442\u0435 \u0432 \u0421\u0435\u043d\u0442-\u041b\u0443\u0438\u0441\u0435, \u0448\u0442\u0430\u0442 \u041c\u0438\u0441\u0441\u0443\u0440\u0438, \u0438 \u0441\u0442\u0435\u043f\u0435\u043d\u044c \u043c\u0430\u0433\u0438\u0441\u0442\u0440\u0430 \u0434\u0435\u043b\u043e\u0432\u043e\u0433\u043e \u0430\u0434\u043c\u0438\u043d\u0438\u0441\u0442\u0440\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u044f \u0432 \u0423\u043d\u0438\u0432\u0435\u0440\u0441\u0438\u0442\u0435\u0442\u0435 \u0412\u0438\u0441\u043a\u043e\u043d\u0441\u0438\u043d-\u041c\u0438\u043b\u0443\u043e\u043a\u0438.\n\n\u041c\u0430\u0440\u0433\u0430\u0440\u0435\u0442 \u0437\u0430\u043d\u0438\u043c\u0430\u0435\u0442\u0441\u044f \u043a\u043e\u043d\u0441\u0430\u043b\u0442\u0438\u043d\u0433\u043e\u043c \u0432 \u043e\u0431\u043b\u0430\u0441\u0442\u0438 \u043c\u0430\u0440\u043a\u0435\u0442\u0438\u043d\u0433\u0430, \u043f\u0440\u043e\u0434\u0430\u0436 \u0438 \u0444\u0440\u0430\u043d\u0447\u0430\u0439\u0437\u0438\u043d\u0433\u0430. 
\u041e\u043d\u0430 \u0442\u0430\u043a\u0436\u0435 \u0438\u043d\u0432\u0435\u0441\u0442\u0438\u0440\u0443\u0435\u0442 \u0432 \u043c\u0435\u0441\u0442\u043d\u044b\u0435 \u043f\u0440\u0435\u0434\u043f\u0440\u0438\u044f\u0442\u0438\u044f.\n\n\u041f\u0440\u0435\u0436\u0434\u0435 \u0447\u0435\u043c \u043e\u0441\u043d\u043e\u0432\u0430\u0442\u044c Inspire Marketing \u0432 2003 \u0433\u043e\u0434\u0443, \u041c\u0430\u0440\u0433\u0430\u0440\u0435\u0442 \u043f\u0440\u0438\u043e\u0431\u0440\u0435\u043b\u0430 \u0434\u0435\u043b\u043e\u0432\u0443\u044e \u0445\u0432\u0430\u0442\u043a\u0443, \u043e\u043f\u044b\u0442 \u0432 \u043f\u0440\u043e\u0434\u0430\u0436\u0430\u0445 \u0438 \u043c\u0430\u0440\u043a\u0435\u0442\u0438\u043d\u0433\u0435, \u0440\u0430\u0431\u043e\u0442\u0430\u044f \u0432 \u0443\u0432\u0430\u0436\u0430\u0435\u043c\u044b\u0445 \u043a\u043e\u043c\u043f\u0430\u043d\u0438\u044f\u0445 \u0438\u0437 \u0441\u043f\u0438\u0441\u043a\u0430 Fortune 1000.\n\u041a\u0440\u0430\u0442\u043a\u043e \u043e\u0431 \u0438\u0441\u0442\u043e\u0440\u0438\u0438 \u0438 \u043e\u043f\u044b\u0442\u0435 \u041c\u0430\u0440\u0433\u0430\u0440\u0435\u0442 \u0425\u0430\u0439\u043d\u0441, \u043e\u0441\u043d\u043e\u0432\u0430\u0442\u0435\u043b\u044c\u043d\u0438\u0446\u044b Inspire Marketing.\n"
}
}
}
],
"suggestions": [
{
"agent": null,
"question_name": "target",
"score": null,
"type": null,
"value": "\u0423\u0447\u0438\u0442\u044b\u0432\u0430\u044f \u0442\u0435\u043a\u0441\u0442: \u041e\u043f\u044b\u0442\u043d\u044b\u0439 \u0438 \u044d\u043d\u0442\u0443\u0437\u0438\u0430\u0441\u0442\u0438\u0447\u043d\u044b\u0439 \u043d\u043e\u0432\u0430\u0442\u043e\u0440... \u0432\u044b \u0445\u043e\u0442\u0438\u0442\u0435 \u0432 \u0441\u0432\u043e\u0435\u0439 \u043a\u043e\u043c\u0430\u043d\u0434\u0435. \u041c\u0430\u0440\u0433\u0430\u0440\u0435\u0442 \u0425\u0430\u0439\u043d\u0441 \u044f\u0432\u043b\u044f\u0435\u0442\u0441\u044f \u043e\u0441\u043d\u043e\u0432\u0430\u0442\u0435\u043b\u0435\u043c \u0438 \u0433\u043b\u0430\u0432\u043d\u044b\u043c \u043a\u043e\u043d\u0441\u0443\u043b\u044c\u0442\u0430\u043d\u0442\u043e\u043c Inspire Marketing, LLC, \u0438\u043d\u0432\u0435\u0441\u0442\u0438\u0440\u0443\u044e\u0449\u0435\u0439 \u0432 \u043c\u0435\u0441\u0442\u043d\u044b\u0435 \u043f\u0440\u0435\u0434\u043f\u0440\u0438\u044f\u0442\u0438\u044f, \u043e\u0431\u0441\u043b\u0443\u0436\u0438\u0432\u0430\u044e\u0449\u0435\u0439 \u0441\u043e\u043e\u0431\u0449\u0435\u0441\u0442\u0432\u043e \u0431\u0438\u0437\u043d\u0435\u0441-\u0431\u0440\u043e\u043a\u0435\u0440\u043e\u043c \u0438 \u043c\u0430\u0440\u043a\u0435\u0442\u0438\u043d\u0433\u043e\u0432\u044b\u043c \u043a\u043e\u043d\u0441\u0443\u043b\u044c\u0442\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u0435\u043c. 
\u041e\u043d\u0430 \u0438\u043c\u0435\u0435\u0442 \u0441\u0442\u0435\u043f\u0435\u043d\u044c \u0431\u0430\u043a\u0430\u043b\u0430\u0432\u0440\u0430 \u0432 \u0412\u0430\u0448\u0438\u043d\u0433\u0442\u043e\u043d\u0441\u043a\u043e\u043c \u0443\u043d\u0438\u0432\u0435\u0440\u0441\u0438\u0442\u0435\u0442\u0435 \u0432 \u0421\u0435\u043d\u0442-\u041b\u0443\u0438\u0441\u0435, \u0448\u0442\u0430\u0442 \u041c\u043e\u0441\u043a\u0432\u0430, \u0438 MBA \u0438\u0437 \u0423\u043d\u0438\u0432\u0435\u0440\u0441\u0438\u0442\u0435\u0442\u0430 \u0412\u0438\u0441\u043a\u043e\u043d\u0441\u0438\u043d\u043a\u0430-\u041c\u0438\u043b\u0432\u0430\u043a\u0438. \u041c\u0430\u0440\u0433\u0430\u0440\u0435\u0442 \u043f\u0440\u0435\u0434\u043b\u0430\u0433\u0430\u0435\u0442 \u043a\u043e\u043d\u0441\u0443\u043b\u044c\u0442\u0430\u0446\u0438\u0438 \u0432 \u043e\u0431\u043b\u0430\u0441\u0442\u0438 \u043c\u0430\u0440\u043a\u0435\u0442\u0438\u043d\u0433\u0430, \u043f\u0440\u043e\u0434\u0430\u0436 \u0431\u0438\u0437\u043d\u0435\u0441\u0430 \u0438 \u043f\u043e\u0432\u043e\u0440\u043e\u0442\u043e\u0432 \u0438 \u0444\u0440\u0430\u043d\u0447\u0430\u0439\u0437\u0438\u043d\u0433\u0430."
}
],
"vectors": {}
}
```
The same record in HuggingFace `datasets` looks as follows:
```json
{
"external_id": "165",
"metadata": "{\"source\": \"ultrachat\", \"kind\": \"synthetic\", \"evolved_from\": null}",
"source": "Given the text: An experienced and enthusiastic innovator...you want on your team.\nMargaret Hines is the founder and Principal Consultant of Inspire Marketing, LLC, investing in local businesses, serving the community with business brokerage and marketing consulting. She has an undergraduate degree from Washington University in St. Louis, MO, and an MBA from the University of Wisconsin-Milwaukee.\nMargaret offers consulting in marketing, business sales and turnarounds and franchising. She is also an investor in local businesses.\nPrior to founding Inspire Marketing in 2003, Margaret gained her business acumen, sales and marketing expertise while working at respected Fortune 1000 companies.\nSummarize the background and expertise of Margaret Hines, the founder of Inspire Marketing.",
"target": [
{
"status": "discarded",
"user_id": "633168bb-7483-4d46-b1a2-f9a3eef38a7c",
"value": "\u0423\u0447\u0438\u0442\u044b\u0432\u0430\u044f \u0442\u0435\u043a\u0441\u0442: \u041e\u043f\u044b\u0442\u043d\u044b\u0439 \u0438 \u044d\u043d\u0442\u0443\u0437\u0438\u0430\u0441\u0442\u0438\u0447\u043d\u044b\u0439 \u043d\u043e\u0432\u0430\u0442\u043e\u0440... \u0432\u044b \u0445\u043e\u0442\u0438\u0442\u0435 \u0432 \u0441\u0432\u043e\u0435\u0439 \u043a\u043e\u043c\u0430\u043d\u0434\u0435. \u041c\u0430\u0440\u0433\u0430\u0440\u0435\u0442 \u0425\u0430\u0439\u043d\u0441 \u044f\u0432\u043b\u044f\u0435\u0442\u0441\u044f \u043e\u0441\u043d\u043e\u0432\u0430\u0442\u0435\u043b\u0435\u043c \u0438 \u0433\u043b\u0430\u0432\u043d\u044b\u043c \u043a\u043e\u043d\u0441\u0443\u043b\u044c\u0442\u0430\u043d\u0442\u043e\u043c Inspire Marketing, LLC, \u0438\u043d\u0432\u0435\u0441\u0442\u0438\u0440\u0443\u044e\u0449\u0435\u0439 \u0432 \u043c\u0435\u0441\u0442\u043d\u044b\u0435 \u043f\u0440\u0435\u0434\u043f\u0440\u0438\u044f\u0442\u0438\u044f, \u043e\u0431\u0441\u043b\u0443\u0436\u0438\u0432\u0430\u044e\u0449\u0435\u0439 \u0441\u043e\u043e\u0431\u0449\u0435\u0441\u0442\u0432\u043e \u0431\u0438\u0437\u043d\u0435\u0441-\u0431\u0440\u043e\u043a\u0435\u0440\u043e\u043c \u0438 \u043c\u0430\u0440\u043a\u0435\u0442\u0438\u043d\u0433\u043e\u0432\u044b\u043c \u043a\u043e\u043d\u0441\u0443\u043b\u044c\u0442\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u0435\u043c. 
\u041e\u043d\u0430 \u0438\u043c\u0435\u0435\u0442 \u0441\u0442\u0435\u043f\u0435\u043d\u044c \u0431\u0430\u043a\u0430\u043b\u0430\u0432\u0440\u0430 \u0432 \u0412\u0430\u0448\u0438\u043d\u0433\u0442\u043e\u043d\u0441\u043a\u043e\u043c \u0443\u043d\u0438\u0432\u0435\u0440\u0441\u0438\u0442\u0435\u0442\u0435 \u0432 \u0421\u0435\u043d\u0442-\u041b\u0443\u0438\u0441\u0435, \u0448\u0442\u0430\u0442 \u041c\u043e\u0441\u043a\u0432\u0430, \u0438 MBA \u0438\u0437 \u0423\u043d\u0438\u0432\u0435\u0440\u0441\u0438\u0442\u0435\u0442\u0430 \u0412\u0438\u0441\u043a\u043e\u043d\u0441\u0438\u043d\u043a\u0430-\u041c\u0438\u043b\u0432\u0430\u043a\u0438. \u041c\u0430\u0440\u0433\u0430\u0440\u0435\u0442 \u043f\u0440\u0435\u0434\u043b\u0430\u0433\u0430\u0435\u0442 \u043a\u043e\u043d\u0441\u0443\u043b\u044c\u0442\u0430\u0446\u0438\u0438 \u0432 \u043e\u0431\u043b\u0430\u0441\u0442\u0438 \u043c\u0430\u0440\u043a\u0435\u0442\u0438\u043d\u0433\u0430, \u043f\u0440\u043e\u0434\u0430\u0436 \u0431\u0438\u0437\u043d\u0435\u0441\u0430 \u0438 \u043f\u043e\u0432\u043e\u0440\u043e\u0442\u043e\u0432 \u0438 \u0444\u0440\u0430\u043d\u0447\u0430\u0439\u0437\u0438\u043d\u0433\u0430."
},
{
"status": "submitted",
"user_id": "0982e39f-758c-4022-863c-7831af244eba",
"value": "\u0412 \u0442\u0435\u043a\u0441\u0442\u0435 \u0443\u043a\u0430\u0437\u0430\u043d\u043e: \u041e\u043f\u044b\u0442\u043d\u044b\u0439 \u0438 \u0443\u0432\u043b\u0435\u0447\u0435\u043d\u043d\u044b\u0439 \u043d\u043e\u0432\u0430\u0442\u043e\u0440...\u0432\u044b \u0445\u043e\u0442\u0438\u0442\u0435, \u0447\u0442\u043e\u0431\u044b \u043e\u043d \u0431\u044b\u043b \u0432 \u0432\u0430\u0448\u0435\u0439 \u043a\u043e\u043c\u0430\u043d\u0434\u0435.\n\n\u041c\u0430\u0440\u0433\u0430\u0440\u0435\u0442 \u0425\u0430\u0439\u043d\u0441 - \u043e\u0441\u043d\u043e\u0432\u0430\u0442\u0435\u043b\u044c \u0438 \u0433\u043b\u0430\u0432\u043d\u044b\u0439 \u043a\u043e\u043d\u0441\u0443\u043b\u044c\u0442\u0430\u043d\u0442 Inspire Marketing, LLC, \u0438\u043d\u0432\u0435\u0441\u0442\u0438\u0440\u0443\u044e\u0449\u0430\u044f \u0432 \u043c\u0435\u0441\u0442\u043d\u044b\u0435 \u043f\u0440\u0435\u0434\u043f\u0440\u0438\u044f\u0442\u0438\u044f, \u043f\u0440\u0435\u0434\u043e\u0441\u0442\u0430\u0432\u043b\u044f\u044e\u0449\u0430\u044f \u043e\u0431\u0449\u0435\u0441\u0442\u0432\u0443 \u0443\u0441\u043b\u0443\u0433\u0438 \u0431\u0438\u0437\u043d\u0435\u0441-\u0431\u0440\u043e\u043a\u0435\u0440\u0430 \u0438 \u043c\u0430\u0440\u043a\u0435\u0442\u0438\u043d\u0433\u043e\u0432\u043e\u0433\u043e \u043a\u043e\u043d\u0441\u0430\u043b\u0442\u0438\u043d\u0433\u0430. 
\u041e\u043d\u0430 \u043f\u043e\u043b\u0443\u0447\u0438\u043b\u0430 \u0441\u0442\u0435\u043f\u0435\u043d\u044c \u0431\u0430\u043a\u0430\u043b\u0430\u0432\u0440\u0430 \u0432 \u0412\u0430\u0448\u0438\u043d\u0433\u0442\u043e\u043d\u0441\u043a\u043e\u043c \u0443\u043d\u0438\u0432\u0435\u0440\u0441\u0438\u0442\u0435\u0442\u0435 \u0432 \u0421\u0435\u043d\u0442-\u041b\u0443\u0438\u0441\u0435, \u0448\u0442\u0430\u0442 \u041c\u0438\u0441\u0441\u0443\u0440\u0438, \u0438 \u0441\u0442\u0435\u043f\u0435\u043d\u044c \u043c\u0430\u0433\u0438\u0441\u0442\u0440\u0430 \u0434\u0435\u043b\u043e\u0432\u043e\u0433\u043e \u0430\u0434\u043c\u0438\u043d\u0438\u0441\u0442\u0440\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u044f \u0432 \u0423\u043d\u0438\u0432\u0435\u0440\u0441\u0438\u0442\u0435\u0442\u0435 \u0412\u0438\u0441\u043a\u043e\u043d\u0441\u0438\u043d-\u041c\u0438\u043b\u0443\u043e\u043a\u0438.\n\n\u041c\u0430\u0440\u0433\u0430\u0440\u0435\u0442 \u0437\u0430\u043d\u0438\u043c\u0430\u0435\u0442\u0441\u044f \u043a\u043e\u043d\u0441\u0430\u043b\u0442\u0438\u043d\u0433\u043e\u043c \u0432 \u043e\u0431\u043b\u0430\u0441\u0442\u0438 \u043c\u0430\u0440\u043a\u0435\u0442\u0438\u043d\u0433\u0430, \u043f\u0440\u043e\u0434\u0430\u0436 \u0438 \u0444\u0440\u0430\u043d\u0447\u0430\u0439\u0437\u0438\u043d\u0433\u0430. 
\u041e\u043d\u0430 \u0442\u0430\u043a\u0436\u0435 \u0438\u043d\u0432\u0435\u0441\u0442\u0438\u0440\u0443\u0435\u0442 \u0432 \u043c\u0435\u0441\u0442\u043d\u044b\u0435 \u043f\u0440\u0435\u0434\u043f\u0440\u0438\u044f\u0442\u0438\u044f.\n\n\u041f\u0440\u0435\u0436\u0434\u0435 \u0447\u0435\u043c \u043e\u0441\u043d\u043e\u0432\u0430\u0442\u044c Inspire Marketing \u0432 2003 \u0433\u043e\u0434\u0443, \u041c\u0430\u0440\u0433\u0430\u0440\u0435\u0442 \u043f\u0440\u0438\u043e\u0431\u0440\u0435\u043b\u0430 \u0434\u0435\u043b\u043e\u0432\u0443\u044e \u0445\u0432\u0430\u0442\u043a\u0443, \u043e\u043f\u044b\u0442 \u0432 \u043f\u0440\u043e\u0434\u0430\u0436\u0430\u0445 \u0438 \u043c\u0430\u0440\u043a\u0435\u0442\u0438\u043d\u0433\u0435, \u0440\u0430\u0431\u043e\u0442\u0430\u044f \u0432 \u0443\u0432\u0430\u0436\u0430\u0435\u043c\u044b\u0445 \u043a\u043e\u043c\u043f\u0430\u043d\u0438\u044f\u0445 \u0438\u0437 \u0441\u043f\u0438\u0441\u043a\u0430 Fortune 1000.\n\u041a\u0440\u0430\u0442\u043a\u043e \u043e\u0431 \u0438\u0441\u0442\u043e\u0440\u0438\u0438 \u0438 \u043e\u043f\u044b\u0442\u0435 \u041c\u0430\u0440\u0433\u0430\u0440\u0435\u0442 \u0425\u0430\u0439\u043d\u0441, \u043e\u0441\u043d\u043e\u0432\u0430\u0442\u0435\u043b\u044c\u043d\u0438\u0446\u044b Inspire Marketing.\n"
}
],
"target-suggestion": "\u0423\u0447\u0438\u0442\u044b\u0432\u0430\u044f \u0442\u0435\u043a\u0441\u0442: \u041e\u043f\u044b\u0442\u043d\u044b\u0439 \u0438 \u044d\u043d\u0442\u0443\u0437\u0438\u0430\u0441\u0442\u0438\u0447\u043d\u044b\u0439 \u043d\u043e\u0432\u0430\u0442\u043e\u0440... \u0432\u044b \u0445\u043e\u0442\u0438\u0442\u0435 \u0432 \u0441\u0432\u043e\u0435\u0439 \u043a\u043e\u043c\u0430\u043d\u0434\u0435. \u041c\u0430\u0440\u0433\u0430\u0440\u0435\u0442 \u0425\u0430\u0439\u043d\u0441 \u044f\u0432\u043b\u044f\u0435\u0442\u0441\u044f \u043e\u0441\u043d\u043e\u0432\u0430\u0442\u0435\u043b\u0435\u043c \u0438 \u0433\u043b\u0430\u0432\u043d\u044b\u043c \u043a\u043e\u043d\u0441\u0443\u043b\u044c\u0442\u0430\u043d\u0442\u043e\u043c Inspire Marketing, LLC, \u0438\u043d\u0432\u0435\u0441\u0442\u0438\u0440\u0443\u044e\u0449\u0435\u0439 \u0432 \u043c\u0435\u0441\u0442\u043d\u044b\u0435 \u043f\u0440\u0435\u0434\u043f\u0440\u0438\u044f\u0442\u0438\u044f, \u043e\u0431\u0441\u043b\u0443\u0436\u0438\u0432\u0430\u044e\u0449\u0435\u0439 \u0441\u043e\u043e\u0431\u0449\u0435\u0441\u0442\u0432\u043e \u0431\u0438\u0437\u043d\u0435\u0441-\u0431\u0440\u043e\u043a\u0435\u0440\u043e\u043c \u0438 \u043c\u0430\u0440\u043a\u0435\u0442\u0438\u043d\u0433\u043e\u0432\u044b\u043c \u043a\u043e\u043d\u0441\u0443\u043b\u044c\u0442\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u0435\u043c. 
\u041e\u043d\u0430 \u0438\u043c\u0435\u0435\u0442 \u0441\u0442\u0435\u043f\u0435\u043d\u044c \u0431\u0430\u043a\u0430\u043b\u0430\u0432\u0440\u0430 \u0432 \u0412\u0430\u0448\u0438\u043d\u0433\u0442\u043e\u043d\u0441\u043a\u043e\u043c \u0443\u043d\u0438\u0432\u0435\u0440\u0441\u0438\u0442\u0435\u0442\u0435 \u0432 \u0421\u0435\u043d\u0442-\u041b\u0443\u0438\u0441\u0435, \u0448\u0442\u0430\u0442 \u041c\u043e\u0441\u043a\u0432\u0430, \u0438 MBA \u0438\u0437 \u0423\u043d\u0438\u0432\u0435\u0440\u0441\u0438\u0442\u0435\u0442\u0430 \u0412\u0438\u0441\u043a\u043e\u043d\u0441\u0438\u043d\u043a\u0430-\u041c\u0438\u043b\u0432\u0430\u043a\u0438. \u041c\u0430\u0440\u0433\u0430\u0440\u0435\u0442 \u043f\u0440\u0435\u0434\u043b\u0430\u0433\u0430\u0435\u0442 \u043a\u043e\u043d\u0441\u0443\u043b\u044c\u0442\u0430\u0446\u0438\u0438 \u0432 \u043e\u0431\u043b\u0430\u0441\u0442\u0438 \u043c\u0430\u0440\u043a\u0435\u0442\u0438\u043d\u0433\u0430, \u043f\u0440\u043e\u0434\u0430\u0436 \u0431\u0438\u0437\u043d\u0435\u0441\u0430 \u0438 \u043f\u043e\u0432\u043e\u0440\u043e\u0442\u043e\u0432 \u0438 \u0444\u0440\u0430\u043d\u0447\u0430\u0439\u0437\u0438\u043d\u0433\u0430.",
"target-suggestion-metadata": {
"agent": null,
"score": null,
"type": null
}
}
```
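The relationship between the two layouts can be sketched as a small conversion function. This is an illustration inferred from the two example records, not Argilla's actual export code: `flatten_record` is a hypothetical helper, the sample record uses shortened placeholder strings instead of real data, and the sketch assumes every response and suggestion has the shape shown in the examples.

```python
import json
from typing import Any, Dict


def flatten_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an Argilla-style record into the flat `datasets`-style row."""
    row: Dict[str, Any] = {"external_id": record.get("external_id")}

    # Each field (here only "source") becomes a top-level column.
    row.update(record["fields"])

    # The metadata dictionary is stored as a JSON string.
    row["metadata"] = json.dumps(record.get("metadata") or {})

    # Responses are grouped per question name (here only "target").
    for response in record.get("responses", []):
        for question, answer in response["values"].items():
            row.setdefault(question, []).append(
                {
                    "status": response["status"],
                    "user_id": response["user_id"],
                    "value": answer["value"],
                }
            )

    # Each suggestion yields a "-suggestion" and a "-suggestion-metadata" column.
    for suggestion in record.get("suggestions", []):
        name = suggestion["question_name"]
        row[f"{name}-suggestion"] = suggestion["value"]
        row[f"{name}-suggestion-metadata"] = {
            "agent": suggestion.get("agent"),
            "score": suggestion.get("score"),
            "type": suggestion.get("type"),
        }
    return row


# Placeholder record with the same keys as the example above (values shortened).
example = {
    "external_id": "165",
    "fields": {"source": "<English prompt>"},
    "metadata": {"evolved_from": None, "kind": "synthetic", "source": "ultrachat"},
    "responses": [
        {
            "status": "submitted",
            "user_id": "<annotator id>",
            "values": {"target": {"value": "<Russian translation>"}},
        }
    ],
    "suggestions": [
        {
            "agent": None,
            "question_name": "target",
            "score": None,
            "type": None,
            "value": "<suggested translation>",
        }
    ],
    "vectors": {},
}

row = flatten_record(example)
print(sorted(row))
```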
### Data Fields
Among the dataset fields, we differentiate between the following:
* **Fields:** These are the dataset records themselves. For the moment, only text fields are supported. These are the fields that annotators use to provide responses to the questions.
* **source** is of type `text`.
* **Questions:** These are the questions that will be asked to the annotators. They can be of different types, such as `RatingQuestion`, `TextQuestion`, `LabelQuestion`, `MultiLabelQuestion`, and `RankingQuestion`.
* **target** is of type `text`, with the description "Translate the text.".
* **Suggestions:** As of Argilla 1.13.0, the suggestions have been included to provide the annotators with suggestions to ease or assist during the annotation process. Suggestions are linked to the existing questions, are always optional, and contain not just the suggestion itself, but also the metadata linked to it, if applicable.
* (optional) **target-suggestion** is of type `text`.
Additionally, we also have two more fields that are optional and are the following:
* **metadata:** This is an optional field that holds additional information about the dataset record, such as a link to its original source, the author, or the date. It can also be used to give annotators extra context. The metadata can be linked to the `metadata_properties` defined in the dataset configuration file `argilla.yaml`.
* **external_id:** This is an optional field that can be used to provide an external ID for the dataset record. This can be useful if you want to link the dataset record to an external resource, such as a database or a file.
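As a sketch of how these fields might be consumed, the snippet below takes one row in the `datasets` layout and returns the submitted translations together with the parsed metadata. The function name `submitted_translations` and the shortened sample row are illustrative assumptions; the field names follow the example record in the Data Instances section.

```python
import json
from typing import Any, Dict, List, Tuple


def submitted_translations(row: Dict[str, Any]) -> Tuple[List[str], Dict[str, Any]]:
    """Return (submitted translations, parsed metadata) for one dataset row."""
    # Keep only responses whose status is "submitted"; skip e.g. "discarded".
    translations = [
        response["value"]
        for response in (row.get("target") or [])
        if response.get("status") == "submitted"
    ]

    # In the `datasets` layout the metadata is a JSON string; decode it.
    raw = row.get("metadata")
    metadata = json.loads(raw) if isinstance(raw, str) and raw else (raw or {})
    return translations, metadata


# Placeholder row with the same keys as the example record (values shortened).
sample_row = {
    "external_id": "165",
    "metadata": '{"source": "ultrachat", "kind": "synthetic", "evolved_from": null}',
    "source": "<English prompt>",
    "target": [
        {"status": "discarded", "user_id": "<annotator A>", "value": "<draft translation>"},
        {"status": "submitted", "user_id": "<annotator B>", "value": "<final translation>"},
    ],
    "target-suggestion": "<suggested translation>",
    "target-suggestion-metadata": {"agent": None, "score": None, "type": None},
}

translations, metadata = submitted_translations(sample_row)
print(translations, metadata["source"])
```

With the dataset loaded through `load_dataset`, the same function could be applied to each row of the `train` split.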
### Data Splits
The dataset contains a single split, which is `train`.
## Dataset Creation
### Curation Rationale
Prompts were selected based on community assessment through the [Prompt Collective initiative](https://huggingface.co/spaces/DIBT/prompt-collective-dashboard).
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation guidelines
This is a translation dataset that contains texts. Please translate the text in the text field.
#### Annotation process
Translators were both native and non-native Russian speakers who used their own language and cultural knowledge to provide the best Russian-language versions of the provided prompts. Sometimes this meant replacing certain geographical or cultural references with ones more familiar to a Russian-speaking audience, sometimes it meant leaving Latin-alphabet words with no Russian-language equivalent "as is", and sometimes it meant transliterating English-language words. Annotators were asked to use their best judgement whilst translating.
#### Who are the annotators?
A list of contributing annotators is available on the DIBT-Russian [MPEP Dashboard](https://huggingface.co/spaces/DIBT-Russian/MPEP_Dashboard).
### Personal and Sensitive Information
No personal or sensitive information was included in any of the source or translated prompts.
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
[More Information Needed]
### Contributions
[More Information Needed]
Provider: maas
Created: 2025-07-10



