MPEP_GREEK

ModelScope community. Updated 2025-12-05; indexed 2025-07-12.
Download link:
https://modelscope.cn/datasets/data-is-better-together/MPEP_GREEK
Resource description:
# Dataset Card for MPEP_GREEK

This dataset has been created with [Argilla](https://docs.argilla.io).

As shown in the sections below, this dataset can be loaded into Argilla as explained in [Load with Argilla](#load-with-argilla), or used directly with the `datasets` library in [Load with `datasets`](#load-with-datasets).

## Dataset Description

- **Homepage:** https://argilla.io
- **Repository:** https://github.com/argilla-io/argilla
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

This dataset contains:

* A dataset configuration file conforming to the Argilla dataset format named `argilla.yaml`. This configuration file will be used to configure the dataset when using the `FeedbackDataset.from_huggingface` method in Argilla.

* Dataset records in a format compatible with HuggingFace `datasets`. These records will be loaded automatically when using `FeedbackDataset.from_huggingface` and can be loaded independently using the `datasets` library via `load_dataset`.

* The [annotation guidelines](#annotation-guidelines) that have been used for building and curating the dataset, if they've been defined in Argilla.

### Load with Argilla

To load with Argilla, you'll just need to install Argilla as `pip install argilla --upgrade` and then use the following code:

```python
import argilla as rg

ds = rg.FeedbackDataset.from_huggingface("DIBT/MPEP_GREEK")
```

### Load with `datasets`

To load this dataset with `datasets`, you'll just need to install `datasets` as `pip install datasets --upgrade` and then use the following code:

```python
from datasets import load_dataset

ds = load_dataset("DIBT/MPEP_GREEK")
```

### Supported Tasks and Leaderboards

This dataset can contain [multiple fields, questions and responses](https://docs.argilla.io/en/latest/conceptual_guides/data_model.html#feedback-dataset) so it can be used for different NLP tasks, depending on the configuration.
The dataset structure is described in the [Dataset Structure section](#dataset-structure).

There are no leaderboards associated with this dataset.

### Languages

[More Information Needed]

## Dataset Structure

### Data in Argilla

The dataset is created in Argilla with: **fields**, **questions**, **suggestions**, **metadata**, **vectors**, and **guidelines**.

The **fields** are the dataset records themselves; for the moment, only text fields are supported. These are the ones that will be used to provide responses to the questions.

| Field Name | Title | Type | Required | Markdown |
| ---------- | ----- | ---- | -------- | -------- |
| source | Source | text | True | True |

The **questions** are the questions that will be asked to the annotators. They can be of different types, such as rating, text, label_selection, multi_label_selection, or ranking.

| Question Name | Title | Type | Required | Description | Values/Labels |
| ------------- | ----- | ---- | -------- | ----------- | ------------- |
| target | Target | text | True | Translate the text. | N/A |

The **suggestions** are human- or machine-generated recommendations for each question, intended to assist the annotator during the annotation process. They are always linked to the existing questions and are named by appending "-suggestion" and "-suggestion-metadata" to the question name; these columns contain the value(s) of the suggestion and its metadata, respectively. The possible values are therefore the same as in the table above, but the column name is appended with "-suggestion" and the metadata column with "-suggestion-metadata".

The **metadata** is a dictionary that can be used to provide additional information about the dataset record. This can be useful to provide additional context to the annotators, or to provide additional information about the dataset record itself.
For example, you can use this to provide a link to the original source of the dataset record, or to provide additional information about the dataset record itself, such as the author, the date, or the source. The metadata is always optional, and can be potentially linked to the `metadata_properties` defined in the dataset configuration file in `argilla.yaml`.

| Metadata Name | Title | Type | Values | Visible for Annotators |
| ------------- | ----- | ---- | ------ | ---------------------- |

The **guidelines** are optional as well, and are just a plain string that can be used to provide instructions to the annotators. Find those in the [annotation guidelines](#annotation-guidelines) section.

### Data Instances

An example of a dataset instance in Argilla looks as follows:

```json
{
    "external_id": "888",
    "fields": {
        "source": "Given the text: An experienced and enthusiastic innovator...you want on your team.\nMargaret Hines is the founder and Principal Consultant of Inspire Marketing, LLC, investing in local businesses, serving the community with business brokerage and marketing consulting. She has an undergraduate degree from Washington University in St. Louis, MO, and an MBA from the University of Wisconsin-Milwaukee.\nMargaret offers consulting in marketing, business sales and turnarounds and franchising. She is also an investor in local businesses.\nPrior to founding Inspire Marketing in 2003, Margaret gained her business acumen, sales and marketing expertise while working at respected Fortune 1000 companies.\nSummarize the background and expertise of Margaret Hines, the founder of Inspire Marketing."
    },
    "metadata": {
        "evolved_from": null,
        "kind": "synthetic",
        "source": "ultrachat"
    },
    "responses": [
        {
            "status": "submitted",
            "user_id": "f4d8878d-e378-4087-a99b-c31dad5f0609",
            "values": {
                "target": {
                    "value": "Βάσει του κειμένου: Μία έμπειρη και ενθουσιώδης καινοτόμος... που θέλετε στην ομάδα σας.\nΗ Margaret Hines είναι η ιδρύτρια και η κύρια σύμβουλος της Inspire Marketing, LLC, έχοντας επενδύσει σε τοπικές επιχειρήσεις, εξυπηρετώντας την κοινότητα μέσω επιχειρηματικής μεσιτείας και συμβουλών μάρκετινγκ. Έχει πτυχίο από το Πανεπιστήμιο της Ουάσιγκτον στο St. Louis, MO, και MBA από το Πανεπιστήμιο του Wisconsin-Milwaukee.\nΗ Margaret προσφέρει συμβουλές σε θέματα μάρκετινγκ, επιχειρηματικών πωλήσεων και ανακατασκευών και franchising. Είναι επίσης επενδύτρια σε τοπικές επιχειρήσεις.\nΠριν από την ίδρυση της Inspire Marketing το 2003, η Margaret απέκτησε την επιχειρηματική της οξυδέρκεια, και την τεχνογνωσία της στις πωλήσεις και το μάρκετινγκ όσο εργαζόταν σε αναγνωρισμένες εταιρείες του Fortune 1000.\nΣυνόψισε το ιστορικό και την τεχνογνωσία της Margaret Hines, της ιδρύτριας του Inspire Marketing."
                }
            }
        }
    ],
    "suggestions": [],
    "vectors": {}
}
```

While the same record in HuggingFace `datasets` looks as follows:

```json
{
    "external_id": "888",
    "metadata": "{\"source\": \"ultrachat\", \"kind\": \"synthetic\", \"evolved_from\": null}",
    "source": "Given the text: An experienced and enthusiastic innovator...you want on your team.\nMargaret Hines is the founder and Principal Consultant of Inspire Marketing, LLC, investing in local businesses, serving the community with business brokerage and marketing consulting. She has an undergraduate degree from Washington University in St. Louis, MO, and an MBA from the University of Wisconsin-Milwaukee.\nMargaret offers consulting in marketing, business sales and turnarounds and franchising. She is also an investor in local businesses.\nPrior to founding Inspire Marketing in 2003, Margaret gained her business acumen, sales and marketing expertise while working at respected Fortune 1000 companies.\nSummarize the background and expertise of Margaret Hines, the founder of Inspire Marketing.",
    "target": [
        {
            "status": "submitted",
            "user_id": "f4d8878d-e378-4087-a99b-c31dad5f0609",
            "value": "Βάσει του κειμένου: Μία έμπειρη και ενθουσιώδης καινοτόμος... που θέλετε στην ομάδα σας.\nΗ Margaret Hines είναι η ιδρύτρια και η κύρια σύμβουλος της Inspire Marketing, LLC, έχοντας επενδύσει σε τοπικές επιχειρήσεις, εξυπηρετώντας την κοινότητα μέσω επιχειρηματικής μεσιτείας και συμβουλών μάρκετινγκ. Έχει πτυχίο από το Πανεπιστήμιο της Ουάσιγκτον στο St. Louis, MO, και MBA από το Πανεπιστήμιο του Wisconsin-Milwaukee.\nΗ Margaret προσφέρει συμβουλές σε θέματα μάρκετινγκ, επιχειρηματικών πωλήσεων και ανακατασκευών και franchising. Είναι επίσης επενδύτρια σε τοπικές επιχειρήσεις.\nΠριν από την ίδρυση της Inspire Marketing το 2003, η Margaret απέκτησε την επιχειρηματική της οξυδέρκεια, και την τεχνογνωσία της στις πωλήσεις και το μάρκετινγκ όσο εργαζόταν σε αναγνωρισμένες εταιρείες του Fortune 1000.\nΣυνόψισε το ιστορικό και την τεχνογνωσία της Margaret Hines, της ιδρύτριας του Inspire Marketing."
        }
    ],
    "target-suggestion": null,
    "target-suggestion-metadata": {
        "agent": null,
        "score": null,
        "type": null
    }
}
```

### Data Fields

Among the dataset fields, we differentiate between the following:

* **Fields:** These are the dataset records themselves; for the moment, only text fields are supported. These are the ones that will be used to provide responses to the questions.

    * **source** is of type `text`.

* **Questions:** These are the questions that will be asked to the annotators. They can be of different types, such as `RatingQuestion`, `TextQuestion`, `LabelQuestion`, `MultiLabelQuestion`, and `RankingQuestion`.

    * **target** is of type `text`, and description "Translate the text.".

* **Suggestions:** As of Argilla 1.13.0, the suggestions have been included to provide the annotators with suggestions to ease or assist during the annotation process. Suggestions are linked to the existing questions, are always optional, and contain not just the suggestion itself, but also the metadata linked to it, if applicable.

    * (optional) **target-suggestion** is of type `text`.

Additionally, we also have two more fields that are optional and are the following:

* **metadata:** This is an optional field that can be used to provide additional information about the dataset record. This can be useful to provide additional context to the annotators, or to provide additional information about the dataset record itself. For example, you can use this to provide a link to the original source of the dataset record, or to provide additional information about the dataset record itself, such as the author, the date, or the source.
  The metadata is always optional, and can be potentially linked to the `metadata_properties` defined in the dataset configuration file in `argilla.yaml`.
* **external_id:** This is an optional field that can be used to provide an external ID for the dataset record. This can be useful if you want to link the dataset record to an external resource, such as a database or a file.

### Data Splits

The dataset contains a single split, which is `train`.

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation guidelines

This is a translation dataset that contains texts. Please translate the text in the text field.

#### Annotation process

The translators were native Greek speakers. Each prompt was initially translated via Google Translate, then refined by human annotators. Prompts containing information not relevant to the Greek context were not altered in any way before translation. Words with no direct equivalent in Greek were not translated.

#### Who are the annotators?

Initial annotation of the entire dataset was done by [Marios Mamalis](https://huggingface.co/Mario00000).

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

[More Information Needed]
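
As a rough illustration of the `datasets`-format record described under Data Instances and Data Fields, the sketch below pulls the source text, the submitted translations, and the decoded metadata out of one row. It assumes rows have the shape shown in the card (a JSON-encoded `metadata` string and a `target` list of response objects); the helper name `extract_translations` and the abbreviated `sample` row are hypothetical and not part of the dataset or of any library.

```python
import json


def extract_translations(record):
    """Return (source text, submitted translations, metadata dict) for one row.

    Hypothetical helper: assumes the row layout shown in the card, where
    `metadata` is a JSON string and `target` is a list of response dicts.
    """
    raw_meta = record.get("metadata")
    metadata = json.loads(raw_meta) if raw_meta else {}
    responses = record.get("target") or []
    # Keep only responses the annotator actually submitted.
    translations = [r["value"] for r in responses if r.get("status") == "submitted"]
    return record["source"], translations, metadata


# Abbreviated stand-in for the record shown in the card; the text values are
# placeholders, not real dataset content.
sample = {
    "external_id": "888",
    "metadata": '{"source": "ultrachat", "kind": "synthetic", "evolved_from": null}',
    "source": "<English prompt>",
    "target": [
        {
            "status": "submitted",
            "user_id": "f4d8878d-e378-4087-a99b-c31dad5f0609",
            "value": "<Greek translation>",
        }
    ],
    "target-suggestion": None,
    "target-suggestion-metadata": {"agent": None, "score": None, "type": None},
}

source, translations, metadata = extract_translations(sample)
print(metadata["source"], len(translations))
```

With the real dataset, the same helper could presumably be applied to each row of the `train` split returned by `load_dataset("DIBT/MPEP_GREEK")`, provided the rows match the layout above.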

Provider:
maas
Created:
2025-07-10