aya_english_dpo_raw
Source: ModelScope community. Updated 2025-12-05; indexed 2025-07-12.
Download link:
https://modelscope.cn/datasets/data-is-better-together/aya_english_dpo_raw
Description:
<p align="left">
<a href="https://github.com/argilla-io/distilabel">
<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>
</a>
</p>
# Dataset Card for aya_english_dpo_raw
This dataset has been created with [distilabel](https://distilabel.argilla.io/).
## Dataset Summary
This dataset includes a `pipeline.yaml` file that can be used to reproduce the pipeline that generated it, using the `distilabel` CLI:
```console
distilabel pipeline run --config "https://huggingface.co/datasets/DIBT/aya_english_dpo_raw/raw/main/pipeline.yaml"
```
Or explore its configuration:
```console
distilabel pipeline info --config "https://huggingface.co/datasets/DIBT/aya_english_dpo_raw/raw/main/pipeline.yaml"
```
## Dataset Structure
The examples have the following structure per configuration:
<details><summary> Configuration: default </summary><hr>
```json
{
"annotation_type": "original-annotations",
"generation": "The largest organ in the human body is the skin. It covers the entire surface of the body and has a total area of about 22 square feet (2 square meters) in adults. The skin performs many important functions, such as:\n\n* Protecting the body from external damage, such as injury, infection, and UV radiation\n* Regulating body temperature\n* Aiding in the production of vitamin D\n* Assisting in the sensation of touch and feel\n* Helping to maintain fluid balance and hydration\n\nThe skin is composed of several layers, with the outermost layer being the epidermis, followed by the dermis, and then the hypodermis. It is estimated that the skin accounts for about 15% of an adult\u0027s total body weight.\n\nIt\u0027s worth noting that while the skin is the largest organ by surface area, the liver is often considered the largest organ by mass, weighing around 3 pounds (1.4 kilograms) in adults.",
"generation_model": "meta-llama/Meta-Llama-3-70B-Instruct",
"generation_models": [
"aya",
"meta-llama/Meta-Llama-3-70B-Instruct"
],
"generations": [
"The largest organ in the human body is the skin. It is the body\u0027s largest organ by surface area and weight. The skin serves many essential functions, including protecting the body from physical damage, pathogens, and dehydration, as well as helping to regulate body temperature and producing vitamin D in response to sunlight. It\u0027s a complex organ with multiple layers, cells, and structures, each serving a unique purpose in maintaining the body\u0027s overall health and well-being.",
"The largest organ in the human body is the skin. It covers the entire surface of the body and has a total area of about 22 square feet (2 square meters) in adults. The skin performs many important functions, such as:\n\n* Protecting the body from external damage, such as injury, infection, and UV radiation\n* Regulating body temperature\n* Aiding in the production of vitamin D\n* Assisting in the sensation of touch and feel\n* Helping to maintain fluid balance and hydration\n\nThe skin is composed of several layers, with the outermost layer being the epidermis, followed by the dermis, and then the hypodermis. It is estimated that the skin accounts for about 15% of an adult\u0027s total body weight.\n\nIt\u0027s worth noting that while the skin is the largest organ by surface area, the liver is often considered the largest organ by mass, weighing around 3 pounds (1.4 kilograms) in adults."
],
"instruction": "What is the largest organ in the human body?",
"language": "English",
"language_code": "eng",
"model_name": "meta-llama/Meta-Llama-3-70B-Instruct",
"predicted_generation_language": "eng_Latn",
"predicted_generation_language_score": 0.9590950012207031,
"ratings": [
5,
5
],
"rationales": [
"The text is accurate, informative, and comprehensive in describing the skin as the largest organ in the human body. It provides specific details about its functions, structure, and importance, aligning perfectly with the instruction.",
"This text is equally excellent, providing a clear and concise answer to the question. It lists the skin\u0027s functions, describes its composition, and offers additional interesting facts, such as the comparison with the liver\u0027s mass. The text is well-structured, accurate, and confident in its information, making it an excellent response."
],
"targets": "The largest organ in the human body is the skin. It is the body\u0027s largest organ by surface area and weight. The skin serves many essential functions, including protecting the body from physical damage, pathogens, and dehydration, as well as helping to regulate body temperature and producing vitamin D in response to sunlight. It\u0027s a complex organ with multiple layers, cells, and structures, each serving a unique purpose in maintaining the body\u0027s overall health and well-being.",
"user_id": "29f22cf193a81e1a5c47d76af453a91b3cd19aa348995c7add1df15fe24e8801"
}
```
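The `ratings` list appears to be aligned index by index with `generations` and `generation_models`. As a minimal sketch, assuming that alignment holds and that a higher rating means a better response, a record could be turned into a preference pair as shown below. The helper name `to_preference_pair`, the output keys, and the decision to skip ties are illustrative assumptions, not part of the dataset or of distilabel.

```python
from typing import Optional


def to_preference_pair(record: dict) -> Optional[dict]:
    """Build a chosen/rejected pair from one record, or None if no preference exists.

    Hypothetical helper: assumes `generations`, `generation_models`, and
    `ratings` are parallel lists and that higher ratings are better.
    """
    # Keep only generations that actually received a rating.
    scored = [
        (rating, text, model)
        for rating, text, model in zip(
            record["ratings"], record["generations"], record["generation_models"]
        )
        if rating is not None
    ]
    if len(scored) < 2:
        return None

    best = max(scored, key=lambda item: item[0])
    worst = min(scored, key=lambda item: item[0])
    # A tie (such as ratings of 5 and 5) carries no preference signal.
    if best[0] == worst[0]:
        return None

    return {
        "prompt": record["instruction"],
        "chosen": best[1],
        "rejected": worst[1],
        "chosen_model": best[2],
        "rejected_model": worst[2],
    }
```

Under these assumptions, the example record above would yield `None`, because both generations are rated 5.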
This subset can be loaded as:
```python
from datasets import load_dataset
ds = load_dataset("DIBT/aya_english_dpo_raw", "default")
```
Or simply as follows, since there is only one configuration and it is named `default`:
```python
from datasets import load_dataset
ds = load_dataset("DIBT/aya_english_dpo_raw")
```
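Once loaded, records can be filtered on the fields shown in the example record. The sketch below keeps records whose generation was detected as English with a sufficiently high score. The helper name `is_confident_english` and the 0.9 cutoff are assumptions chosen for illustration, not values taken from the dataset.

```python
def is_confident_english(record: dict, min_score: float = 0.9) -> bool:
    """Return True if the generation was detected as English with enough confidence.

    Hypothetical helper: the 0.9 default threshold is an arbitrary assumption.
    """
    # Treat a missing or null score as 0.0 so incomplete records are rejected.
    score = record.get("predicted_generation_language_score") or 0.0
    return (
        record.get("predicted_generation_language") == "eng_Latn"
        and score >= min_score
    )
```

With a loaded dataset, this could be applied as `ds["train"].filter(is_confident_english)`, assuming the split is named `train`.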
</details>
Provider:
maas
Created:
2025-07-10



