kunishou/hh-rlhf-49k-ja
Source: Hugging Face | Updated: 2023-11-07 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/kunishou/hh-rlhf-49k-ja
Resource description:
---
license: mit
---
This dataset was created by automatically translating part of "Anthropic/hh-rlhf" into Japanese.
This dataset is also included in "mosaicml/dolly_hhrlhf".
The "ng_translation" flag marks rows whose translation failed: a value of "1" means the translation was not successful.
For rows flagged "1", "instruction" and "instruction_en" therefore contain the same text.
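The flag semantics described above can be illustrated on a couple of hand-made rows. This is only a minimal sketch: the sample rows are hypothetical, the helper names are made up for illustration, and the flag is assumed to be stored as a string.

```python
# Hypothetical sample rows; the real dataset has more columns and real text.
rows = [
    {"instruction": "こんにちは", "instruction_en": "Hello", "ng_translation": "0"},
    {"instruction": "Hello again", "instruction_en": "Hello again", "ng_translation": "1"},
]


def split_by_flag(rows):
    """Separate rows whose translation succeeded from rows flagged "1"."""
    ok = [r for r in rows if r["ng_translation"] != "1"]
    failed = [r for r in rows if r["ng_translation"] == "1"]
    return ok, failed


def failed_rows_are_untranslated(failed):
    """Check that flagged rows carry identical text in both instruction fields."""
    return all(r["instruction"] == r["instruction_en"] for r in failed)


ok, failed = split_by_flag(rows)
print(len(ok), len(failed))                  # 1 1
print(failed_rows_are_untranslated(failed))  # True
```

A check like this could be run on the real data to confirm that the flag and the two instruction fields agree before filtering.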
Loading the dataset as follows lets you exclude the rows where "ng_translation" is "1" (translation errors).
```
pip install datasets
```
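```
from datasets import Dataset, load_dataset

# Download the dataset from the Hugging Face Hub.
dataset = load_dataset("kunishou/hh-rlhf-49k-ja")
# Return rows as pandas objects so they can be filtered with pandas.
dataset.set_format(type="pandas")
df = dataset["train"][:]
# Keep rows whose translation succeeded, then drop the flag and the old index column.
df = df[df["ng_translation"]!="1"].drop(["ng_translation", "index"], axis=1).reset_index()
# Convert the filtered DataFrame back into a Dataset.
dataset = Dataset.from_pandas(df)
dataset
```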
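As an alternative to the pandas round trip, the same exclusion can probably be done with the library's own filter method. The sketch below is not the author's method; the helper names are hypothetical, and it assumes, as the snippet above does, that the flag is stored as the string "1".

```python
def is_translated(row):
    """True when the row's translation did not fail (flag is not "1")."""
    return row["ng_translation"] != "1"


def load_translated_only(name="kunishou/hh-rlhf-49k-ja"):
    """Load the train split and keep only rows whose translation succeeded."""
    # Imported lazily so is_translated can be used without the library installed.
    from datasets import load_dataset

    train = load_dataset(name, split="train")
    # Dataset.filter keeps the rows for which the function returns True.
    kept = train.filter(is_translated)
    # The flag column is no longer needed once the failed rows are gone.
    return kept.remove_columns(["ng_translation"])
```

Calling load_translated_only() downloads the data and returns a Dataset directly, so no conversion back from a DataFrame is needed. Unlike the pandas version, it leaves the "index" column in place.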
hh-rlhf repository
https://github.com/anthropics/hh-rlhf
Anthropic/hh-rlhf
https://huggingface.co/datasets/Anthropic/hh-rlhf
mosaicml/dolly_hhrlhf
https://huggingface.co/datasets/mosaicml/dolly_hhrlhf
Provider:
kunishou
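Summary of original information
Dataset overview
Dataset source and composition
- This dataset was created by automatically translating part of "Anthropic/hh-rlhf" into Japanese.
- This dataset is also included in "mosaicml/dolly_hhrlhf".
Dataset features
- The "ng_translation" flag indicates whether the translation succeeded; "1" means the translation failed.
- For rows marked "1", "instruction" and "instruction_en" contain the same text.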
Dataset usage
- The loading code given in the resource description above (pip install datasets, followed by the load_dataset and pandas filtering steps) loads the dataset and excludes the rows whose translation failed.



