Finnish-NLP/antropic_hhrlhf_filtered_deepl_translated

Name: Finnish-NLP/antropic_hhrlhf_filtered_deepl_translated
Creator: Finnish-NLP
Published: 2024-12-11 22:12:05
License: not specified

Source: Hugging Face. Updated 2024-12-11; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/Finnish-NLP/antropic_hhrlhf_filtered_deepl_translated

Resource description:

---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: response_accepted
    dtype: string
  - name: response_rejected
    dtype: string
  - name: instruction_orig
    dtype: string
  - name: response_accepted_orig
    dtype: string
  - name: response_rejected_orig
    dtype: string
  - name: instruction_lang
    dtype: string
  - name: instruction_lang_proba
    dtype: float64
  - name: chosen_response_lang
    dtype: string
  - name: chosen_response_lang_proba
    dtype: float64
  - name: rejected_response_lang
    dtype: string
  - name: rejected_response_langb_proba
    dtype: float64
  - name: instruction_perplexity_kenlm
    dtype: int64
  - name: chosen_response_perplexity_kenlm
    dtype: int64
  - name: rejected_response_perplexity_kenlm
    dtype: int64
  - name: combined_perplexity
    dtype: int64
  - name: instruction_and_accepted_perplexity
    dtype: int64
  - name: response_orig
    dtype: string
  - name: response
    dtype: string
  - name: dataset_source
    dtype: string
  - name: dataset_type
    dtype: string
  - name: task_class
    dtype: string
  - name: orig_lang
    dtype: string
  - name: messages
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: nemotron_judging_text
    dtype: string
  - name: is_multiturn
    dtype: bool
  - name: accepted_score_pairRM
    dtype: float64
  - name: rejected_score_pairRM
    dtype: float64
  - name: Ystävällisyys
    dtype: int64
  - name: Faktuaalisuus
    dtype: int64
  - name: Selkeys
    dtype: int64
  - name: Kielellinen laatu
    dtype: int64
  - name: Johdonmukaisuus
    dtype: int64
  - name: Erikoisluokittelu
    dtype: int64
  - name: Kokonaisarvosana
    dtype: float64
  - name: prompt_for_judging
    dtype: string
  - name: instruction_lang_score
    dtype: float64
  - name: response_lang
    dtype: string
  - name: response_lang_score
    dtype: float64
  - name: text
    dtype: string
  - name: fin_chunk_ratio_full_text
    dtype: float64
  - name: fin_10_word_chunk_amt
    dtype: int64
  - name: non_fi_10_word_chunk_amt
    dtype: int64
  - name: text_fin_chunk_ratio_bin
    dtype: int64
  - name: response_perplexity_kenlm
    dtype: int64
  splits:
  - name: train
    num_bytes: 58130618
    num_examples: 5497
  download_size: 24977567
  dataset_size: 58130618
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
README TO DO BUT RELEASED NEVERTHELESS

Dataset information, feature list:

instruction: string
response_accepted: string
response_rejected: string
instruction_orig: string
response_accepted_orig: string
response_rejected_orig: string
instruction_lang: string
instruction_lang_proba: 64-bit float
chosen_response_lang: string
chosen_response_lang_proba: 64-bit float
rejected_response_lang: string
rejected_response_langb_proba: 64-bit float
instruction_perplexity_kenlm: 64-bit integer (KenLM perplexity of the instruction)
chosen_response_perplexity_kenlm: 64-bit integer (KenLM perplexity of the chosen response)
rejected_response_perplexity_kenlm: 64-bit integer (KenLM perplexity of the rejected response)
combined_perplexity: 64-bit integer (combined perplexity)
instruction_and_accepted_perplexity: 64-bit integer (combined perplexity of the instruction and the accepted response)
response_orig: string
response: string
dataset_source: string
dataset_type: string
task_class: string
orig_lang: string
messages: list; each item has content (string) and role (string)
nemotron_judging_text: string (Nemotron judging text)
is_multiturn: boolean (whether the conversation is multi-turn)
accepted_score_pairRM: 64-bit float (PairRM pairwise reward model score of the accepted response)
rejected_score_pairRM: 64-bit float (PairRM pairwise reward model score of the rejected response)
Ystävällisyys: 64-bit integer (friendliness)
Faktuaalisuus: 64-bit integer (factuality)
Selkeys: 64-bit integer (clarity)
Kielellinen laatu: 64-bit integer (linguistic quality)
Johdonmukaisuus: 64-bit integer (coherence)
Erikoisluokittelu: 64-bit integer (special classification)
Kokonaisarvosana: 64-bit float (overall grade)
prompt_for_judging: string (prompt used for judging)
instruction_lang_score: 64-bit float (language score of the instruction)
response_lang: string (language of the response)
response_lang_score: 64-bit float (language score of the response)
text: string (text)
fin_chunk_ratio_full_text: 64-bit float (share of Finnish chunks in the full text)
fin_10_word_chunk_amt: 64-bit integer (number of Finnish 10-word chunks)
non_fi_10_word_chunk_amt: 64-bit integer (number of non-Finnish 10-word chunks)
text_fin_chunk_ratio_bin: 64-bit integer (bin value for the text's Finnish chunk ratio)
response_perplexity_kenlm: 64-bit integer (KenLM perplexity of the response)

Splits: train, 58130618 bytes, 5497 examples.
Download size: 24977567 bytes. Dataset size: 58130618 bytes.
Configs: default, with data files for the train split at data/train-*.

The README is still to be written, but the dataset has been released regardless.
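
The score and language-ratio fields above lend themselves to simple row filtering. The following is a minimal sketch, not the dataset authors' procedure: the sample rows are invented, the thresholds `min_margin` and `min_fin_ratio` are hypothetical, and it assumes that a higher PairRM score means a better response.

```python
# Invented sample rows that reuse field names from the feature list.
rows = [
    {"instruction": "Mikä on Suomen pääkaupunki?",
     "accepted_score_pairRM": 2.5, "rejected_score_pairRM": -1.0,
     "fin_chunk_ratio_full_text": 1.0},
    {"instruction": "Kerro vitsi.",
     "accepted_score_pairRM": 0.3, "rejected_score_pairRM": 0.1,
     "fin_chunk_ratio_full_text": 1.0},
    {"instruction": "Translate this sentence.",
     "accepted_score_pairRM": 1.5, "rejected_score_pairRM": -0.5,
     "fin_chunk_ratio_full_text": 0.5},
]


def keep_row(row: dict, min_margin: float = 1.0, min_fin_ratio: float = 0.9) -> bool:
    """Keep rows with a clear preference margin and mostly Finnish text.

    Both thresholds are hypothetical defaults chosen for illustration.
    """
    margin = row["accepted_score_pairRM"] - row["rejected_score_pairRM"]
    return margin >= min_margin and row["fin_chunk_ratio_full_text"] >= min_fin_ratio


filtered = [row for row in rows if keep_row(row)]
print(len(filtered), "of", len(rows), "rows kept")
```

The second row is dropped for its small score margin and the third for its low Finnish chunk ratio.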

Provider:

Finnish-NLP

Original information summary

Dataset overview

Dataset features

instruction: string
response_accepted: string
response_rejected: string
instruction_orig: string
response_accepted_orig: string
response_rejected_orig: string
instruction_lang: string
instruction_lang_proba: float
chosen_response_lang: string
chosen_response_lang_proba: float
rejected_response_lang: string
rejected_response_langb_proba: float
instruction_perplexity_kenlm: integer
chosen_response_perplexity_kenlm: integer
rejected_response_perplexity_kenlm: integer
combined_perplexity: integer
instruction_and_accepted_perplexity: integer
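
The instruction, response_accepted, and response_rejected fields form a preference triple. As a rough sketch, a row could be mapped to the prompt/chosen/rejected layout that many preference-tuning tools expect. The target key names and the sample row here are assumptions for illustration, not something the dataset card specifies.

```python
def to_preference_pair(row: dict) -> dict:
    """Map one dataset row to a prompt/chosen/rejected record.

    The output key names are an assumed convention, not part of the dataset.
    """
    return {
        "prompt": row["instruction"],
        "chosen": row["response_accepted"],
        "rejected": row["response_rejected"],
    }


# Invented sample row that reuses the field names listed above.
sample = {
    "instruction": "Mikä on Suomen pääkaupunki?",
    "response_accepted": "Suomen pääkaupunki on Helsinki.",
    "response_rejected": "En tiedä.",
}

pair = to_preference_pair(sample)
print(pair["prompt"], "->", pair["chosen"])
```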

Dataset splits

train: 5497 examples, 13192016 bytes

Dataset size

Download size: 8254060 bytes
Dataset size: 13192016 bytes

Configs

default: contains the training data files, at path data/train-*
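
With a single default config and a train split, loading should be straightforward with the Hugging Face `datasets` library. The sketch below assumes that library is installed and that the repository is reachable; the helper name `load_train` is hypothetical, and the import is deferred so the definition itself has no side effects.

```python
REPO_ID = "Finnish-NLP/antropic_hhrlhf_filtered_deepl_translated"


def load_train():
    """Load the train split of the default config (requires network access)."""
    # Deferred import: the `datasets` package is only needed when this is called.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name="default", split="train")


# Usage (not executed here, since it downloads data):
#   train = load_train()
#   print(train.column_names)
```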
