Magpie-Align/Llama-3-70B-SynDa-200K-Filtered-L
收藏Hugging Face2024-05-31 更新2024-06-12 收录
下载链接:
https://hf-mirror.com/datasets/Magpie-Align/Llama-3-70B-SynDa-200K-Filtered-L
下载链接
链接失效反馈官方服务:
资源简介:
---
license: llama3
dataset_info:
features:
- name: conversation_id
dtype: string
- name: model
dtype: string
- name: gen_input_config
struct:
- name: temperature
dtype: float64
- name: top_p
dtype: float64
- name: raw
dtype: string
- name: input
dtype: string
- name: output
dtype: string
- name: conversations
list:
- name: from
dtype: string
- name: value
dtype: string
- name: intent
dtype: string
- name: knowledge
dtype: string
- name: difficulty
dtype: string
- name: input_rating_generator
dtype: string
- name: primary_tag
dtype: string
- name: other_tags
sequence: string
- name: tag_generator
dtype: string
- name: instruct_reward
dtype: float64
- name: reward_model
dtype: string
- name: llama_guard_2
dtype: string
- name: input_quality
dtype: string
- name: quality_explanation
dtype: string
- name: min_distance
dtype: float64
- name: repeat_count
dtype: int64
- name: min_similar_conversation_id
dtype: string
- name: original_index
dtype: int64
- name: output_length
dtype: int64
splits:
- name: train
num_bytes: 1611281043
num_examples: 200000
download_size: 844046115
dataset_size: 1611281043
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
---
提供机构:
Magpie-Align
原始信息汇总
数据集概述
数据集特征
- conversation_id: 字符串类型
- model: 字符串类型
- gen_input_config: 结构体类型,包含以下字段:
- temperature: 浮点数类型(float64)
- top_p: 浮点数类型(float64)
- raw: 字符串类型
- input: 字符串类型
- output: 字符串类型
- conversations: 列表类型,包含以下字段:
- from: 字符串类型
- value: 字符串类型
- intent: 字符串类型
- knowledge: 字符串类型
- difficulty: 字符串类型
- input_rating_generator: 字符串类型
- primary_tag: 字符串类型
- other_tags: 序列类型,字符串
- tag_generator: 字符串类型
- instruct_reward: 浮点数类型(float64)
- reward_model: 字符串类型
- llama_guard_2: 字符串类型
- input_quality: 字符串类型
- quality_explanation: 字符串类型
- min_distance: 浮点数类型(float64)
- repeat_count: 整数类型(int64)
- min_similar_conversation_id: 字符串类型
- original_index: 整数类型(int64)
- output_length: 整数类型(int64)
数据集分割
- train:
- 数据量: 1611281043 字节
- 示例数量: 200000
数据集大小
- 下载大小: 844046115 字节
- 数据集大小: 1611281043 字节
配置
- config_name: default
- data_files:
- split: train
- path: data/train-*



