cornfieldrm/datamix-v6.0
收藏Hugging Face2024-04-24 更新2025-04-26 收录
下载链接:
https://hf-mirror.com/datasets/cornfieldrm/datamix-v6.0
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
features:
- name: dataset
dtype: string
- name: prompt_source
dtype: string
- name: response_model
dtype: string
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: helpsteer-helpfulness
dtype: float64
- name: helpsteer-correctness
dtype: float64
- name: helpsteer-coherence
dtype: float64
- name: helpsteer-complexity
dtype: float64
- name: helpsteer-verbosity
dtype: float64
- name: ultrafeedback-overall_score
dtype: float64
- name: ultrafeedback-instruction_following
dtype: float64
- name: ultrafeedback-truthfulness
dtype: float64
- name: ultrafeedback-honesty
dtype: float64
- name: ultrafeedback-helpfulness
dtype: float64
- name: argilla-overall_quality
dtype: float64
- name: code-complexity
dtype: float64
- name: code-style
dtype: float64
- name: code-explanation
dtype: float64
- name: code-instruction-following
dtype: float64
- name: code-readability
dtype: float64
- name: llama_guard2-is_safe
dtype: float64
splits:
- name: train
num_bytes: 863839310
num_examples: 366190
download_size: 277791978
dataset_size: 863839310
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
---
数据集信息:
特征字段:
- 字段名:数据集(dataset),数据类型:字符串
- 字段名:提示词来源(prompt_source),数据类型:字符串
- 字段名:响应生成模型(response_model),数据类型:字符串
- 字段名:对话消息列表(messages),列表类型,列表项包含:
- 字段名:内容(content),数据类型:字符串
- 字段名:角色(role),数据类型:字符串
- 字段名:HelpSteer-有用性(helpsteer-helpfulness),数据类型:64位浮点数(float64)
- 字段名:HelpSteer-正确性(helpsteer-correctness),数据类型:64位浮点数(float64)
- 字段名:HelpSteer-连贯性(helpsteer-coherence),数据类型:64位浮点数(float64)
- 字段名:HelpSteer-复杂度(helpsteer-complexity),数据类型:64位浮点数(float64)
- 字段名:HelpSteer-冗长度(helpsteer-verbosity),数据类型:64位浮点数(float64)
- 字段名:UltraFeedback-整体评分(ultrafeedback-overall_score),数据类型:64位浮点数(float64)
- 字段名:UltraFeedback-指令遵循度(ultrafeedback-instruction_following),数据类型:64位浮点数(float64)
- 字段名:UltraFeedback-真实性(ultrafeedback-truthfulness),数据类型:64位浮点数(float64)
- 字段名:UltraFeedback-诚实性(ultrafeedback-honesty),数据类型:64位浮点数(float64)
- 字段名:UltraFeedback-有用性(ultrafeedback-helpfulness),数据类型:64位浮点数(float64)
- 字段名:Argilla-整体质量(argilla-overall_quality),数据类型:64位浮点数(float64)
- 字段名:代码复杂度(code-complexity),数据类型:64位浮点数(float64)
- 字段名:代码风格(code-style),数据类型:64位浮点数(float64)
- 字段名:代码解释性(code-explanation),数据类型:64位浮点数(float64)
- 字段名:代码指令遵循度(code-instruction-following),数据类型:64位浮点数(float64)
- 字段名:代码可读性(code-readability),数据类型:64位浮点数(float64)
- 字段名:LlamaGuard 2-安全性检测结果(llama_guard2-is_safe),数据类型:64位浮点数(float64)
数据集划分:
- 划分名称:训练集(train),字节数:863839310,样本数量:366190
下载大小:277791978
数据集存储大小:863839310
数据集配置:
- 配置名称:默认(default),数据文件:
- 划分:训练集(train),路径:data/train-*
提供机构:
cornfieldrm



