lihaoxin2020/ki-Llama-3.1-8B-Instruct-temp0.0-on-mmlu_pro-0shot_cot-scillm-9258618651
收藏Hugging Face2026-03-28 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/lihaoxin2020/ki-Llama-3.1-8B-Instruct-temp0.0-on-mmlu_pro-0shot_cot-scillm-9258618651
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
- config_name: biology
features:
- name: data
list:
- name: content
dtype: string
- name: role
dtype: string
- name: raw_response
struct:
- name: text
dtype: string
- name: raw_text
dtype: string
- name: knowledge_pieces
list: string
- name: doc_id
dtype: int64
- name: native_id
dtype: int64
splits:
- name: train
num_bytes: 2648643
num_examples: 200
download_size: 2570902
dataset_size: 2648643
- config_name: chemistry
features:
- name: data
list:
- name: content
dtype: string
- name: role
dtype: string
- name: raw_response
struct:
- name: text
dtype: string
- name: raw_text
dtype: string
- name: knowledge_pieces
list: string
- name: doc_id
dtype: int64
- name: native_id
dtype: int64
splits:
- name: train
num_bytes: 12155032
num_examples: 200
download_size: 12065191
dataset_size: 12155032
- config_name: computer science
features:
- name: data
list:
- name: content
dtype: string
- name: role
dtype: string
- name: raw_response
struct:
- name: text
dtype: string
- name: raw_text
dtype: string
- name: knowledge_pieces
list: string
- name: doc_id
dtype: int64
- name: native_id
dtype: int64
splits:
- name: train
num_bytes: 5229658
num_examples: 200
download_size: 5161670
dataset_size: 5229658
- config_name: engineering
features:
- name: data
list:
- name: content
dtype: string
- name: role
dtype: string
- name: raw_response
struct:
- name: text
dtype: string
- name: raw_text
dtype: string
- name: knowledge_pieces
list: string
- name: doc_id
dtype: int64
- name: native_id
dtype: int64
splits:
- name: train
num_bytes: 16477900
num_examples: 200
download_size: 16373309
dataset_size: 16477900
- config_name: health
features:
- name: data
list:
- name: content
dtype: string
- name: role
dtype: string
- name: raw_response
struct:
- name: text
dtype: string
- name: raw_text
dtype: string
- name: knowledge_pieces
list: string
- name: doc_id
dtype: int64
- name: native_id
dtype: int64
splits:
- name: train
num_bytes: 2032617
num_examples: 200
download_size: 1967962
dataset_size: 2032617
- config_name: math
features:
- name: data
list:
- name: content
dtype: string
- name: role
dtype: string
- name: raw_response
struct:
- name: text
dtype: string
- name: raw_text
dtype: string
- name: knowledge_pieces
list: string
- name: doc_id
dtype: int64
- name: native_id
dtype: int64
splits:
- name: train
num_bytes: 8740597
num_examples: 200
download_size: 8672137
dataset_size: 8740597
- config_name: physics
features:
- name: data
list:
- name: content
dtype: string
- name: role
dtype: string
- name: raw_response
struct:
- name: text
dtype: string
- name: raw_text
dtype: string
- name: knowledge_pieces
list: string
- name: doc_id
dtype: int64
- name: native_id
dtype: int64
splits:
- name: train
num_bytes: 12692108
num_examples: 200
download_size: 12614879
dataset_size: 12692108
configs:
- config_name: biology
data_files:
- split: train
path: biology/train-*
- config_name: chemistry
data_files:
- split: train
path: chemistry/train-*
- config_name: computer science
data_files:
- split: train
path: computer science/train-*
- config_name: engineering
data_files:
- split: train
path: engineering/train-*
- config_name: health
data_files:
- split: train
path: health/train-*
- config_name: math
data_files:
- split: train
path: math/train-*
- config_name: physics
data_files:
- split: train
path: physics/train-*
---
数据集信息如下:
1. 生物学(biology)配置项
特征字段包括:
- `data`:列表类型,包含两个子字段:
- `content`:内容,数据类型为字符串
- `role`:角色,数据类型为字符串
- `raw_response`:结构体类型,其子字段`text`为文本,数据类型为字符串
- `raw_text`:原始文本,数据类型为字符串
- `knowledge_pieces`:字符串列表类型
- `doc_id`:文档ID,数据类型为64位整型
- `native_id`:原生ID,数据类型为64位整型
数据拆分:仅包含训练集(train),字节数为2648643,样本数量为200;下载大小为2570902,数据集总大小为2648643
2. 化学(chemistry)配置项
特征字段与生物学配置项一致,数据拆分仅包含训练集,字节数为12155032,样本数量为200;下载大小为12065191,数据集总大小为12155032
3. 计算机科学(computer science)配置项
特征字段与生物学配置项一致,数据拆分仅包含训练集,字节数为5229658,样本数量为200;下载大小为5161670,数据集总大小为5229658
4. 工程学(engineering)配置项
特征字段与生物学配置项一致,数据拆分仅包含训练集,字节数为16477900,样本数量为200;下载大小为16373309,数据集总大小为16477900
5. 健康学(health)配置项
特征字段与生物学配置项一致,数据拆分仅包含训练集,字节数为2032617,样本数量为200;下载大小为1967962,数据集总大小为2032617
6. 数学(math)配置项
特征字段与生物学配置项一致,数据拆分仅包含训练集,字节数为8740597,样本数量为200;下载大小为8672137,数据集总大小为8740597
7. 物理学(physics)配置项
特征字段与生物学配置项一致,数据拆分仅包含训练集,字节数为12692108,样本数量为200;下载大小为12614879,数据集总大小为12692108
配置项数据文件详情:
- 生物学配置:仅包含训练集拆分的数据文件,路径为`biology/train-*`
- 化学配置:仅包含训练集拆分的数据文件,路径为`chemistry/train-*`
- 计算机科学配置:仅包含训练集拆分的数据文件,路径为`computer science/train-*`
- 工程学配置:仅包含训练集拆分的数据文件,路径为`engineering/train-*`
- 健康学配置:仅包含训练集拆分的数据文件,路径为`health/train-*`
- 数学配置:仅包含训练集拆分的数据文件,路径为`math/train-*`
- 物理学配置:仅包含训练集拆分的数据文件,路径为`physics/train-*`
提供机构:
lihaoxin2020



