KazMMLU
Source: ModelScope community. Updated 2025-10-09; indexed 2025-03-22.
Download link:
https://modelscope.cn/datasets/MBZUAI/KazMMLU
# MukhammedTogmanov/jana
This dataset contains multiple subsets for different subjects and domains, structured for few-shot learning and evaluation.
## Dataset Structure
The dataset is organized into subsets, each corresponding to a specific subject and level. Each subset has two splits:
- **`dev`**: Few-shot examples (3 rows per subset).
- **`test`**: The remaining rows for evaluation.
### Subsets
The dataset contains the following subsets:
- Accounting (University)
- Biology (High School in kaz)
- Biology (High School in rus)
- Chemistry (High School in kaz)
- Chemistry (High School in rus)
- ... (list other subsets)
### Splits
| Split | Number of Rows |
|-------|----------------|
| dev | 111 |
| test | 22,889 |
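
The split sizes can be cross-checked against the "3 rows per subset" rule with a few lines of arithmetic. This is a minimal sketch that assumes every subset contributes exactly three `dev` rows; the constant names are illustrative and not part of the dataset.

```python
# Figures taken from the Splits table above.
DEV_ROWS = 111
TEST_ROWS = 22_889

# Assumption: each subset contributes exactly 3 few-shot rows to `dev`.
FEW_SHOT_PER_SUBSET = 3

# If the assumption holds, the dev row count divides evenly by 3.
implied_subsets, remainder = divmod(DEV_ROWS, FEW_SHOT_PER_SUBSET)
total_rows = DEV_ROWS + TEST_ROWS

print(f"implied subsets: {implied_subsets} (remainder {remainder})")
print(f"total rows: {total_rows}")
```

Under that assumption the table implies 37 subsets and 23,000 rows overall.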
## Features
The dataset includes the following columns:
- **`Number`**: Question ID.
- **`Source`**: URL or source of the question.
- **`Group`**: The group to which the subject belongs (e.g., Social Science, Science).
- **`Subject`**: The subject of the question (e.g., Biology, Accounting).
- **`Level`**: Academic level (e.g., High School, University).
- **`Question`**: The text of the question.
- **`Option A`**, **`Option B`**, **`Option C`**, **`Option D`**, **`Option E`**: The answer choices.
- **`Answer Key`**: The correct answer.
- **`Subset`**: The name of the subset (e.g., Biology (High School in kaz)).
- **`Split`**: The data split (`dev` or `test`).
- **`is_few_shot`**: Indicates whether the row is part of the few-shot examples (`1` for `dev`, `0` for `test`).
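
The columns above are enough to assemble a few-shot prompt for a `test` question from the `dev` rows of the same subset. The sketch below is one possible layout, not a format prescribed by the dataset authors. The helper names `format_question` and `build_prompt`, the prompt wording, and the two sample rows are hypothetical, and it assumes that an unused option is stored as `None` or an empty string.

```python
OPTION_COLUMNS = ["Option A", "Option B", "Option C", "Option D", "Option E"]


def format_question(row, include_answer):
    """Render one row as question text, lettered options, and an answer line."""
    lines = [row["Question"]]
    for column in OPTION_COLUMNS:
        text = row.get(column)
        if text:  # assumption: unused options are None or empty
            letter = column.split()[-1]  # "Option A" -> "A"
            lines.append(f"{letter}. {text}")
    answer = row["Answer Key"] if include_answer else ""
    lines.append(f"Answer: {answer}".rstrip())
    return "\n".join(lines)


def build_prompt(rows, target):
    """Prepend the few-shot rows of the target's subset to the target question."""
    few_shot = [
        r for r in rows
        if r["Subset"] == target["Subset"] and r["is_few_shot"] == 1
    ]
    blocks = [format_question(r, include_answer=True) for r in few_shot]
    blocks.append(format_question(target, include_answer=False))
    return "\n\n".join(blocks)


# Hypothetical rows for illustration only; they are not taken from the dataset.
rows = [
    {"Question": "2 + 2 = ?", "Option A": "3", "Option B": "4",
     "Option C": "5", "Option D": "6", "Option E": None,
     "Answer Key": "B", "Subset": "Demo (hypothetical)",
     "Split": "dev", "is_few_shot": 1},
    {"Question": "3 + 3 = ?", "Option A": "4", "Option B": "5",
     "Option C": "6", "Option D": "7", "Option E": "9",
     "Answer Key": "C", "Subset": "Demo (hypothetical)",
     "Split": "test", "is_few_shot": 0},
]

prompt = build_prompt(rows, rows[1])
print(prompt)
```

Filtering on `Subset` keeps the few-shot examples in the same subject and language as the question being evaluated.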
## Usage
You can load this dataset with the following code:
```python
from datasets import load_dataset

dataset = load_dataset("MukhammedTogmanov/jana")
```
Provider: maas
Created: 2025-03-17



