hon9kon9ize/yue_mmlu
Hugging Face. Updated 2024-04-22. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/hon9kon9ize/yue_mmlu
Resource description:
---
dataset_info:
  features:
  - name: input
    dtype: string
  - name: target
    dtype: string
  - name: A
    dtype: string
  - name: B
    dtype: string
  - name: C
    dtype: string
  - name: D
    dtype: string
  - name: translated_input
    dtype: string
  - name: translated_target
    dtype: string
  - name: translated_A
    dtype: string
  - name: translated_B
    dtype: string
  - name: translated_C
    dtype: string
  - name: translated_D
    dtype: string
  - name: split
    dtype: string
  - name: category
    dtype: string
  - name: line
    dtype: int64
  splits:
  - name: train
    num_bytes: 30895860
    num_examples: 17388
  download_size: 13078424
  dataset_size: 30895860
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Cantonese Translated MMLU dataset
This dataset is a Cantonese translation of [Measuring Massive Multitask Language Understanding](https://github.com/hendrycks/test). For more detailed information about the original dataset, please refer to the provided link.
This dataset was translated by Gemini Pro and has not undergone any manual verification. The content may be inaccurate or misleading; please keep this in mind when using it.
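As a minimal sketch of how the fields listed in the metadata might be combined into a multiple-choice prompt, the snippet below builds a question from `translated_input` and `translated_A` through `translated_D`, then scores a predicted letter against `target`. It assumes `target` holds one of the letters A to D, as in the original MMLU files; the card does not state this. The helper names `format_prompt` and `is_correct` and the sample record are hypothetical and are not taken from the dataset.

```python
from typing import Mapping

# Option columns named in the dataset card's feature list.
CHOICES = ("A", "B", "C", "D")


def format_prompt(record: Mapping[str, str], translated: bool = True) -> str:
    """Build a multiple-choice prompt from one record.

    With translated=True the Cantonese columns (translated_*) are used,
    otherwise the original English columns.
    """
    prefix = "translated_" if translated else ""
    lines = [record[prefix + "input"]]
    for letter in CHOICES:
        lines.append(f"{letter}. {record[prefix + letter]}")
    return "\n".join(lines)


def is_correct(record: Mapping[str, str], predicted_letter: str) -> bool:
    """Compare a predicted letter with `target`.

    Assumption: `target` stores the answer letter (A-D).
    """
    return predicted_letter.strip().upper() == record["target"].strip().upper()


# Hypothetical record for illustration only; not an actual row of the dataset.
example = {
    "input": "What is 2 + 2?",
    "A": "3", "B": "4", "C": "5", "D": "6",
    "target": "B",
    "translated_input": "2 + 2 等於幾多?",
    "translated_A": "3", "translated_B": "4",
    "translated_C": "5", "translated_D": "6",
    "translated_target": "B",
}

print(format_prompt(example))
print(is_correct(example, "B"))
```

Because the translations are unverified, it may be worth building prompts from both the original and translated columns (`translated=False` and `translated=True`) and comparing results on the same rows.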
## Citation
If you find this useful in your research, please consider citing the test and also the [ETHICS](https://arxiv.org/abs/2008.02275) dataset it draws from:
@article{hendryckstest2021,
  title={Measuring Massive Multitask Language Understanding},
  author={Dan Hendrycks and Collin Burns and Steven Basart and Andy Zou and Mantas Mazeika and Dawn Song and Jacob Steinhardt},
  journal={Proceedings of the International Conference on Learning Representations (ICLR)},
  year={2021}
}
@article{hendrycks2021ethics,
  title={Aligning AI With Shared Human Values},
  author={Dan Hendrycks and Collin Burns and Steven Basart and Andrew Critch and Jerry Li and Dawn Song and Jacob Steinhardt},
  journal={Proceedings of the International Conference on Learning Representations (ICLR)},
  year={2021}
}
Provider:
hon9kon9ize
Original information summary
Dataset overview
Dataset name
- Cantonese Translated MMLU dataset
Dataset features
- input: string
- target: string
- A: string
- B: string
- C: string
- D: string
- translated_input: string
- translated_target: string
- translated_A: string
- translated_B: string
- translated_C: string
- translated_D: string
- split: string
- category: string
- line: integer (int64)
Dataset splits
- train:
  - Size: 30895860 bytes
  - Number of examples: 17388
Dataset size
- Download size: 13078424 bytes
- Dataset size: 30895860 bytes
Configuration
- config_name: default
- data_files:
  - split: train
  - path: data/train-*



