include-base-44
Source: ModelScope community (updated 2026-01-02, indexed 2024-12-21)
Download link:
https://modelscope.cn/datasets/CohereForAI/include-base-44
# INCLUDE-base (44 languages)
## Dataset Description
<!-- - **Repository**: https://github.com/agromanou/ -->
- **Paper**: http://arxiv.org/abs/2411.19799
### Dataset Summary
INCLUDE is a comprehensive knowledge- and reasoning-centric benchmark across **44 languages** that evaluates multilingual LLMs for performance in the actual language environments where they would be deployed.
It contains 22,637 four-option multiple-choice questions (MCQs) extracted from academic and professional exams, covering 57 topics, including regional knowledge.
For a quicker evaluation, you can use [include-lite-44](https://huggingface.co/datasets/CohereLabs/include-lite-44), which is a subset of `include-base-44`, covering the same 44 languages.
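A minimal loading sketch, assuming the Hugging Face `datasets` library is installed. The repository id `CohereLabs/include-base-44` is inferred from the `include-lite-44` link above, and the per-language configuration name (`"French"`) and the `"test"` split are assumptions that this card does not confirm. The helper name `load_include` is hypothetical.

```python
def load_include(language, repo_id="CohereLabs/include-base-44", split="test"):
    """Load one language subset of INCLUDE.

    Assumptions (not confirmed by the dataset card):
      - `repo_id` is the Hugging Face id of the base dataset,
      - each language is exposed as a configuration named after it,
      - a split called "test" exists.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, language, split=split)


# Example call (needs network access and `pip install datasets`):
# french = load_include("French")
# print(french[0]["question"])
```

If the configuration or split names differ, only the default arguments need to change; the same call should also work for `include-lite-44` by passing its repository id.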
### Languages
Albanian, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Chinese, Croatian, Dutch, Estonian, Finnish, French, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Kazakh, Korean, Lithuanian, Malay, Malayalam, Nepali, North Macedonian, Persian, Polish, Portuguese, Russian, Serbian, Spanish, Tagalog, Tamil, Telugu, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese
### Topics
- **Academic**:
Accounting, Agriculture, Anthropology, Architecture and Design, Arts & Humanities, Biology, Business administration, Business ethics, Business, Chemistry, Computer Science, Culturology, Earth science, Economics, Education, Engineering, Environmental studies and forestry, Family and consumer science, Finance, Geography, Health, History, Human physical performance and recreation, Industrial and labor relations, International trade, Journalism, media studies, and communication, Language, Law, Library and museum studies, Literature, Logic, Management, Marketing, Math, Medicine, Military Sciences, Multiple exams, Performing arts, Philosophy, Physics, Political sciences, Psychology, Public Administration, Public Policy, Qualimetry, Religious studies, Risk management and insurance, Social Work, Social work, Sociology, STEM, Transportation, Visual Arts
- **Licenses**:
Driving License, Marine License, Medical License, Professional Certifications
### Data schema
An example of a French Law question looks as follows:
```
{
"language": "French",
"country": "France",
"level": "Academic",
"domain": "Arts & Humanities",
"subject": "Law",
"regional_feature": "region explicit",
"question": "Que permet l'article 49-3 de la Constitution ?",
"choices": ["de recourir au référendum", "au Parlement de contrôler l'action du Gouvernement", "l'adoption sans vote d'une loi", "de prononcer la dissolution de l'Assemblée nationale"],
"answer": 2
}
```
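The record above can be turned into a lettered prompt with a few lines of Python. This is a sketch under stated assumptions: the `answer` field is treated as a zero-based index into `choices` (under that reading, `2` selects the third option, "l'adoption sans vote d'une loi"), and the prompt layout is a hypothetical template, not the one used by the evaluation harness.

```python
from string import ascii_uppercase

# Record copied from the example above.
record = {
    "language": "French",
    "country": "France",
    "level": "Academic",
    "domain": "Arts & Humanities",
    "subject": "Law",
    "regional_feature": "region explicit",
    "question": "Que permet l'article 49-3 de la Constitution ?",
    "choices": [
        "de recourir au référendum",
        "au Parlement de contrôler l'action du Gouvernement",
        "l'adoption sans vote d'une loi",
        "de prononcer la dissolution de l'Assemblée nationale",
    ],
    "answer": 2,
}


def format_prompt(rec):
    """Render the question followed by lettered options (A, B, C, D)."""
    lines = [rec["question"]]
    for letter, choice in zip(ascii_uppercase, rec["choices"]):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def gold_letter(rec):
    """Map the integer `answer` to an option letter (assumes 0-based indexing)."""
    return ascii_uppercase[rec["answer"]]


print(format_prompt(record))
print(gold_letter(record))  # C
```

Keeping the gold answer as a letter makes it easy to compare against a model that is asked to reply with a single option letter.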
### Model Performance
Model performance on **INCLUDE**, measured with the Harness-eval framework.
| **Model** | **Original Language instructions** | **English instructions** |
|------------------------------------|:--------------------------:|:--------------------:|
| Llama3.1-70B-Instruct | 70.6 | 70.9 |
| Qwen2.5-14B | 62.3 | 62.6 |
| Aya-expanse-32b | 59.1 | 59.5 |
| Qwen2.5-7B | 55.0 | 55.5 |
| Qwen2.5-7B-Instruct | 54.8 | 54.8 |
| Llama-3.1-8B-Instruct | 53.5 | 54.4 |
| Gemma-7B | 53.5 | 53.2 |
| Llama-3.1-8B | 51.2 | 51.9 |
| Aya-expanse-8b | 47.2 | 47.8 |
| Mistral-7B | 44.1 | 44.6 |
| Mistral-7B-Instruct | 44.2 | 44.3 |
| Gemma-7B-Instruct | 38.6 | 39.3 |
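The card does not state which metric the table reports. Assuming the scores are accuracy percentages over the multiple-choice questions, a score of this kind could be computed as sketched below. The function name and the toy data are illustrative only and are not INCLUDE results.

```python
def accuracy(predictions, gold):
    """Return the percentage of predictions that equal the gold answer index."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Toy data, invented for illustration only (not INCLUDE results).
gold = [2, 0, 3, 1]
predictions = [2, 1, 3, 1]
print(f"{accuracy(predictions, gold):.1f}")  # 75.0
```

With four answer options per question, random guessing would land at about 25 on this scale, which is a useful floor when reading the table.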
## Citation
```
@article{romanou2024include,
title={INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge},
author={Romanou, Angelika and Foroutan, Negar and Sotnikova, Anna and Chen, Zeming and Nelaturu, Sree Harsha and Singh, Shivalika and Maheshwary, Rishabh and Altomare, Micol and Haggag, Mohamed A and Amayuelas, Alfonso and others},
journal={arXiv preprint arXiv:2411.19799},
year={2024}
}
```
- **Provider**: maas
- **Created**: 2024-12-15
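## Background Overview
INCLUDE-base-44 is a multilingual knowledge and reasoning benchmark covering 44 languages. It contains 22,637 four-option multiple-choice questions drawn from academic and professional exams across 57 topics, and it is designed to evaluate large language models in the language environments where they are actually used. The dataset places particular emphasis on regional knowledge, as the French law example shows, and is suited to evaluating and comparing multilingual models.

The summary above was collected and generated by 遇见数据集.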



