include-lite-44
Source: ModelScope community. Updated 2025-12-03; indexed 2024-12-21.
Download link:
https://modelscope.cn/datasets/CohereForAI/include-lite-44
# INCLUDE-lite (44 languages)
## Dataset Description
<!-- - **Repository**: https://github.com/agromanou/ -->
- **Paper**: http://arxiv.org/abs/2411.19799
### Dataset Summary
INCLUDE is a comprehensive knowledge- and reasoning-centric benchmark across **44 languages** that evaluates multilingual LLMs for performance in the actual language environments where they would be deployed.
It contains 11,095 four-option multiple-choice questions (MCQs) extracted from academic and professional exams, covering 57 topics, including regional knowledge.
For evaluation in a larger set, you can use [include-base-44](https://huggingface.co/datasets/CohereLabs/include-base-44), which is a superset of `include-lite-44`, covering the same 44 languages.
### Languages
Albanian, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Chinese, Croatian, Dutch, Estonian, Finnish, French, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Kazakh, Korean, Lithuanian, Malay, Malayalam, Nepali, North Macedonian, Persian, Polish, Portuguese, Russian, Serbian, Spanish, Tagalog, Tamil, Telugu, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese
### Topics
- **Academic**:
Accounting, Agriculture, Anthropology, Architecture and Design, Arts & Humanities, Biology, Business administration, Business ethics, Business, Chemistry, Computer Science, Culturology, Earth science, Economics, Education, Engineering, Environmental studies and forestry, Family and consumer science, Finance, Geography, Health, History, Human physical performance and recreation, Industrial and labor relations, International trade, Journalism, media studies, and communication, Language, Law, Library and museum studies, Literature, Logic, Management, Marketing, Math, Medicine, Military Sciences, Multiple exams, Performing arts, Philosophy, Physics, Political sciences, Psychology, Public Administration, Public Policy, Qualimetry, Religious studies, Risk management and insurance, Social Work, Social work, Sociology, STEM, Transportation, Visual Arts
- **Licenses**:
Driving License, Marine License, Medical License, Professional Certifications
### Data schema
An example from a French Law question looks as follows:
```
{
"language": "French",
"country": "France",
"level": "Academic",
"domain": "Arts & Humanities",
"subject": "Law",
"regional_feature": "region explicit",
"question": "Que permet l'article 49-3 de la Constitution ?",
"choices": ["de recourir au référendum", "au Parlement de contrôler l'action du Gouvernement", "l'adoption sans vote d'une loi", "de prononcer la dissolution de l'Assemblée nationale"],
"answer": 2
}
```
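To make the schema concrete, here is a minimal sketch of turning one record into a lettered multiple-choice prompt and recovering the gold answer letter. It assumes `answer` is a zero-based index into `choices`, which matches the example above (index 2 is "l'adoption sans vote d'une loi"). The helper name `format_mcq` and the prompt layout are hypothetical, chosen for illustration; they are not the template used by the authors or by any evaluation framework.

```python
from string import ascii_uppercase


def format_mcq(record: dict) -> tuple[str, str]:
    """Return (prompt, gold_letter) for one INCLUDE-style record.

    Hypothetical helper: the prompt layout is illustrative only.
    """
    choices = record["choices"]
    letters = ascii_uppercase[: len(choices)]  # "ABCD" for four options
    lines = [record["question"]]
    lines += [f"{letter}. {choice}" for letter, choice in zip(letters, choices)]
    lines.append("Answer:")
    # Assumption: `answer` is a zero-based index into `choices`.
    gold_letter = letters[record["answer"]]
    return "\n".join(lines), gold_letter


# The French Law record shown above.
example = {
    "language": "French",
    "country": "France",
    "level": "Academic",
    "domain": "Arts & Humanities",
    "subject": "Law",
    "regional_feature": "region explicit",
    "question": "Que permet l'article 49-3 de la Constitution ?",
    "choices": [
        "de recourir au référendum",
        "au Parlement de contrôler l'action du Gouvernement",
        "l'adoption sans vote d'une loi",
        "de prononcer la dissolution de l'Assemblée nationale",
    ],
    "answer": 2,
}

prompt, gold = format_mcq(example)
print(prompt)
print("Gold:", gold)
```

Comparing a model's predicted letter against `gold` for each record gives a per-question correctness signal that can be averaged into an accuracy score.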
### Model Performance
Model performance on **INCLUDE**, evaluated with the Harness-eval framework.
| **Model** | **Original-language instructions** | **English instructions** |
|------------------------------------|:------------------------------:|:------------------------:|
| Llama-3.1-70B-Instruct | 70.3 | 70.6 |
| Qwen2.5-14B | 61.8 | 61.9 |
| Aya-expanse-32b | 58.9 | 59.5 |
| Qwen2.5-7B | 54.4 | 54.9 |
| Qwen2.5-7B-Instruct | 54.5 | 54.6 |
| Llama-3.1-8B-Instruct | 53.5 | 54.4 |
| Gemma-7B | 53.6 | 53.1 |
| Llama-3.1-8B | 51.2 | 52.1 |
| Aya-expanse-8b | 47.3 | 48.0 |
| Mistral-7B | 44.5 | 44.7 |
| Mistral-7B-Instruct | 43.8 | 43.9 |
| Gemma-7B-Instruct | 39.1 | 39.7 |
## Citation
```
@article{romanou2024include,
title={INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge},
author={Romanou, Angelika and Foroutan, Negar and Sotnikova, Anna and Chen, Zeming and Nelaturu, Sree Harsha and Singh, Shivalika and Maheshwary, Rishabh and Altomare, Micol and Haggag, Mohamed A and Amayuelas, Alfonso and others},
journal={arXiv preprint arXiv:2411.19799},
year={2024}
}
```
Provider: maas
Created: 2024-12-15



