
QCRI/AraDiCE-Culture

Source: Hugging Face. Updated 2024-11-05; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/QCRI/AraDiCE-Culture
Description:
---
license: cc-by-nc-sa-4.0
pretty_name: 'AraDiCE -- Culture'
dataset_info:
- config_name: Lebanon
  splits:
  - name: test
    num_examples: 30
- config_name: Egypt
  splits:
  - name: test
    num_examples: 30
- config_name: Syria
  splits:
  - name: test
    num_examples: 30
- config_name: Palestine
  splits:
  - name: test
    num_examples: 30
- config_name: Jordan
  splits:
  - name: test
    num_examples: 30
- config_name: Qatar
  splits:
  - name: test
    num_examples: 30
configs:
- config_name: Lebanon
  data_files:
  - split: test
    path: lebanon/LEBANON.json
- config_name: Egypt
  data_files:
  - split: test
    path: egypt/EGYPT.json
- config_name: Syria
  data_files:
  - split: test
    path: syria/SYRIA.json
- config_name: Palestine
  data_files:
  - split: test
    path: palestine/PALESTINE.json
- config_name: Jordan
  data_files:
  - split: test
    path: jordan/JORDAN.json
- config_name: Qatar
  data_files:
  - split: test
    path: qatar/QATAR.json
---

# AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs

## Overview

The **AraDiCE** dataset is designed to evaluate dialectal and cultural capabilities in large language models (LLMs). The dataset consists of post-edited versions of various benchmark datasets, curated for validation in cultural and dialectal contexts relevant to Arabic. This repository contains the cultural split of the data.

<!--
## File/Directory

TO DO:

- **licenses_by-nc-sa_4.0_legalcode.txt** License information.
- **README.md** This file.
-->

## Evaluation

We used the [lm-harness](https://github.com/EleutherAI/lm-evaluation-harness) evaluation framework for benchmarking. We will release the benchmarking setup soon. Stay tuned!

## License

The dataset is distributed under the **Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)**. The full license text can be found in the accompanying `licenses_by-nc-sa_4.0_legalcode.txt` file.

## Citation

The paper is available [here](https://arxiv.org/pdf/2409.11404).

```
@article{mousi2024aradicebenchmarksdialectalcultural,
  title={{AraDiCE}: Benchmarks for Dialectal and Cultural Capabilities in LLMs},
  author={Basel Mousi and Nadir Durrani and Fatema Ahmad and Md. Arid Hasan and Maram Hasanain and Tameem Kabbani and Fahim Dalvi and Shammur Absar Chowdhury and Firoj Alam},
  year={2024},
  publisher={arXiv:2409.11404},
  url={https://arxiv.org/abs/2409.11404},
}
```
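
## Usage sketch

The YAML header of this card declares six country configurations, each with a single `test` split of 30 examples. The following is a minimal sketch of how those configurations might be loaded with the Hugging Face `datasets` library, assuming the repository id `QCRI/AraDiCE-Culture` from the page title. The names `CONFIGS`, `EXAMPLES_PER_CONFIG`, `load_country`, and `expected_total` are hypothetical helpers introduced here for illustration and are not part of the dataset or its tooling. The record fields are not documented in the card, so the sketch makes no assumption about them.

```python
# Config names and file paths copied from the dataset card's YAML header.
CONFIGS = {
    "Lebanon": "lebanon/LEBANON.json",
    "Egypt": "egypt/EGYPT.json",
    "Syria": "syria/SYRIA.json",
    "Palestine": "palestine/PALESTINE.json",
    "Jordan": "jordan/JORDAN.json",
    "Qatar": "qatar/QATAR.json",
}

# The card lists num_examples: 30 for the test split of every config.
EXAMPLES_PER_CONFIG = 30


def expected_total():
    """Total number of test examples implied by the card's metadata."""
    return EXAMPLES_PER_CONFIG * len(CONFIGS)


def load_country(config_name, repo_id="QCRI/AraDiCE-Culture"):
    """Load the test split for one country config (hypothetical helper).

    Requires network access and the `datasets` package; the import is
    deferred so the metadata above can be used without it.
    """
    if config_name not in CONFIGS:
        raise ValueError(
            f"Unknown config {config_name!r}; expected one of {sorted(CONFIGS)}"
        )
    from datasets import load_dataset  # pip install datasets

    return load_dataset(repo_id, config_name, split="test")


if __name__ == "__main__":
    print(f"{len(CONFIGS)} configs, {expected_total()} test examples expected")
```

Calling `load_country("Egypt")` would download only the Egypt configuration; checking `len(...)` of the result against `EXAMPLES_PER_CONFIG` is one way to confirm the download matches the card's metadata.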
Provider:
QCRI