IndicGenBench
Source: arXiv. Updated: 2024-04-26. Indexed: 2024-07-30.
Download link:
http://www.github.com/google-research-datasets/indic-gen-bench
Description:
IndicGenBench is a multilingual benchmark for evaluating the generation capabilities of large language models across 29 Indic languages, covering 13 scripts and 4 language families. It includes several generation tasks, such as cross-lingual summarization, machine translation, and cross-lingual question answering. Through human curation, it provides multi-way parallel evaluation data for many under-represented Indic languages for the first time.
Created:
2024-04-26



