ai4bharat/intel
Source: Hugging Face. Updated: 2024-12-15. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/ai4bharat/intel
Description:
The INTEL dataset is a multilingual training dataset introduced as part of the Cross Lingual Auto Evaluation (CIA) Suite. It is designed to train evaluator large language models (LLMs) to assess machine-generated text in low-resource and multilingual settings. INTEL uses automated translation to build a diverse corpus for evaluating responses in six languages (Bengali, German, French, Hindi, Telugu, and Urdu) while keeping the reference answers and evaluation criteria in English. It contains 100k training samples and 1k validation samples per language, derived from the Feedback-Collection dataset and enriched through automated translation. Intended uses include training evaluator LLMs, benchmarking multilingual LLMs, and conducting meta-evaluations of evaluation methods.
Provider:
ai4bharat
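
Example (a minimal sketch, not an official loader): the snippet below shows one way the dataset described above might be loaded with the Hugging Face `datasets` library. The repository id comes from the download link. The `config_name` argument and the `"train"` split name are assumptions that this listing does not confirm, and the repository may require accepting access terms on the Hub, so check the dataset card before relying on them.

```python
# Languages named in the dataset description.
INTEL_LANGUAGES = ["Bengali", "German", "French", "Hindi", "Telugu", "Urdu"]

# Repository id taken from the download link in this listing.
REPO_ID = "ai4bharat/intel"


def load_intel(config_name=None, split="train"):
    """Load one split of INTEL from the Hugging Face Hub.

    ASSUMPTIONS: `config_name` (for example a per-language subset) and the
    split name "train" are hypothetical; the listing does not document them.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    if config_name is None:
        return load_dataset(REPO_ID, split=split)
    return load_dataset(REPO_ID, config_name, split=split)
```

Calling `load_intel()` needs network access and the `datasets` package. If the repository defines several configurations, pass the one you need as `config_name`.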



