AIDC-AI/HalloMTBench
Source: Hugging Face | Updated: 2026-01-13 | Indexed: 2026-01-03
Download link:
https://hf-mirror.com/datasets/AIDC-AI/HalloMTBench
Description:
HalloMTBench is a new and challenging benchmark designed to evaluate how well Large Language Models (LLMs) resist hallucination in translation tasks. It comprises 6,908 high-quality, expert-verified samples that capture naturally occurring hallucinations, providing a cost-effective and robust tool for evaluating model safety and reliability in translation. The dataset covers 11 high-resource language pairs, with English as the source language and Spanish, French, Italian, Portuguese, German, Russian, Arabic, Vietnamese, Chinese, Japanese, and Korean as the target languages.
Provider:
AIDC-AI
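
As a rough illustration of the language coverage described above, the following Python sketch enumerates the 11 English-to-X pairs using standard ISO 639-1 codes and shows one possible way to fetch the data. The "en-xx" pair labels and the helper names `language_pairs` and `load_hallomtbench` are hypothetical and chosen here for illustration; they are not taken from the dataset's schema. The loader assumes the `datasets` package is installed, network access is available, and the repository loads with its default configuration, none of which this listing confirms.

```python
# ISO 639-1 codes for the target languages named in the description.
TARGET_LANGUAGES = {
    "es": "Spanish",
    "fr": "French",
    "it": "Italian",
    "pt": "Portuguese",
    "de": "German",
    "ru": "Russian",
    "ar": "Arabic",
    "vi": "Vietnamese",
    "zh": "Chinese",
    "ja": "Japanese",
    "ko": "Korean",
}

# English is the source language for every pair.
SOURCE_LANGUAGE = "en"


def language_pairs():
    """Return illustrative pair labels such as 'en-es' (hypothetical format)."""
    return [f"{SOURCE_LANGUAGE}-{code}" for code in TARGET_LANGUAGES]


def load_hallomtbench():
    """Fetch the dataset from the Hugging Face Hub.

    Assumption: the repository loads with its default configuration.
    The import is local so the rest of this sketch runs without `datasets`.
    """
    from datasets import load_dataset

    return load_dataset("AIDC-AI/HalloMTBench")


if __name__ == "__main__":
    for pair in language_pairs():
        print(pair)
```

Calling `load_hallomtbench()` and printing the result would show the actual splits and column names, which should be checked before writing any evaluation code against them.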



