MT-Bench-Hi

Name: MT-Bench-Hi
Creator: maas
Published: 2025-12-04 09:19:27
License: 暂无描述

魔搭社区2025-12-04 更新2025-11-03 收录

下载链接：

https://modelscope.cn/datasets/nv-community/MT-Bench-Hi

下载链接

链接失效反馈

官方服务：

资源简介：

## Dataset Description: The MT-Bench-Hi (Hindi MT-Bench) dataset is a multi-turn question set containing 200 prompts in the Hindi language to evaluate the conversational ability of the Hindi large language models (LLMs). The dataset has 80% of samples created natively by specialists well-versed in Hindi and 20% of the samples that are translated from the English version of the dataset. This dataset is ready for commercial/non-commercial use. The evaluation steps are described [here](https://huggingface.co/datasets/nvidia/MT-Bench-Hi/blob/main/EVAL.md). ## Dataset Owner: NVIDIA Corporation ## Dataset Creation Date: April 2025 ## License/Terms of Use: CC-BY 4.0 ## Intended Usage: Evaluate the multi-turn conversational capability of the LLM in Hindi language in 8 different domains like writing, humanities, extraction, roleplay, math, coding, reasoning, and STEM. ## Dataset Characterization Data Collection Method<br> * Hybrid: Human, Synthetic <br> Labeling Method<br> * Not Applicable <br> ## Dataset Format Text ## Dataset Quantification 377KB of multi-turn query prompts, comprising 200 individual samples. ## Ethical Considerations: NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/). ## Citing If you find our work helpful, please consider citing our paper: ``` @article{kamath2025benchmarking, title={Benchmarking Hindi LLMs: A New Suite of Datasets and a Comparative Analysis}, author={Kamath, Anusha and Singla, Kanishk and Paul, Rakesh and Joshi, Raviraj and Vaidya, Utkarsh and Chauhan, Sanjay Singh and Wartikar, Niranjan}, journal={arXiv preprint arXiv:2508.19831}, year={2025} } ```

## 数据集描述 MT-Bench-Hi（印地语MT-Bench）数据集是一套多轮提示集，包含200条印地语提示词，用于评估印地语大语言模型（Large Language Model，LLM）的对话能力。该数据集80%的样本由精通印地语的专业人员原生创作，剩余20%的样本由该数据集的英文版本翻译而来。本数据集可用于商业与非商业用途。评估步骤详见[此处](https://huggingface.co/datasets/nvidia/MT-Bench-Hi/blob/main/EVAL.md)。 ## 数据集所有者英伟达（NVIDIA）公司 ## 数据集创建日期 2025年4月 ## 许可与使用条款知识共享署名4.0（CC-BY 4.0） ## 预期用途评估印地语大语言模型在8个不同领域的多轮对话能力，涵盖写作、人文、信息抽取、角色扮演、数学、编程、推理以及科学、技术、工程与数学（STEM）领域。 ## 数据集特征数据收集方法 * 混合模式：人工创作、合成生成标注方法 * 不适用 ## 数据集格式文本格式 ## 数据集量化情况该数据集包含200条独立样本，多轮查询提示词总大小为377KB。 ## 伦理考量英伟达（NVIDIA）认为，可信人工智能是一项共同责任，我们已制定相关政策与实践规范，以支撑各类人工智能应用的开发。开发者在按照服务条款下载或使用本数据集时，应与其内部模型团队协同工作，确保该模型符合相关行业与应用场景的要求，并应对潜在的产品误用问题。请通过[此处](https://www.nvidia.com/en-us/support/submit-security-vulnerability/)报告安全漏洞或英伟达人工智能相关问题。 ## 引用说明若您认为本工作对您有所帮助，请引用我们的论文： @article{kamath2025benchmarking, title={Benchmarking Hindi LLMs: A New Suite of Datasets and a Comparative Analysis}, author={Kamath, Anusha and Singla, Kanishk and Paul, Rakesh and Joshi, Raviraj and Vaidya, Utkarsh and Chauhan, Sanjay Singh and Wartikar, Niranjan}, journal={arXiv preprint arXiv:2508.19831}, year={2025} }

提供机构：

maas

创建时间：

2025-10-09

5,000+

优质数据集

54 个

任务类型

进入经典数据集