Tanushreeeeee/CodeMixBench

Name: Tanushreeeeee/CodeMixBench
Creator: Tanushreeeeee
Published: 2025-12-13 14:30:37
License: No description provided

Source: Hugging Face. Updated 2025-12-13. Indexed 2025-12-20.

Download link:

https://hf-mirror.com/datasets/Tanushreeeeee/CodeMixBench

Description:

CodeMixBench is a benchmark dataset for evaluating how well large language models (LLMs) handle code-mixed text across 18 languages. It comprises eight tasks spanning knowledge reasoning (e.g., CM-MMLU), mathematical reasoning (e.g., CM-GSM8K), truthfulness assessment (e.g., CM-TruthfulQA), and traditional NLP tasks such as Language Identification (LID), Named Entity Recognition (NER), Part-of-Speech tagging (POS), Sentiment Analysis (SA), and Machine Translation (MT). Supported languages include Chinese, English, Spanish, and Hindi, among others. The dataset also provides detailed statistics and loading instructions.
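The description mentions loading instructions without reproducing them, so the following is only a minimal sketch of how a dataset hosted on Hugging Face is commonly loaded with the `datasets` library. The repository id comes from the listing above; the helper names `list_configs` and `load_subset` are hypothetical, and the configuration (subset) names are not stated here, so the sketch queries them instead of assuming any.

```python
from typing import List, Optional

# Repository id as given in the listing above.
REPO_ID = "Tanushreeeeee/CodeMixBench"


def list_configs(repo_id: str = REPO_ID) -> List[str]:
    """Return the configuration (subset) names the repository exposes.

    Hypothetical helper. Requires network access and `pip install datasets`.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import get_dataset_config_names

    return get_dataset_config_names(repo_id)


def load_subset(config: Optional[str] = None, repo_id: str = REPO_ID):
    """Load one configuration of the dataset; all available splits are returned.

    Hypothetical helper. `config` must be one of the names from list_configs();
    passing None only works if the repository defines a default configuration.
    """
    from datasets import load_dataset

    return load_dataset(repo_id, name=config)
```

A typical session would call `list_configs()` first and then pass one of the returned names to `load_subset(...)`. Whether the per-task subsets (such as CM-MMLU or LID) are exposed as separate configurations is an assumption to verify against the dataset card.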

Provider:

Tanushreeeeee
