TELEVAL

Name: TELEVAL
Creator: maas
Published: 2026-01-06 16:42:21
License: No description available

ModelScope community · Updated 2026-01-06 · Indexed 2025-08-16

Download link:

https://modelscope.cn/datasets/TeleAI/TELEVAL

Resource overview:

**TELEVAL** is a dynamic evaluation benchmark designed for Spoken-Language Models (SLMs), focusing on Chinese interactive scenarios. It covers three main dimensions: **Explicit Semantics**, **Paralinguistic & Implicit Semantics**, and **System Abilities**, with tasks ranging from basic knowledge to dialect understanding and response and paralinguistic understanding and response.

- **Multi-dimensional Evaluation 🧠**: Covers 12 major tasks and 34 datasets, with more data continuously being added.
- **Real-world Interaction Testing 🎧**: Builds natural, realistic dialogue scenarios around actual interaction needs (e.g., knowledge Q&A, human-like companionship) and avoids task-style instructions such as “I'm a child, what should I...” or “What mood am I in?”, so that the benchmark tests how well a model holds a natural conversation in response to user speech.
- **Multilingual & Dialect-rich Data 🌏**: Primarily Mandarin Chinese, with additional coverage of English Q&A and multiple Chinese dialects (e.g., Cantonese, Henan dialect, Northeastern Mandarin, Shanghainese, Sichuanese).
- **Modular Evaluation Framework 🔧**: A complete model inference and result evaluation framework in which inference and evaluation are decoupled. It supports evaluating existing inference results and customizing models, tasks, and datasets, and it handles inference and evaluation for both SLMs and LLMs.

For usage, results, and further details, see GitHub: **https://github.com/Tele-AI/TELEVAL**
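The decoupled design described above (inference first, evaluation later from saved results) can be illustrated with a minimal sketch. This is not TELEVAL's actual API: `run_inference`, `evaluate`, `exact_match`, the JSON Lines record layout, and the text-only toy model are all hypothetical names and assumptions chosen for illustration. A real SLM pipeline would take audio input rather than text.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable


def run_inference(samples: Iterable[Dict], model: Callable[[str], str], out_path: Path) -> None:
    """Stage 1 (hypothetical): run the model once and save predictions as JSON Lines."""
    with out_path.open("w", encoding="utf-8") as f:
        for s in samples:
            record = {"id": s["id"], "ref": s["ref"], "pred": model(s["query"])}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def exact_match(pred: str, ref: str) -> float:
    """Toy metric: 1.0 if prediction equals reference after trimming whitespace."""
    return float(pred.strip() == ref.strip())


def evaluate(pred_path: Path, metric: Callable[[str, str], float]) -> float:
    """Stage 2 (hypothetical): score a saved prediction file; no model is loaded here."""
    scores = []
    with pred_path.open(encoding="utf-8") as f:
        for line in f:
            r = json.loads(line)
            scores.append(metric(r["pred"], r["ref"]))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Text stand-ins for spoken queries; the "model" simply upper-cases its input.
    samples = [
        {"id": 1, "query": "a", "ref": "A"},
        {"id": 2, "query": "b", "ref": "x"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        preds = Path(tmp) / "preds.jsonl"
        run_inference(samples, str.upper, preds)
        # Evaluation reads only the saved file, so it can be rerun with another metric.
        print(evaluate(preds, exact_match))
```

Because the second stage depends only on the saved file, predictions produced elsewhere can be scored without rerunning the model, which is the property the framework description emphasizes.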


Provider:

maas

Created:

2025-08-12
