WenetSpeech-Yue

Name: WenetSpeech-Yue
Creator: maas
Published: 2026-05-15 13:48:47
License: No description provided

ModelScope Community; updated 2026-05-15; indexed 2025-09-13

Download link:

https://modelscope.cn/datasets/pengzhendong/WenetSpeech-Yue


Resource description:

# WenetSpeech-Yue: A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation

Longhao Li<sup>1*</sup>, Zhao Guo<sup>1*</sup>, Hongjie Chen<sup>2</sup>, Yuhang Dai<sup>1</sup>, Ziyu Zhang<sup>1</sup>, Hongfei Xue<sup>1</sup>, Tianlun Zuo<sup>1</sup>, Chengyou Wang<sup>1</sup>, Shuiyuan Wang<sup>1</sup>, Xin Xu<sup>3</sup>, Hui Bu<sup>3</sup>, Jie Li<sup>2</sup>, Jian Kang<sup>2</sup>, Binbin Zhang<sup>4</sup>, Ruibin Yuan<sup>5</sup>, Ziya Zhou<sup>5</sup>, Wei Xue<sup>5</sup>, Lei Xie<sup>1</sup>

<sup>1</sup> Audio, Speech and Language Processing Group (ASLP@NPU), Northwestern Polytechnical University<br>
<sup>2</sup> Institute of Artificial Intelligence (TeleAI), China Telecom<br>
<sup>3</sup> Beijing AISHELL Technology Co., Ltd.<br>
<sup>4</sup> WeNet Open Source Community<br>
<sup>5</sup> Hong Kong University of Science and Technology

📑 <a href="https://arxiv.org/abs/2509.03959">Paper</a> &nbsp;&nbsp; | &nbsp;&nbsp; 🐙 <a href="https://github.com/ASLP-lab/WenetSpeech-Yue">GitHub</a> &nbsp;&nbsp; | &nbsp;&nbsp; 🤗 <a href="https://huggingface.co/collections/ASLP-lab/wenetspeech-yue-68b690d287cde88389e5dba1">HuggingFace</a><br>
🖥️ <a href="https://huggingface.co/spaces/ASLP-lab/WenetSpeech-Yue">HuggingFace Space</a> &nbsp;&nbsp; | &nbsp;&nbsp; 🎤 <a href="https://aslp-lab.github.io/WenetSpeech-Yue/">Demo Page</a> &nbsp;&nbsp; | &nbsp;&nbsp; 💬 <a href="https://github.com/ASLP-lab/WenetSpeech-Yue?tab=readme-ov-file#contact">Contact Us</a>

<div align="center"> <img width="800px" src="https://github.com/ASLP-lab/WenetSpeech-Yue/raw/main/figs/wenetspeech_yue.svg" /> </div>

## Dataset

### WenetSpeech-Yue Overview

* Contains 21,800 hours of richly annotated Cantonese speech, making it the largest open-source resource for Cantonese speech research.
* Stores metadata in a single JSON file, including audio path, duration, text confidence, speaker identity, SNR, DNSMOS, age, gender, and character-level timestamps. Additional metadata tags may be added in the future.
* Covers ten domains: Storytelling, Entertainment, Drama, Culture, Vlog, Commentary, Education, Podcast, News, and Others.
<div align="center"> <img width="800px" src="https://github.com/ASLP-lab/WenetSpeech-Yue/raw/main/figs/data_distribution.png" /> </div>

### Metadata Format

We store all audio metadata in a standardized JSON format. The fields fall into five groups:

* Core fields: `utt_id` (unique identifier for each audio segment; it appears as `key` in the example below), `rover_result` (ROVER, i.e. Recognizer Output Voting Error Reduction, result combining three ASR transcriptions), `confidence` (confidence score of the text transcription), `jyutping_confidence` (confidence score of the Jyutping, i.e. Cantonese romanization, transcription), and `duration` (audio duration).
* Speaker attributes: `speaker_id` (`spk_id` in the example below), `gender`, and `age`.
* Audio quality metrics: `sample_rate` (`sampling_rate` in the example below), `DNSMOS`, and `SNR`.
* Timestamp information: `timestamp` (segment boundaries, recorded as `start` and `end`).
* Extended metadata under the `meta_info` field: `program` (program name), `region` (geographical information), `link` (link to the original content), and `domain` (domain classification).

JSON example:

```
{
  "key": "xg0054364_9798410_9801030",
  "rover_result": "人多一齐食咁样先至知味",
  "confidence": 0.879,
  "jyutping_confidence": 0.909,
  "duration": 2.816,
  "meta_info": {
    "region": "Hong Kong",
    "program": "Cantonese radio drama \"I'll Send You Flowers Next Year\" featuring Kathy Chow, Jacob Tsui, and Law Wai-kit. A 2002 production by Radio Television Hong Kong (RTHK).",
    "time_stamp": "9798.410_9801.030",
    "link": "<link>",
    "domain": "Drama"
  },
  "speaker_attributes": {
    "spk_id": "xg0054364_SPEAKER_08",
    "gender": "Male",
    "age": "YOUTH"
  },
  "speech_quality": {
    "sampling_rate": 16000,
    "DNSMOS": 3.2549686431884766,
    "SNR": 25.29012680053711
  },
  "timestamps": [
    [["<eps>", [0.0, 0.26]], ["人", [0.26, 0.48]], ["多", [0.48, 0.64]], ["一", [0.64, 0.74]], ["齐", [0.74, 0.92]]],
    [["食", [0.93, 1.15]], ["<eps>", [1.15, 1.39]], ["咁", [1.39, 1.53]], ["样", [1.52, 1.6]], ["先", [1.6, 1.75]]],
    [["至", [1.75, 1.83]], ["知", [1.83, 2.04]], ["味", [2.04, 2.4]], ["<eps>", [2.4, 2.78]]]
  ]
}
```

### WenetSpeech-Yue Usage

You can obtain the original video source through the `link` field in the metadata file (`wenetspeech_yue_meta.json`). Segment the audio according to the `cut_point` field to extract the corresponding record. For pre-processed audio data, please contact us using the information provided below.

## Contact

If you have any questions or would like to collaborate, feel free to reach out to our research team via email: lhli@mail.nwpu.edu.cn or gzhao@mail.nwpu.edu.cn

You're also welcome to join our WeChat group for technical discussions, updates, and, as mentioned above, access to pre-processed audio data.

<img src="https://github.com/ASLP-lab/WenetSpeech-Yue/raw/main/figs/wechat.jpg" width="300" alt="WeChat Group QR Code"/>

Scan to join our WeChat discussion group

<img src="https://github.com/ASLP-lab/WenetSpeech-Yue/raw/main/figs/npu@aslp.jpeg" width="300" alt="Official Account QR Code"/>
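As a rough illustration of the metadata format and the segmentation step described under Usage, the sketch below reads one record and derives the values needed to cut a segment. It rests on several assumptions not confirmed by the README: that `meta_info.time_stamp` holds the start and end of the segment in seconds within the source audio (the README calls the field `cut_point`, which is absent from the example), that the per-character times in `timestamps` are relative to the segment start, and that `<eps>` marks silence. The helper names are hypothetical.

```python
import json
from typing import List, Tuple

# Trimmed copy of the example record; only the fields used here are kept.
RECORD_JSON = """
{
  "key": "xg0054364_9798410_9801030",
  "rover_result": "人多一齐食咁样先至知味",
  "meta_info": {"time_stamp": "9798.410_9801.030"},
  "timestamps": [
    [["<eps>", [0.0, 0.26]], ["人", [0.26, 0.48]], ["多", [0.48, 0.64]], ["一", [0.64, 0.74]], ["齐", [0.74, 0.92]]],
    [["食", [0.93, 1.15]], ["<eps>", [1.15, 1.39]], ["咁", [1.39, 1.53]], ["样", [1.52, 1.6]], ["先", [1.6, 1.75]]],
    [["至", [1.75, 1.83]], ["知", [1.83, 2.04]], ["味", [2.04, 2.4]], ["<eps>", [2.4, 2.78]]]
  ]
}
"""


def parse_cut_points(time_stamp: str) -> Tuple[float, float]:
    """Split a 'start_end' string into two floats (assumed to be seconds)."""
    start, end = time_stamp.split("_")
    return float(start), float(end)


def flatten_timestamps(groups: list) -> List[Tuple[str, float, float]]:
    """Flatten the nested groups into (char, start, end), skipping '<eps>'."""
    return [
        (token, span[0], span[1])
        for group in groups
        for token, span in group
        if token != "<eps>"  # assumption: '<eps>' is a silence placeholder
    ]


record = json.loads(RECORD_JSON)
start_s, end_s = parse_cut_points(record["meta_info"]["time_stamp"])
chars = flatten_timestamps(record["timestamps"])

print(f"{record['key']}: cut source audio from {start_s}s to {end_s}s")
for token, char_start, char_end in chars:
    # Assumption: adding the segment start gives the position in the source.
    print(token, round(start_s + char_start, 3), round(start_s + char_end, 3))
```

With `start_s` and `end_s` in hand, the actual waveform can be cut with whichever audio tool you prefer; the sketch stops short of that because the README does not specify one.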


Provider:

maas

Created:

2025-10-23
