
Shanghai Dialect and Mandarin

Source: IEEE (indexed 2026-04-17)
Download link:
https://ieee-dataport.org/documents/shanghai-dialect-and-madarin
Description:
This dataset is designed for classifying spoken conversations as Shanghai dialect or Mandarin Chinese, and supports research in dialect classification, speech recognition, and natural language processing (NLP). It consists of high-quality audio recordings of natural conversations, curated to cover diverse linguistic patterns, varying speech speeds, and authentic pronunciation.

Each audio sample is annotated with a language label (Shanghai dialect: 1, Mandarin: 0) and includes metadata such as speaker demographics (age, gender, region), conversation context, and recording conditions. Because the recordings capture real-world spoken interactions, researchers can use them to develop and evaluate models for automatic dialect identification, accent adaptation, and speech-to-text applications.

The dataset is intended to help improve speech recognition systems, language identification models, and dialect-aware NLP technologies. It is especially useful for training deep learning models that need extensive labeled data to improve classification accuracy and robustness. It is publicly available and can be used for academic research, AI-based language modeling, and real-time speech processing applications.
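The label scheme described above (Shanghai dialect: 1, Mandarin: 0) can be sketched in code as follows. This is only an illustrative sketch: the description does not specify the file layout or metadata field names, so the `Sample` class, its fields, and the example file names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Label scheme as stated in the dataset description.
LABEL_NAMES = {0: "Mandarin", 1: "Shanghai dialect"}


@dataclass
class Sample:
    # Hypothetical field names: the description lists the metadata
    # categories (age, gender, region) but not the actual schema.
    audio_path: str
    label: int
    speaker_age: int
    speaker_gender: str
    speaker_region: str


def label_name(label: int) -> str:
    """Map a numeric label to its language name, rejecting unknown values."""
    if label not in LABEL_NAMES:
        raise ValueError(f"unknown label: {label}")
    return LABEL_NAMES[label]


def class_counts(samples):
    """Count samples per language, e.g. to check class balance before training."""
    return Counter(label_name(s.label) for s in samples)


# Made-up records for illustration only; these are not taken from the dataset.
samples = [
    Sample("clip_001.wav", 1, 34, "F", "Shanghai"),
    Sample("clip_002.wav", 0, 27, "M", "Beijing"),
    Sample("clip_003.wav", 1, 52, "M", "Shanghai"),
]
counts = class_counts(samples)
print(counts)
```

Checking the per-class counts before training is a cheap way to spot label imbalance, which matters for a binary dialect classifier.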
Provided by:
Bao, Yida