Shanghai Dialect and Mandarin
Source: IEEE DataPort (indexed 2026-04-17)
Download link:
https://ieee-dataport.org/documents/shanghai-dialect-and-madarin
Description:
This dataset is designed for classifying spoken conversations as either Shanghai dialect or Mandarin Chinese. It is intended as a resource for dialect classification, speech recognition, and natural language processing (NLP) research. It consists of high-quality audio recordings of natural conversations, curated to cover diverse linguistic patterns, varying speech rates, and authentic pronunciation.

Each audio sample is annotated with a language label (Shanghai dialect: 1, Mandarin: 0) and includes metadata such as speaker demographics (age, gender, region), conversation context, and recording conditions. Because the dataset captures real-world spoken interactions, researchers can use it to develop and evaluate models for automatic dialect identification, accent adaptation, and speech-to-text applications.

By offering a well-structured collection of real-world spoken dialogues, this dataset aims to support better speech recognition systems, language identification models, and dialect-aware NLP technologies. It is particularly suited to training deep learning models, which need large amounts of labeled data to achieve accurate and robust classification. The dataset is publicly available for academic research, AI-based language modeling, and real-time speech processing applications.
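The binary label scheme and the metadata fields described above can be represented as a small record type. The sketch below is illustrative only: the `Sample` class, its field names, the helper functions, and the example clips are all hypothetical, because the listing does not document the actual file format, column names, or directory layout. It shows how the documented convention (Shanghai dialect = 1, Mandarin = 0) might be enforced and how class balance might be checked before training a classifier.

```python
from collections import Counter
from dataclasses import dataclass

# Label convention taken from the dataset description.
SHANGHAI, MANDARIN = 1, 0
LABEL_NAMES = {SHANGHAI: "shanghai", MANDARIN: "mandarin"}


@dataclass
class Sample:
    # Hypothetical record layout: the description lists these kinds of fields
    # but does not specify the file format or the actual field names.
    audio_path: str
    label: int  # 1 = Shanghai dialect, 0 = Mandarin
    age: int
    gender: str
    region: str


def validate_labels(samples):
    """Raise if any sample uses a label outside the documented {0, 1} scheme."""
    bad = [s.audio_path for s in samples if s.label not in LABEL_NAMES]
    if bad:
        raise ValueError(f"unexpected labels in: {bad}")


def label_counts(samples):
    """Return the number of samples per language, keyed by readable name."""
    counts = Counter(LABEL_NAMES[s.label] for s in samples)
    # Counter returns 0 for missing keys, so both classes always appear.
    return {name: counts[name] for name in LABEL_NAMES.values()}


if __name__ == "__main__":
    # Made-up records for illustration only; not taken from the dataset.
    samples = [
        Sample("clip_001.wav", SHANGHAI, 62, "F", "Shanghai"),
        Sample("clip_002.wav", MANDARIN, 25, "M", "Beijing"),
        Sample("clip_003.wav", SHANGHAI, 48, "M", "Shanghai"),
    ]
    validate_labels(samples)
    print(label_counts(samples))  # {'shanghai': 2, 'mandarin': 1}
```

Checking the label counts first is a cheap way to spot class imbalance, which affects how classification accuracy on a binary dialect task should be interpreted.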
Provided by:
Bao, Yida



