SAMPLE Global English Speech with Accent Conversational Dataset — Multi-Region Validated Speech ...
Source: Databricks (indexed 2025-07-31)
Download link:
https://marketplace.databricks.com/details/fe39eb38-03d0-4136-9001-f352f9f24de9/FileMarket_SAMPLE-Global-English-Speech-with-Accent-Conversational-Dataset-—-Multi-Region-Validated-Speech-
Resource description:
The Global English Accent Conversational NLP Dataset is a comprehensive collection of validated English speech recordings sourced from native and non-native English speakers across key global regions. This dataset is designed for training Natural Language Processing models, conversational AI, Automatic Speech Recognition (ASR), and linguistic research, with a focus on regional accent variation.
Regions and Covered Countries with Primary Spoken Languages:
Africa:
South Africa (English, Zulu, Afrikaans, Xhosa)
Nigeria (English, Yoruba, Igbo, Hausa)
Kenya (English, Swahili)
Ghana (English, Twi, Ewe, Ga)
Uganda (English, Luganda)
Ethiopia (English, Amharic, Oromo)
Central & South America:
Mexico (Spanish, English as a second language)
Guatemala (Spanish, K'iche', English)
El Salvador (Spanish, English)
Costa Rica (Spanish, English in Caribbean regions)
Colombia (Spanish, English in urban centers)
Dominican Republic (Spanish, English in tourist zones)
Brazil (Portuguese, English in urban areas)
Argentina (Spanish, English among educated speakers)
Southeast Asia & South Asia:
Philippines (Filipino, English)
Vietnam (Vietnamese, English)
Malaysia (Malay, English, Mandarin)
Indonesia (Indonesian, Javanese, English)
Singapore (English, Mandarin, Malay, Tamil)
India (Hindi, English, Bengali, Tamil)
Pakistan (Urdu, English, Punjabi)
Europe:
United Kingdom (English)
Ireland (English, Irish)
Germany (German, English)
France (French, English)
Spain (Spanish, Catalan, English)
Italy (Italian, English)
Portugal (Portuguese, English)
Oceania:
Australia (English)
New Zealand (English, Māori)
Fiji (English, Fijian)
North America:
United States (English, Spanish)
Canada (English, French)
Dataset Attributes:
- Conversational English with natural accent variation
- Global coverage with balanced male/female speakers
- Rich speaker metadata: age, gender, country, city
- Average audio length of ~30 minutes per participant
- All samples manually validated for accuracy
- Structured format suitable for machine learning and AI applications
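The listing names the speaker metadata fields (age, gender, country, city) but does not publish a schema. As a rough sketch only, the record below shows one way such metadata could be modeled to check gender balance and audio volume per region. The class `Speaker`, the field names `speaker_id` and `audio_minutes`, the partial `REGION_BY_COUNTRY` mapping, and the sample rows are all hypothetical and are not taken from the actual dataset files.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# Partial country -> region mapping copied from the region list in this listing.
# Assumption: the real dataset may encode regions differently, or not at all.
REGION_BY_COUNTRY = {
    "Nigeria": "Africa",
    "Kenya": "Africa",
    "Brazil": "Central & South America",
    "India": "Southeast Asia & South Asia",
    "Ireland": "Europe",
    "Fiji": "Oceania",
    "Canada": "North America",
}


@dataclass(frozen=True)
class Speaker:
    """Hypothetical per-speaker record; field names are illustrative."""
    speaker_id: str
    age: int
    gender: str
    country: str
    city: str
    audio_minutes: float

    @property
    def region(self) -> str:
        # Countries missing from the partial mapping fall back to "Unknown".
        return REGION_BY_COUNTRY.get(self.country, "Unknown")


def gender_counts(speakers):
    """Count speakers per gender label to inspect balance."""
    return Counter(s.gender for s in speakers)


def hours_by_region(speakers):
    """Sum audio duration per region, converted from minutes to hours."""
    totals = defaultdict(float)
    for s in speakers:
        totals[s.region] += s.audio_minutes / 60.0
    return dict(totals)


# Made-up rows for illustration only; they are not real dataset entries.
speakers = [
    Speaker("s1", 34, "female", "Nigeria", "Lagos", 30.0),
    Speaker("s2", 27, "male", "Kenya", "Nairobi", 28.5),
    Speaker("s3", 41, "female", "India", "Mumbai", 31.5),
    Speaker("s4", 52, "male", "Canada", "Toronto", 30.0),
]

print(gender_counts(speakers))
print(hours_by_region(speakers))
```

A check like this is worth running on the real metadata before training, since "balanced" in the listing is not quantified and balance may differ from region to region.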
Best suited for:
- NLP model training and evaluation
- Multilingual ASR system development
- Voice assistant and chatbot design
- Accent recognition research
- Voice synthesis and TTS modeling
This dataset ensures global linguistic diversity and delivers high-quality audio for AI developers, researchers, and enterprises working on voice-based applications.
Provider:
FileMarket
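Collected summary
Dataset introduction

Background and challenges
Background overview
This dataset contains conversational recordings from native and non-native English speakers across multiple regions worldwide, with rich demographic information and regional accent variation. It is designed for NLP model training, speech recognition, and linguistic research. All samples are manually validated and include metadata such as age and gender, and each participant contributes about 30 minutes of audio on average.
The summary above was collected and generated by 遇见数据集.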



