SAMPLE Conversational Audio Data | Multi-Speaker | 13 Languages | GPS-Verified | Global Coverage
Source: Databricks | Indexed: 2026-03-13
Download link:
https://marketplace.databricks.com/details/421f271c-e2f7-4554-9e18-b438f520542d/Rwazi_SAMPLE-Conversational-Audio-Data-Multi-Speaker-13-Languages-GPS-Verified-Global-Coverage
Description:
High-quality conversational audio dataset collected through a global network of 3M+ verified data collectors across 150+ countries. Each recording captures a natural, unscripted conversation among 2-3 people in a real-world environment (homes, cafes, markets, offices) rather than in studio conditions.
LANGUAGE COVERAGE:
US English: Regional accent diversity (South, Midwest, Northeast, West Coast, NYC, Pacific NW, Southwest)
Arabic: Egyptian, Gulf, Levantine, and Maghreb dialects across Egypt, Saudi Arabia, UAE, Jordan, Morocco
European: French, German, Spanish, Portuguese
Asian: Thai, Tagalog, Indonesian, Hindi
African: Swahili, Yoruba, Amharic
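For the language-identification use case this listing mentions, one way to turn the 13 languages above into classifier labels is sketched here, assuming one label per language. Each language name is mapped to a BCP 47 tag (en-US for US English, ISO 639-1 codes for the rest), and the sorted tags receive integer indices. LANGUAGE_TAGS, LABEL_INDEX, and label_for are hypothetical names, not part of the dataset's published schema. The four Arabic dialects collapse into a single "ar" label; dialect-level classification would need finer-grained labels that the listing does not specify.

```python
# Hypothetical mapping from the listed language names to BCP 47 tags
# (en-US for US English; ISO 639-1 base codes for the others).
LANGUAGE_TAGS = {
    "US English": "en-US",
    "Arabic": "ar",  # Egyptian, Gulf, Levantine, Maghreb all share this label here
    "French": "fr",
    "German": "de",
    "Spanish": "es",
    "Portuguese": "pt",
    "Thai": "th",
    "Tagalog": "tl",
    "Indonesian": "id",
    "Hindi": "hi",
    "Swahili": "sw",
    "Yoruba": "yo",
    "Amharic": "am",
}

# Stable integer class indices: sorting the tags keeps the mapping
# independent of dictionary insertion order.
LABEL_INDEX = {tag: i for i, tag in enumerate(sorted(LANGUAGE_TAGS.values()))}


def label_for(language: str) -> int:
    """Return the integer class index for a language name from the listing."""
    return LABEL_INDEX[LANGUAGE_TAGS[language]]


if __name__ == "__main__":
    print(label_for("Hindi"))  # prints 6
```

Sorting by tag rather than by display name means adding a language later only shifts indices predictably; a production pipeline would more likely persist the mapping alongside the trained model.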
RECORDING SPECIFICATIONS:
Duration: 10-30 minutes per conversation
Format: WAV | 16-48 kHz sample rate
Speakers: 2-3 per recording
Type: Unscripted, natural conversations
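A minimal sketch, assuming integer PCM WAV files, of checking a downloaded recording against the specifications above with Python's standard-library wave module. The checks treat the listed figures as inclusive ranges (any sample rate from 16 to 48 kHz, 10 to 30 minutes, 2 to 3 speakers), which is one reading of the listing. The speaker count is passed in because a WAV header does not carry it; it would have to come from the per-recording metadata. WavInfo, read_wav_info, and spec_violations are hypothetical names. The wave module cannot open IEEE-float WAV files, so a library such as soundfile would be needed if any recordings use that encoding.

```python
import io
import wave
from dataclasses import dataclass

# Bounds from the listing's RECORDING SPECIFICATIONS, read as inclusive ranges.
MIN_RATE_HZ, MAX_RATE_HZ = 16_000, 48_000
MIN_DURATION_S, MAX_DURATION_S = 10 * 60, 30 * 60
MIN_SPEAKERS, MAX_SPEAKERS = 2, 3


@dataclass
class WavInfo:
    sample_rate: int
    channels: int
    duration_s: float


def read_wav_info(source) -> WavInfo:
    """Read header fields from a WAV path or binary file object (integer PCM only)."""
    with wave.open(source, "rb") as wf:
        rate = wf.getframerate()
        return WavInfo(
            sample_rate=rate,
            channels=wf.getnchannels(),
            duration_s=wf.getnframes() / rate,
        )


def spec_violations(info: WavInfo, num_speakers: int) -> list[str]:
    """Return the reasons, if any, that a recording falls outside the listed specs."""
    problems = []
    if not MIN_RATE_HZ <= info.sample_rate <= MAX_RATE_HZ:
        problems.append(f"sample rate {info.sample_rate} Hz outside 16-48 kHz")
    if not MIN_DURATION_S <= info.duration_s <= MAX_DURATION_S:
        problems.append(f"duration {info.duration_s / 60:.1f} min outside 10-30 min")
    # Speaker count is not stored in the WAV header; take it from the metadata.
    if not MIN_SPEAKERS <= num_speakers <= MAX_SPEAKERS:
        problems.append(f"{num_speakers} speakers outside 2-3")
    return problems


if __name__ == "__main__":
    # Build a 1-second, 16 kHz mono WAV of silence in memory to exercise the checks.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(16_000)
        out.writeframes(b"\x00\x00" * 16_000)
    buf.seek(0)
    print(spec_violations(read_wav_info(buf), num_speakers=2))
    # prints ['duration 0.0 min outside 10-30 min']
```

Returning a list of reasons instead of a boolean makes it easy to log why a file was rejected when screening a large download.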
METADATA PER RECORDING:
GPS-verified location (country, city, coordinates)
Recording environment classification
Device type and audio quality metrics (SNR)
Conversation type and topic tags (daily life, food, family, work, shopping, storytelling, community)
Per-speaker demographics: age range, gender, native language, education level
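The per-recording metadata above could be modeled as typed records so that recordings can be filtered before training. This is a sketch only: the listing does not publish its field names or types, so every name here (Speaker, RecordingMetadata, select_recordings, snr_db, and so on) is hypothetical, the SNR value is assumed to be in decibels, and the threshold is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Speaker:
    # Per-speaker demographics named in the listing; field names are illustrative.
    age_range: str
    gender: str
    native_language: str
    education_level: str


@dataclass
class RecordingMetadata:
    # Hypothetical schema covering the metadata categories listed above.
    recording_id: str
    country: str
    city: str
    latitude: float
    longitude: float
    environment: str          # recording environment classification
    device_type: str
    snr_db: float             # audio quality metric; assumed to be SNR in dB
    conversation_type: str
    topic_tags: list[str] = field(default_factory=list)
    speakers: list[Speaker] = field(default_factory=list)


def select_recordings(
    records: Iterable[RecordingMetadata],
    min_snr_db: float,
    native_language: str,
    topic: Optional[str] = None,
) -> list[RecordingMetadata]:
    """Keep recordings that meet the SNR floor, include at least one speaker
    with the given native language, and (optionally) carry the given topic tag."""
    selected = []
    for rec in records:
        if rec.snr_db < min_snr_db:
            continue
        if not any(s.native_language == native_language for s in rec.speakers):
            continue
        if topic is not None and topic not in rec.topic_tags:
            continue
        selected.append(rec)
    return selected
```

Keeping demographics on each Speaker rather than on the recording mirrors the listing's per-speaker metadata and lets a filter require that at least one participant matches.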
LICENSING & COMPLIANCE:
Explicit written consent from all participants
Full commercial usage rights
PII scrubbed from all metadata
Auditable consent chain
IDEAL FOR: ASR model training, speaker diarization, accent/dialect classification, conversational AI, emotion detection, language identification, and multilingual NLP. Scalable to custom volume, language, and demographic specifications on request.
Provider:
Rwazi



