
SAMPLE Conversational Audio Data | Multi-Speaker | 13 Languages | GPS-Verified | Global Coverage

Source: Databricks (indexed 2026-03-13)
Download link:
https://marketplace.databricks.com/details/421f271c-e2f7-4554-9e18-b438f520542d/Rwazi_SAMPLE-Conversational-Audio-Data-Multi-Speaker-13-Languages-GPS-Verified-Global-Coverage
Description:
High-quality conversational audio dataset collected through a global network of 3M+ verified data collectors across 150+ countries. Each recording captures natural, unscripted conversations between 2-3 people in real-world environments (homes, cafes, markets, offices) rather than studio conditions.

LANGUAGE COVERAGE:
US English: regional accent diversity (South, Midwest, Northeast, West Coast, NYC, Pacific NW, Southwest)
Arabic: Egyptian, Gulf, Levantine, and Maghreb dialects across Egypt, Saudi Arabia, UAE, Jordan, Morocco
European: French, German, Spanish, Portuguese
Asian: Thai, Tagalog, Indonesian, Hindi
African: Swahili, Yoruba, Amharic

RECORDING SPECIFICATIONS:
Duration: 10-30 minutes per conversation
Format: WAV, 16-48 kHz sample rate
Speakers: 2-3 per recording
Type: unscripted, natural conversations

METADATA PER RECORDING:
GPS-verified location (country, city, coordinates)
Recording environment classification
Device type and audio quality metrics (SNR)
Conversation type and topic tags (daily life, food, family, work, shopping, storytelling, community)
Per-speaker demographics: age range, gender, native language, education level

LICENSING & COMPLIANCE:
Explicit written consent from all participants
Full commercial usage rights
PII scrubbed from all metadata
Auditable consent chain

IDEAL FOR:
ASR model training, speaker diarization, accent/dialect classification, conversational AI, emotion detection, language identification, and multilingual NLP.

Scalable to custom volume, language, and demographic specifications on request.
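The recording specifications give concrete ranges (WAV, 16-48 kHz, 10-30 minutes per conversation) that a buyer may want to verify when a delivery arrives. The following is a minimal sketch, not part of the listing or of any Rwazi or Databricks tooling: `check_wav` and `WavCheck` are hypothetical names, and it assumes each recording is an uncompressed integer-PCM WAV file that Python's standard `wave` module can open (float or compressed WAV files would need a different reader). Only the file header is read, so no audio samples are decoded.

```python
import io
import wave
from dataclasses import dataclass

# Ranges copied from the listing's RECORDING SPECIFICATIONS.
# The check itself is an illustrative assumption, not a vendor tool.
MIN_RATE_HZ = 16_000
MAX_RATE_HZ = 48_000
MIN_DURATION_S = 10 * 60
MAX_DURATION_S = 30 * 60


@dataclass
class WavCheck:  # hypothetical result record
    sample_rate: int
    channels: int
    duration_s: float
    rate_ok: bool
    duration_ok: bool


def check_wav(source) -> WavCheck:
    """Compare a WAV file's header against the advertised spec.

    `source` may be a path string or a binary file-like object.
    Assumes integer PCM; `wave` raises wave.Error otherwise.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        # Duration = total frames / frames per second.
        duration = wav.getnframes() / rate
    return WavCheck(
        sample_rate=rate,
        channels=channels,
        duration_s=duration,
        rate_ok=MIN_RATE_HZ <= rate <= MAX_RATE_HZ,
        duration_ok=MIN_DURATION_S <= duration <= MAX_DURATION_S,
    )


if __name__ == "__main__":
    # Build a one-second, 16 kHz, mono, 16-bit clip of silence in memory.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16_000)
        w.writeframes(b"\x00\x00" * 16_000)
    buf.seek(0)
    print(check_wav(buf))
```

In this example the one-second clip passes the sample-rate check but fails the duration check, since it is far shorter than 10 minutes. Because only headers are inspected, the same function is cheap enough to loop over every file in a delivery.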
Provider: Rwazi