ACN - Premium Spoken Word Conversational Audio Data (515K hours OTS, 122 languages, ...

Source: Databricks, indexed 2026-05-12
Download link:
https://marketplace.databricks.com/details/55507220-68bb-4308-adee-7636a47d8561/ACNetwork_ACN---Premium-Spoken-Word-Conversational-Audio-Data-(515K-hours-OTS,-122-languages,-
Resource description:
ACN's Spoken Word Audio Dataset delivers 515K hours of immediately available, rights-cleared premium conversational audio across 120+ languages and 14+ language families, with the full 2.8 million hour catalog available under enterprise agreement. This is not raw audio: every premium audio asset is delivered as a fully enriched, structured package ready for immediate pipeline ingestion.

What's Delivered Per Asset
- Audio Files: Mixed-down master (MP3/WAV/FLAC · 44.1kHz–48kHz · 16-bit/24-bit · stereo and mono), vocals.wav (vocal separation stem), accompaniment.wav (accompaniment/background separation), and individual speaker isolation stems (speaker_0.wav, speaker_1.wav... per detected speaker, typically 2–5 per asset).
- 8 Structured JSON Enrichment Files:
-- manifest.json: asset ID, language ISO, duration, format specs, file sizes, ingest batch ID, provenance checksum
-- metadata.json: domain classification, topic tags, content vertical, show/episode metadata, speaker count estimate
-- transcription.json: full verbatim transcript with word-level timestamps and confidence scores, speaker diarization labels, disfluencies and fillers retained, cross-talk flagged
-- audio-quality.json: SNR estimate (dB), VAD speech ratio, effective bandwidth (Hz), overlap detection ratio, per-asset quality flags
-- content-class.json: content type classification (interview, debate, monologue, panel discussion, live commentary, narrative)
-- sentiment.json: utterance-level sentiment scoring (positive/negative/neutral) across the full asset
-- topic.json: topic classification, keyword extraction with mention counts, domain confidence scores
-- summary.json: asset-level natural language summary and extracted key points
- JSONL Batch Manifest: Every asset mapped in a structured batch manifest for pipeline-level ingestion, with fields consistent across all 133 languages and no reformatting required.
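As a rough illustration of how a JSONL batch manifest of this kind might be consumed, the Python sketch below reads one JSON record per line and filters assets by language and signal-to-noise ratio. The listing does not publish the manifest schema, so the field names (`asset_id`, `language`, `duration_s`, `snr_db`) and the sample records are hypothetical. The sketch also assumes the SNR estimate from audio-quality.json has already been joined into each manifest record, which the listing does not state.

```python
import io
import json


def iter_manifest(lines):
    """Yield one dict per non-empty line of a JSONL stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def select_assets(records, language, min_snr_db):
    """Keep records that match a language code and meet a minimum SNR.

    The keys "language" and "snr_db" are hypothetical; substitute the
    field names of the delivered manifest.
    """
    return [
        r for r in records
        if r.get("language") == language and r.get("snr_db", 0.0) >= min_snr_db
    ]


# Hypothetical sample standing in for a delivered batch manifest file.
SAMPLE = io.StringIO(
    '{"asset_id": "a1", "language": "en", "duration_s": 2700, "snr_db": 31.5}\n'
    '{"asset_id": "a2", "language": "de", "duration_s": 3100, "snr_db": 28.0}\n'
    '{"asset_id": "a3", "language": "en", "duration_s": 2900, "snr_db": 12.0}\n'
)

records = list(iter_manifest(SAMPLE))
selected = select_assets(records, language="en", min_snr_db=20.0)
print([r["asset_id"] for r in selected])
```

With a real delivery, `SAMPLE` would be replaced by an open file handle on the manifest, and the filter keys adjusted to the actual schema.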

Language Coverage
133 languages across 14+ language families:
- Indo-European (Germanic), 374,691 hrs: English (US/UK/AU/ZA/KE/NG/IN/SG), German, Swedish, Norwegian, Dutch, Danish, Afrikaans, Icelandic, Nynorsk, Yiddish
- Indo-European (Indo-Aryan), 446,248 hrs: Hindi, Bengali, Punjabi, Gujarati, Marathi, Urdu, Nepali, Sindhi, Assamese, Sinhala
- Indo-European (Romance), 311,246 hrs: Spanish, Portuguese, French, Italian, Romanian, Catalan, Galician
- Indo-European (Slavic), 213,783 hrs: Russian, Polish, Ukrainian, Czech, Slovak, Bulgarian, Serbian, Croatian, Bosnian, Macedonian, Slovenian, Belarusian
- Indo-European (Indo-Iranian), 81,320 hrs: Persian, Pashto, Tajik
- Afro-Asiatic, 254,727 hrs: Arabic (multi-dialect), Hebrew, Hausa, Amharic, Maltese
- Sino-Tibetan, 128,046 hrs: Mandarin Chinese, Burmese
- Dravidian, 157,078 hrs: Kannada, Tamil, Telugu, Malayalam
- Austronesian, 109,812 hrs: Indonesian, Malay, Tagalog, Javanese, Hawaiian
- Turkic, 55,078 hrs: Turkish, Kazakh, Uzbek, Kyrgyz, Azerbaijani
- Japonic, 79,354 hrs: Japanese
- Koreanic, 59,383 hrs: Korean
- Niger-Congo, 62,000 hrs: Swahili, Zulu
- Austroasiatic, 32,009 hrs: Vietnamese, Khmer
- Other (Uralic, Tai-Kadai, Kartvelian, Mongolic, Isolate): Finnish, Hungarian, Estonian, Thai, Lao, Georgian, Mongolian, Basque

Source & Content
All audio is sourced from ACN's owned-and-operated podcast network and partner creator catalog across six editorial verticals: Sports & College Football, Culture & Entertainment, Business & Finance, News & Politics, Comedy, and True Crime. Long-form, natural, multi-speaker conversational dialogue: not read speech, not scripted, not synthetic. Average asset length 45+ minutes. Typical 2–5 speakers per asset.

Rights & Availability
515,849 hours OTS, deliverable within 30 days. Full 2,887,393 hour catalog available under enterprise agreement. Explicit AI training license provided for every asset. Perpetual license available. Larger samples under NDA.
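
The per-family hour figures listed under Language Coverage can be tallied against the catalog totals quoted under Rights & Availability. The short Python sketch below does that arithmetic using only numbers from this listing. The "Other" group carries no hour figure in the listing, so the remainder computed here is simply the difference between the full-catalog figure and the listed families; the listing does not say how that remainder is distributed.

```python
# Hours per language family, copied from the Language Coverage list.
FAMILY_HOURS = {
    "Indo-European (Germanic)": 374_691,
    "Indo-European (Indo-Aryan)": 446_248,
    "Indo-European (Romance)": 311_246,
    "Indo-European (Slavic)": 213_783,
    "Indo-European (Indo-Iranian)": 81_320,
    "Afro-Asiatic": 254_727,
    "Sino-Tibetan": 128_046,
    "Dravidian": 157_078,
    "Austronesian": 109_812,
    "Turkic": 55_078,
    "Japonic": 79_354,
    "Koreanic": 59_383,
    "Niger-Congo": 62_000,
    "Austroasiatic": 32_009,
}

OTS_HOURS = 515_849             # off-the-shelf hours quoted in the listing
FULL_CATALOG_HOURS = 2_887_393  # full catalog hours quoted in the listing

listed_total = sum(FAMILY_HOURS.values())
unattributed = FULL_CATALOG_HOURS - listed_total

print(f"Listed families: {listed_total:,} hrs")
print(f"Full catalog minus listed families: {unattributed:,} hrs")
print(f"Listed total exceeds OTS figure: {listed_total > OTS_HOURS}")
```

Because the listed families alone sum to well above the 515,849 OTS hours, the per-family figures appear to describe the full catalog rather than the off-the-shelf subset, though the listing does not say so explicitly.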

USE CASES
- ASR model training and fine-tuning across 133 languages, dialects, and accent variants.
- Neural TTS and speech synthesis training across 14+ language families with natural prosody variation.
- Voice cloning and speaker adaptation using isolated per-speaker stems across all languages.
- Speaker diarization and voice biometrics model development using pre-labeled speaker stems.
- Conversational AI and dialogue model training with authentic turn-taking, disfluencies, and cross-talk.

Provider:
ACNetwork