ACN - Premium Spoken Word Conversational Audio Data (515K hours OTS, 122 languages, ...
Source: Databricks. Indexed 2026-05-12.
Download link:
https://marketplace.databricks.com/details/55507220-68bb-4308-adee-7636a47d8561/ACNetwork_ACN---Premium-Spoken-Word-Conversational-Audio-Data-(515K-hours-OTS,-122-languages,-
Description:
ACN's Spoken Word Audio Dataset delivers 515K hours of immediately available, rights-cleared premium conversational audio across 120+ languages and 14+ language families, with the full 2.8-million-hour catalog available under enterprise agreement. This is not raw audio: every premium audio asset is delivered as a fully enriched, structured package ready for immediate pipeline ingestion.
What's Delivered Per Asset
- Audio Files: Mixed-down master (MP3/WAV/FLAC · 44.1kHz–48kHz · 16-bit/24-bit · stereo and mono), vocals.wav (vocal separation stem), accompaniment.wav (accompaniment/background separation), and individual speaker isolation stems (speaker_0.wav, speaker_1.wav... per detected speaker — typically 2–5 per asset).
- 8 Structured JSON Enrichment Files:
-- manifest.json — asset ID, language ISO, duration, format specs, file sizes, ingest batch ID, provenance checksum
-- metadata.json — domain classification, topic tags, content vertical, show/episode metadata, speaker count estimate
-- transcription.json — full verbatim transcript with word-level timestamps and confidence scores, speaker diarization labels, disfluencies and fillers retained, cross-talk flagged
-- audio-quality.json — SNR estimate (dB), VAD speech ratio, effective bandwidth (Hz), overlap detection ratio, per-asset quality flags
-- content-class.json — content type classification: interview, debate, monologue, panel discussion, live commentary, narrative
-- sentiment.json — utterance-level sentiment scoring (positive/negative/neutral) across the full asset
-- topic.json — topic classification, keyword extraction with mention counts, domain confidence scores
-- summary.json — asset-level natural language summary and extracted key points
- JSONL Batch Manifest: Every asset is mapped in a structured batch manifest for pipeline-level ingestion, with fields consistent across all 133 languages and no reformatting required.
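As a rough illustration of how a delivered package could be checked before ingestion, the sketch below verifies that an asset folder contains the eight enrichment files named above and collects any speaker stems. The one-flat-folder-per-asset layout and the function name `check_asset_package` are assumptions for illustration; the listing does not specify how files are arranged on disk.

```python
import re
from pathlib import Path

# The eight enrichment files named in the listing.
ENRICHMENT_FILES = (
    "manifest.json",
    "metadata.json",
    "transcription.json",
    "audio-quality.json",
    "content-class.json",
    "sentiment.json",
    "topic.json",
    "summary.json",
)

# Speaker stems follow the speaker_<index>.wav pattern from the listing.
SPEAKER_STEM = re.compile(r"^speaker_(\d+)\.wav$")


def check_asset_package(asset_dir):
    """Report missing enrichment files and the speaker stems found.

    Assumption (not from the listing): each asset ships as one flat folder.
    """
    asset_dir = Path(asset_dir)
    present = {p.name for p in asset_dir.iterdir() if p.is_file()}
    missing = [name for name in ENRICHMENT_FILES if name not in present]
    # Sort stems numerically so speaker_10.wav comes after speaker_2.wav.
    stems = sorted(
        (name for name in present if SPEAKER_STEM.match(name)),
        key=lambda name: int(SPEAKER_STEM.match(name).group(1)),
    )
    return {"missing": missing, "speaker_stems": stems}
```

A pipeline might call `check_asset_package(path)` on each folder and skip any asset whose `missing` list is non-empty.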
Language Coverage
133 languages across 14+ language families:
- Indo-European (Germanic) 374,691 hrs — English (US/UK/AU/ZA/KE/NG/IN/SG), German, Swedish, Norwegian, Dutch, Danish, Afrikaans, Icelandic, Nynorsk, Yiddish
- Indo-European (Indo-Aryan) 446,248 hrs — Hindi, Bengali, Punjabi, Gujarati, Marathi, Urdu, Nepali, Sindhi, Assamese, Sinhala
- Indo-European (Romance) 311,246 hrs — Spanish, Portuguese, French, Italian, Romanian, Catalan, Galician
- Indo-European (Slavic) 213,783 hrs — Russian, Polish, Ukrainian, Czech, Slovak, Bulgarian, Serbian, Croatian, Bosnian, Macedonian, Slovenian, Belarusian
- Indo-European (Iranian) 81,320 hrs — Persian, Pashto, Tajik
- Afro-Asiatic 254,727 hrs — Arabic (multi-dialect), Hebrew, Hausa, Amharic, Maltese
- Sino-Tibetan 128,046 hrs — Mandarin Chinese, Burmese
- Dravidian 157,078 hrs — Kannada, Tamil, Telugu, Malayalam
- Austronesian 109,812 hrs — Indonesian, Malay, Tagalog, Javanese, Hawaiian
- Turkic 55,078 hrs — Turkish, Kazakh, Uzbek, Kyrgyz, Azerbaijani
- Japonic 79,354 hrs — Japanese
- Koreanic 59,383 hrs — Korean
- Niger-Congo 62,000 hrs — Swahili, Zulu
- Austroasiatic 32,009 hrs — Vietnamese, Khmer
- Other (Uralic, Tai-Kadai, Kartvelian, Mongolic, Isolate) — Finnish, Hungarian, Estonian, Thai, Lao, Georgian, Mongolian, Basque
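The per-family figures above can be tallied against the stated catalog size. The sketch below is only a sanity check under assumptions: the listing does not say whether the per-family hours refer to the full catalog or to the off-the-shelf subset (they are treated here as full-catalog figures because they exceed the 515,849-hour subset), and the "Other" group is left out because no hours are given for it.

```python
# Hours per family, figures as printed in the listing ("Other" has no figure).
FAMILY_HOURS = {
    "Germanic": 374_691,
    "Indo-Aryan": 446_248,
    "Romance": 311_246,
    "Slavic": 213_783,
    "Iranian": 81_320,
    "Afro-Asiatic": 254_727,
    "Sino-Tibetan": 128_046,
    "Dravidian": 157_078,
    "Austronesian": 109_812,
    "Turkic": 55_078,
    "Japonic": 79_354,
    "Koreanic": 59_383,
    "Niger-Congo": 62_000,
    "Austroasiatic": 32_009,
}

# Full catalog size stated under Rights & Availability.
CATALOG_HOURS = 2_887_393


def summarize(family_hours, catalog_hours):
    """Sum the listed hours and compare them with the catalog total."""
    listed = sum(family_hours.values())
    return {
        "listed_hours": listed,
        "unattributed_hours": catalog_hours - listed,
        "listed_share": round(listed / catalog_hours, 3),
    }


summary = summarize(FAMILY_HOURS, CATALOG_HOURS)
print(summary["listed_hours"])        # 2364775
print(summary["unattributed_hours"])  # 522618
```

The roughly 523K hours not attributed to a named family may correspond to the "Other" group, but the listing does not confirm this.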
Source & Content
All audio is sourced from ACN's owned-and-operated podcast network and partner creator catalog across six editorial verticals: Sports & College Football, Culture & Entertainment, Business & Finance, News & Politics, Comedy, and True Crime. The content is long-form, natural, multi-speaker conversational dialogue: not read speech, not scripted, not synthetic. Average asset length is 45+ minutes, with typically 2–5 speakers per asset.
Rights & Availability
515,849 hours are available off the shelf (OTS), deliverable within 30 days. The full 2,887,393-hour catalog is available under enterprise agreement. An explicit AI training license is provided for every asset. A perpetual license is available. Larger samples are available under NDA.
Use Cases
- ASR model training and fine-tuning across 133 languages, dialects, and accent variants.
- Neural TTS and speech synthesis training across 14+ language families with natural prosody variation.
- Voice cloning and speaker adaptation using isolated per-speaker stems across all languages.
- Speaker diarization and voice biometrics model development using pre-labeled speaker stems.
- Conversational AI and dialogue model training with authentic turn-taking, disfluencies, and cross-talk.
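For the ASR use case, one plausible first step is to screen assets with the measurements in audio-quality.json. The sketch below is a hedged example only: the key names `snr_db`, `vad_speech_ratio`, and `overlap_ratio` are hypothetical (the listing names the measurements but not the JSON schema), and the thresholds are arbitrary placeholders to tune per project.

```python
import json


def passes_quality(quality, min_snr_db=20.0, min_speech_ratio=0.5,
                   max_overlap_ratio=0.2):
    """Return True if an asset's quality record clears all three thresholds.

    `quality` is a parsed audio-quality.json record. The key names used
    here are hypothetical; adapt them to the delivered schema.
    """
    try:
        snr = float(quality["snr_db"])
        speech = float(quality["vad_speech_ratio"])
        overlap = float(quality["overlap_ratio"])
    except (KeyError, TypeError, ValueError):
        # Treat records with missing or malformed fields as failing.
        return False
    return (
        snr >= min_snr_db
        and speech >= min_speech_ratio
        and overlap <= max_overlap_ratio
    )


# Illustrative record with made-up values, not real dataset output.
example = json.loads(
    '{"snr_db": 27.5, "vad_speech_ratio": 0.82, "overlap_ratio": 0.06}'
)
print(passes_quality(example))  # True
```

For dialogue modeling, where cross-talk is a feature, a caller could raise `max_overlap_ratio` instead of excluding overlapped assets.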
Provider:
ACNetwork



