
ACN - Real-Time US English Conversational Audio Feed (1,000 hrs/month, time-based + API ...

Source: Databricks. Indexed 2026-05-12.
Download link:
https://marketplace.databricks.com/details/3a90b516-bad9-47b3-9bb7-1c3010cc52e3/ACNetwork_ACN---Real-Time-US-English-Conversational-Audio-Feed-(1,000-hrs/month,-time-based-+-API-
Resource description:
ACN's Real-Time US English Conversational Audio Feed delivers 1,000 hours of newly produced, rights-cleared US English conversational audio every month via API -- giving AI teams a continuous, structured stream of fresh training data to keep models current, reduce temporal drift, and expand training corpora incrementally without repeated bulk licensing cycles. This product is available as an add-on to an ACN static catalog license.

Every asset delivered through the feed carries the full ACN enrichment stack -- the same 8 structured JSON files, isolated speaker stems, vocal and accompaniment separation, and JSONL manifest delivered with the static catalog -- meaning the feed integrates directly into existing ACN-based training pipelines with no additional preprocessing or schema changes required.

Why Real-Time Conversational Data

Static training corpora capture language as it existed at the time of collection. Conversational English evolves continuously -- new vocabulary, emerging topics, shifting registers, evolving cultural references, and changing speaker demographics all affect how well a model generalizes to real-world use. A continuous monthly feed of freshly produced conversational audio gives training teams a principled mechanism to address temporal drift, expand domain and topic coverage incrementally, and maintain model freshness between major retraining cycles.

Feed Specifications

Volume: 1,000 hours of new audio produced and delivered per calendar month.
Source: ACN's active owned-and-operated US English podcast network and partner creator catalog, produced in the current calendar month of delivery.
Content: Natural long-form multi-speaker conversational dialogue across ACN's six editorial verticals -- Sports and College Football, Culture and Entertainment, Business and Finance, News and Politics, Comedy, and True Crime. Unscripted, authentic speaker interaction. Average asset length 45+ minutes. Typically 2 to 5 speakers per asset.
Language: US English (en_US). American-accented, rhotic, stress-timed. Full range of regional American sub-dialect variation across the active speaker pool.
Delivery: API access. New assets become available via API as they complete the enrichment pipeline, typically within 72 hours of original publication. Buyers pull assets on their own cadence -- no push delivery required.
Format: MP3, WAV, FLAC at 44,100 Hz or 48,000 Hz, 16-bit or 24-bit, stereo and mono.

What's Delivered Per Asset

Audio Files:
- Mixed-down master (MP3/WAV/FLAC)
- vocals.wav (vocal separation stem)
- accompaniment.wav (accompaniment/background separation)
- speaker_00.wav, speaker_01.wav... (individual speaker isolation stems per detected speaker, typically 2 to 5 per asset)

8 Structured JSON Enrichment Files:
- manifest.json -- asset ID, language ISO (en_US), publication date, duration, format specs, ingest batch, provenance
- metadata.json -- domain, topic tags, vertical, show/episode metadata, speaker count, production month
- transcription.json -- full verbatim transcript with word-level timestamps, speaker diarization labels and confidence scores; disfluencies and fillers retained
- audio-quality.json -- SNR estimate, VAD speech ratio, effective bandwidth, overlap detection ratio, quality flags
- content-class.json -- content type: interview, debate, panel, monologue, live commentary, narrative
- sentiment.json -- utterance-level sentiment scoring across the full asset
- topic.json -- topic classification, keywords, mention counts
- summary.json -- asset-level summary and key points

JSONL Feed Manifest: All assets are delivered in a rolling JSONL manifest updated with each new batch. Production month and publication date are included as top-level fields for temporal filtering. The schema is identical to the ACN static catalog JSONL manifest -- no pipeline reconfiguration required.

Integration with Static Catalog

This feed is an add-on to an existing ACN OTS catalog license.
The enrichment schema, file structure, asset directory layout, and JSONL manifest format are identical across the static catalog and the real-time feed. Teams already ingesting ACN static catalog assets can point their existing pipeline at the API endpoint and begin consuming feed assets immediately. No schema mapping, no format conversion, no additional preprocessing.

API + Time-Based Access

Assets are available via API endpoint as they clear the enrichment pipeline, typically within 72 hours of original publication date. Buyers may query by production month, vertical, content type, publication date range, and asset ID. Full asset directory -- audio files, stems, all 8 JSON files -- available per asset via API. Usage and volume reporting available via API dashboard. Other time-based schedules are available, including daily, weekly, and monthly.

Rights and Provenance

Every asset carries an explicit AI training license covering the month of delivery and all subsequent use. ACN owns or holds direct licensing agreements with all content creators in the active feed catalog. No fair use assumptions, no scraped content. Full ownership chain documentation available per asset via manifest.json. Perpetual AI training license on all delivered assets.

Availability

Available as an add-on to a new or existing ACN static catalog license. Feed activation within 30 days of agreement execution. First monthly batch delivered upon activation. Samples of feed-format assets available under NDA prior to licensing.
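
As an illustration of the temporal filtering that the JSONL feed manifest is described as supporting, here is a minimal Python sketch that selects manifest records by production month and, optionally, by vertical. The listing does not publish the manifest schema, so the field names `asset_id`, `production_month`, `publication_date`, and `vertical`, the `YYYY-MM` month format, and the sample rows are all assumptions made for this example, not ACN's documented format.

```python
import json
from typing import Iterable, Iterator, Optional


def filter_manifest(
    lines: Iterable[str],
    production_month: str,
    vertical: Optional[str] = None,
) -> Iterator[dict]:
    """Yield manifest records for one production month, optionally one vertical.

    Field names ("production_month", "vertical") are hypothetical.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL stream
        record = json.loads(line)
        if record.get("production_month") != production_month:
            continue
        if vertical is not None and record.get("vertical") != vertical:
            continue
        yield record


# Made-up sample rows standing in for a rolling feed manifest.
sample_lines = [
    '{"asset_id": "a1", "production_month": "2026-04", "publication_date": "2026-04-03", "vertical": "Comedy"}',
    '{"asset_id": "a2", "production_month": "2026-04", "publication_date": "2026-04-19", "vertical": "True Crime"}',
    '{"asset_id": "a3", "production_month": "2026-05", "publication_date": "2026-05-02", "vertical": "Comedy"}',
]

april_ids = [r["asset_id"] for r in filter_manifest(sample_lines, "2026-04")]
april_comedy_ids = [
    r["asset_id"] for r in filter_manifest(sample_lines, "2026-04", vertical="Comedy")
]
```

In practice the same function could be fed an open file handle on the downloaded manifest instead of the in-memory list, since it only iterates over lines.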
USE CASES
- Continuous training data refresh to reduce temporal drift in deployed US English ASR and conversational AI models
- Incremental corpus expansion for teams on active training cycles who need a predictable monthly volume of new rights-cleared data
- Topic and domain freshness -- 1,000 hours of newly produced content each month captures emerging vocabulary, current events, and evolving conversational registers across 6 verticals
- Fine-tuning pipeline support -- structured monthly batches with consistent schema enable automated fine-tuning workflows keyed to production month
- Longitudinal linguistic research -- month-over-month feed archive builds a temporally indexed corpus of US English conversational speech for language change and drift analysis
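
For workflows keyed to production month, such as the fine-tuning and longitudinal use cases above, a team might first check how many hours each monthly batch actually contains. The sketch below totals audio duration per production month from already-parsed manifest records. The field names `production_month` and `duration_seconds`, and the sample values, are assumptions for illustration; the listing states only that duration and production month are recorded, not how they are named or in what units.

```python
from collections import defaultdict
from typing import Dict, Iterable


def hours_by_month(records: Iterable[dict]) -> Dict[str, float]:
    """Sum audio duration in hours per production month.

    Assumes each record has hypothetical fields "production_month"
    and "duration_seconds".
    """
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["production_month"]] += record["duration_seconds"] / 3600.0
    return dict(totals)


# Made-up records; real assets are described as averaging 45+ minutes.
sample_records = [
    {"asset_id": "a1", "production_month": "2026-04", "duration_seconds": 2700},
    {"asset_id": "a2", "production_month": "2026-04", "duration_seconds": 3600},
    {"asset_id": "a3", "production_month": "2026-05", "duration_seconds": 5400},
]

monthly_hours = hours_by_month(sample_records)
```

Comparing these totals against the stated 1,000 hours per month is one simple way to confirm that a batch has been fully pulled before a training job is started.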
Provider:
ACNetwork