Customer Service Voice Data Stream | High-Emotion Audio Dataset for LLM & NLP Training (US Accents)
Source: Databricks (indexed 2026-02-13)
Download link:
https://marketplace.databricks.com/details/6ca5361b-cb0e-4682-9d83-d7b73ac0eabb/WiserBrand-com_Customer-Service-Voice-Data-Stream-High-Emotion-Audio-Dataset-for-LLM-&-NLP-Training-(US-Accents)
Resource description:
We provide Real-World Customer Support Voice Data. This dataset captures the "long-tail" of human interactions: frustration, urgency, interruptions, and diverse acoustic conditions essential for building robust AI models.
Why Is This Data Unique?
Most datasets represent the "Happy Path" (calm, polite speech). Our data solves the Class Imbalance problem by providing high-friction interactions.
High Emotional Variance: Authentic anger, sarcasm, and relief — critical for training Sentiment Analysis and Empathetic AI.
Real-World Acoustics: Background noise, phone line compression, and crosstalk (interruptions).
Diverse Demographics: 90% unique speakers featuring a wide range of US dialects and accents.
Dataset Specifications:
Volume: We capture over 1,600 hours of new audio daily.
Format: .wav and .flac audio files, each paired with a segmented, time-stamped transcription.
Metadata: Rich labeling including Call Reason (Intent), Location, OS, Duration, and Speaker Turns.
Privacy: All PII is redacted via automated and human-in-the-loop processes.
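As a rough illustration of how a record built from the specifications above might be handled in code, here is a minimal Python sketch. The listing does not publish a schema, so every class name, field name, and sample value below is a hypothetical placeholder rather than the dataset's actual format. The sketch pairs an audio path with time-stamped segments and call-level metadata, then derives speaker turns and crosstalk (overlapping speech) from the timestamps.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """One time-stamped transcript segment (hypothetical field names)."""
    speaker: str    # e.g. "customer" or "agent"
    start_s: float  # segment start, in seconds from the start of the call
    end_s: float    # segment end, in seconds
    text: str       # transcript text, assumed already PII-redacted


@dataclass
class CallRecord:
    """An audio file paired with its transcript and call-level metadata."""
    audio_path: str   # path to a .wav or .flac file
    call_reason: str  # the "Call Reason (Intent)" label
    location: str
    os: str
    duration_s: float
    segments: List[Segment] = field(default_factory=list)


def speaker_turns(record: CallRecord) -> int:
    """Count turns: a new turn starts whenever the speaker changes."""
    turns = 0
    previous = None
    for seg in sorted(record.segments, key=lambda s: s.start_s):
        if seg.speaker != previous:
            turns += 1
            previous = seg.speaker
    return turns


def overlapping_pairs(record: CallRecord) -> List[Tuple[Segment, Segment]]:
    """Find consecutive segments from different speakers that overlap in time."""
    ordered = sorted(record.segments, key=lambda s: s.start_s)
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if a.speaker != b.speaker and b.start_s < a.end_s
    ]


# Made-up sample record for illustration only; not taken from the dataset.
record = CallRecord(
    audio_path="call_0001.flac",  # placeholder path, not a real file
    call_reason="billing_dispute",
    location="TX",
    os="iOS",
    duration_s=12.0,
    segments=[
        Segment("agent", 0.0, 3.0, "Thank you for calling, how can I help?"),
        Segment("customer", 2.5, 7.0, "I was charged twice and I want it fixed."),
        Segment("customer", 7.2, 9.0, "This is the third time I have called."),
        Segment("agent", 9.1, 12.0, "I understand, let me look into that."),
    ],
)

print(speaker_turns(record))           # number of speaker turns
print(len(overlapping_pairs(record)))  # number of crosstalk overlaps
```

In this sample the customer starts speaking at 2.5 s while the agent's segment runs until 3.0 s, so the pair is reported as an interruption. A real pipeline would read these fields from whatever transcript and metadata files the provider actually ships.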
Perfect For Training:
LLMs & NLP: Fine-tuning models on conversational logic and intent recognition.
ASR (Speech-to-Text): Improving accuracy on "messy" audio and diverse accents.
Voicebots: Teaching agents to handle objections and emotional escalations.
Customer Intelligence: Churn prediction and conflict resolution analytics.
Data Origin:
100% sourced from real US consumers contacting customer support. Not synthetic, not scripted.
Provider:
WiserBrand.com



