Babbl Labs YouTube Transcription Database | YouTube Video Transcript Firehose | EN/US | 25K+ ...
Platform: Databricks | Indexed: 2026-04-09
Download link:
https://marketplace.databricks.com/details/2bdce24d-9dad-4814-8808-a0961c058be1/Babbl_Babbl-Labs-YouTube-Transcription-Database-YouTube-Video-Transcript-Firehose-EN/US-25K+-
Description:
TranscriptDB is the full YouTube transcript firehose, built for systematic hedge funds with in-house NLP capacity and AI labs training foundation models. Smaller buyers should look at Tripwire instead.
What you get:
- 25,000+ pre-filtered market-relevant YouTube channels
- 1M+ videos processed monthly
- Intraday delivery (2-3 hour clip latency)
- Named-entity tags, speaker diarization, and sentiment scores on every transcript
- 5+ years of historical archive available as a separate backfile
Use cases:
- Quantitative alpha generation and signal research
- Foundation model training and fine-tuning corpora
- Long-form NLP backtesting and event studies
- Cross-source entity disambiguation at scale
Provider:
Babbl