tvarchive Dataset

NIAID Data Ecosystem, indexed 2026-03-12

Download link:

https://zenodo.org/record/5068195


Description:

The tvarchive dataset contains word-frequency and other non-consumptive-use data about 1,205,844 English-language transcriptions of U.S. television news broadcasts. The documents were scraped from the Internet Archive's TV News Archive, which has included automatic captions of select U.S. news broadcasts since 2009. While the complete TV News Archive contains over 2.2 million transcripts, WE1S researchers were able to collect only about 1.2 million documents containing complete transcripts. The full TV News Archive includes transcripts from 33 networks and hundreds of shows. Unlike other WE1S datasets, the tvarchive dataset was not collected using keyword searches for specific terms (e.g., documents containing the word "humanities"). (See WE1S Research Materials Overview for the relation between the project's "datasets" and "collections.")

Created:

2021-07-05
