
Serial Speakers: a TV Series Dataset

Source: DataCite Commons. Updated: 2020-09-04. Indexed: 2024-07-27.
Download link:
https://figshare.com/articles/dataset/TV_Series_Corpus/3471839/9
Resource description:
<b>Dataset of three TV Series</b> with <b>manual</b> annotations:<br><br>- <i>Breaking Bad</i>: S01--S05 (file 'bb.json')<br>- <i>Game of Thrones</i>: S01--S07 (file 'got.json')<br>- <i>House of Cards</i>: S01--S02 (file 'hoc.json')<br><br>All three files are in .json format and contain annotated TV Series data.<br><br>Each TV Series is defined by its <b>name</b>.<br><br>A TV Series contains <b>seasons</b>, defined by their <b>id</b>s.<br><br>Every season is made of <b>episodes</b>, defined by their <b>id</b>s, <b>title</b>s, <b>duration</b> and <b>fps</b>.<br><br>Each episode contains two basic kinds of <b>data</b>: <b>scenes</b> and <b>speech segments</b>.<br><br>Scenes are defined by their <b>start</b>ing points and are made of <b>shots</b> (Season 1 only).<br><br>A shot is defined by:<br><br>- <b>Start</b>ing and <b>end</b>ing positions.<br>- Recurring shot <b>id</b>s.<br><br>The speech segments are defined by their:<br>- <b>Start</b>ing and <b>end</b>ing points.<br>- <b>Text</b>ual content (here encrypted for copyright reasons).<br>- <b>Speaker</b>.<br>- Possible <b>interlocutors</b> (for the following episodes only: <b>bb</b>: S01E04, S01E06, S02E03, S02E04; <b>got</b>: S01E03, S01E07, S01E08; <b>hoc</b>: S01E01, S01E07, S01E11).<br><br>All timestamps are expressed in seconds and are valid for the video files extracted from the commercial DVDs (PAL, 25 FPS), with recaps (unannotated) included at the beginning of the <i>House of Cards</i> episodes.<br>If you are interested in the textual content of the dataset, please consider using our text recovery tool on GitHub:<br>https://github.com/bostxavier/Serial-Speakers<br>
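The nesting described above (series, seasons, episodes, then scenes and speech segments) could be traversed roughly as in the sketch below. It is only an illustration: the key names (`seasons`, `episodes`, `data`, `speech_segments`, `start`, `end`, `speaker`) and the inline `sample` record are assumptions inferred from the description, not the confirmed schema of 'bb.json', 'got.json' or 'hoc.json', so they should be checked against the actual files first.

```python
from collections import defaultdict

# Hypothetical miniature of one file's layout. The real key names in
# bb.json / got.json / hoc.json may differ; verify them before reuse.
sample = {
    "name": "Example Series",
    "seasons": [
        {
            "id": 1,
            "episodes": [
                {
                    "id": 1,
                    "title": "Pilot",
                    "duration": 120.0,
                    "fps": 25,
                    "data": {
                        "scenes": [{"start": 0.0}, {"start": 60.0}],
                        "speech_segments": [
                            {"start": 1.0, "end": 3.5, "speaker": "A"},
                            {"start": 4.0, "end": 5.0, "speaker": "B"},
                            {"start": 61.0, "end": 63.0, "speaker": "A"},
                        ],
                    },
                }
            ],
        }
    ],
}


def speaking_time(series):
    """Sum speech-segment durations (in seconds) per speaker."""
    totals = defaultdict(float)
    for season in series["seasons"]:
        for episode in season["episodes"]:
            for seg in episode["data"]["speech_segments"]:
                totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


def to_frame(seconds, fps=25):
    """Convert a timestamp in seconds to the nearest frame index (PAL: 25 fps)."""
    return round(seconds * fps)


totals = speaking_time(sample)
print(totals)
print(to_frame(4.0))
```

With a real file, `sample` would presumably be replaced by the result of `json.load` on the opened file, provided the assumed keys match. Because timestamps are given in seconds for 25 FPS PAL video, `to_frame` gives the corresponding frame index for aligning annotations with the DVD video.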
Provider:
figshare
Created:
2019-12-01