Serial Speakers: a Dataset of TV Series
Source: Figshare. Updated: 2020-02-17. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/dataset/TV_Series_Corpus/3471839/11
Description:
Dataset of three TV series with manual annotations.

Cite as:
@inproceedings{Bost2020,
  title = {Serial Speakers: a Dataset of TV Series},
  author = {Bost, Xavier and Labatut, Vincent and Linares, Georges},
  url = {https://hal.archives-ouvertes.fr/hal-02477736},
  booktitle = {12th International Conference on Language Resources and Evaluation (LREC 2020)},
  address = {Marseille, France},
  year = {2020}
}

The dataset consists of 3 TV series:
- Breaking Bad: S01--S05 (file 'bb.json')
- Game of Thrones: S01--S08 (file 'got.json')
- House of Cards: S01--S02 (file 'hoc.json')

All three files are in JSON format and contain the annotated data for one TV series each.

Each TV series is defined by its name.
A TV series contains seasons, defined by their ids.
Every season is made of episodes, defined by their ids, titles, duration and fps.
Each episode contains two basic kinds of data: scenes and speech segments.

Scenes are defined by starting points and are made of shots (Season 1 only).

A shot is defined by:
- Starting and ending positions.
- Recurring shot ids.

The speech segments are defined by their:
- Starting and ending points.
- Textual content (here encrypted for copyright reasons).
- Speaker.
- Possible interlocutors (for the following episodes only: bb: S01E04, S01E06, S02E03, S02E04; got: S01E03, S01E07, S01E08; hoc: S01E01, S01E07, S01E11).

All timestamps are expressed in seconds and are valid for the video files extracted from the commercial DVDs (PAL, 25 FPS), with recaps (unannotated) included at the beginning of the House of Cards episodes.

If you are interested in the textual content of the dataset, please consider using our text recovery tool on GitHub:
https://github.com/bostxavier/Serial-Speakers

A comprehensive description of the dataset can be found at:
https://hal.archives-ouvertes.fr/hal-02477736
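
The hierarchy described above (series, seasons, episodes, then scenes and speech segments) can be walked with a few nested loops. The following is a minimal sketch only: every key name in it ("seasons", "episodes", "data", "speech_segments", "start", "end", "speaker", and so on) is an assumption inferred from the field names in the description, and the tiny SAMPLE record is invented for illustration. Check the actual layout of 'bb.json', 'got.json' and 'hoc.json' before relying on any of these names.

```python
import json

# Hypothetical miniature record mirroring the described hierarchy.
# All key names and values are assumptions, not taken from the real files.
SAMPLE = json.loads("""
{
  "name": "Example Series",
  "seasons": [
    {"id": 1,
     "episodes": [
       {"id": 1, "title": "Pilot", "duration": 120.0, "fps": 25,
        "data": {
          "scenes": [{"start": 0.0}, {"start": 60.0}],
          "speech_segments": [
            {"start": 1.0, "end": 3.5, "text": "<encrypted>", "speaker": "A"},
            {"start": 4.0, "end": 6.0, "text": "<encrypted>", "speaker": "B"},
            {"start": 61.0, "end": 64.0, "text": "<encrypted>", "speaker": "A"}
          ]
        }}
     ]}
  ]
}
""")


def speaking_time(series):
    """Sum speech-segment durations (in seconds) per speaker over all episodes."""
    totals = {}
    for season in series["seasons"]:
        for episode in season["episodes"]:
            for seg in episode["data"]["speech_segments"]:
                length = seg["end"] - seg["start"]
                totals[seg["speaker"]] = totals.get(seg["speaker"], 0.0) + length
    return totals


def to_frame(seconds, fps=25):
    """Convert a timestamp in seconds to the nearest frame index (PAL: 25 FPS)."""
    return int(round(seconds * fps))


if __name__ == "__main__":
    print(speaking_time(SAMPLE))
    print(to_frame(4.0))
```

To try the same functions on a real file, one would replace SAMPLE with the result of `json.load` on an opened 'bb.json', adjusting the key names to whatever the file actually uses.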
Created:
2020-02-08



