the_office_lines
ModelScope community, updated 2025-11-12, indexed 2025-06-14
Download link:
https://modelscope.cn/datasets/jxm/the_office_lines
Description:
## the_office_lines
<img src="https://a.pinatafarm.com/1351x1232/c8fa71efd1/the-office-handshake.jpg" width="256">
A dataset of lines from the U.S. version of the TV show "The Office". Lines were originally scraped from the website [officequotes.net](https://www.officequotes.net/), are fan-transcribed, and may be of dubious quality.
Contains a train split (47,927 lines), a test split (5,991 lines), and a validation split (5,991 lines). Contains lines from all 9 seasons and every episode, but may be incomplete.
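As a quick sanity check on the split sizes quoted above, the following sketch computes the total number of lines and each split's share. The dictionary keys are illustrative labels chosen here, not necessarily the split names used by the dataset loader.

```python
# Split sizes as stated in the dataset description.
# The keys are illustrative labels, not confirmed loader split names.
split_sizes = {"train": 47_927, "test": 5_991, "val": 5_991}

total = sum(split_sizes.values())
fractions = {name: size / total for name, size in split_sizes.items()}

print(f"total lines: {total}")
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

The three splits add up to 59,909 lines, which works out to roughly an 80/10/10 partition.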
Lines are annotated with an ID number, season number, episode number, scene number (within the episode), speaker name, and whether or not the text came from a deleted scene. Here is an example:
```
> dataset["val"][0]
{'id': 3735,
'season': 2,
'episode': 5,
'scene': 32,
'line_text': 'No, you have the power to undo it.',
'speaker': 'Creed',
'deleted': False}
```
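Because every record carries `speaker` and `deleted` fields, a common first step is counting lines per speaker while skipping deleted scenes. The sketch below is one way to do that on plain dictionaries with the schema shown above. The helper name `count_lines_by_speaker` is hypothetical, and only the first sample record comes from the dataset card; the other two are placeholders invented to illustrate the schema.

```python
from collections import Counter

# The first record is the example from the dataset card. The other two are
# hypothetical placeholders that only illustrate the schema; they are not
# real dataset rows.
records = [
    {"id": 3735, "season": 2, "episode": 5, "scene": 32,
     "line_text": "No, you have the power to undo it.",
     "speaker": "Creed", "deleted": False},
    {"id": -1, "season": 1, "episode": 1, "scene": 1,
     "line_text": "<placeholder line>",
     "speaker": "Michael", "deleted": False},
    {"id": -2, "season": 1, "episode": 1, "scene": 2,
     "line_text": "<placeholder deleted line>",
     "speaker": "Michael", "deleted": True},
]


def count_lines_by_speaker(rows, include_deleted=False):
    """Count lines per speaker, optionally including deleted-scene lines."""
    return Counter(
        row["speaker"]
        for row in rows
        if include_deleted or not row["deleted"]
    )


aired = count_lines_by_speaker(records)
everything = count_lines_by_speaker(records, include_deleted=True)
print(aired.most_common())
print(everything.most_common())
```

The same function should work on any iterable of records with this schema, so a loaded split could be passed in place of `records`.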
Provider:
maas
Created:
2025-01-17



