Lenylvt/opensubtitles-org
Source: Hugging Face. Updated 2026-03-29. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Lenylvt/opensubtitles-org
Description:
---
license: cc0-1.0
language:
- multilingual
tags:
- subtitles
- opensubtitles
- nlp
pretty_name: OpenSubtitles.org Dataset
size_categories:
- 1M<n<10M
task_categories:
- text-generation
---
# OpenSubtitles.org Dataset
Subtitle files scraped from [OpenSubtitles.org](https://www.opensubtitles.org), covering **all available languages**.
## Dataset Structure
Each row represents one subtitle file extracted from the OpenSubtitles archive.
| Column | Type | Description |
|---|---|---|
| `subtitle_id` | int64 | OpenSubtitles numeric ID |
| `title` | string | Movie or show title (parsed from filename) |
| `year` | int32 | Release year (null if unknown) |
| `language` | string | ISO 639-2 3-letter code (`eng`, `fra`, `spa`, …) |
| `language_name` | string | Human-readable language name |
| `cd_count` | int32 | Number of CDs (1cd, 2cd, …) |
| `is_tv_show` | bool | True if season/episode detected |
| `season` | int32 | Season number (TV shows only) |
| `episode` | int32 | Episode number (TV shows only) |
| `upload_date` | string | Upload date (YYYY-MM-DD, when available) |
| `imdb_id` | int64 | IMDb title ID (when available) |
| `zip_filename` | string | Original zip filename |
| `subtitle_file` | string | Filename inside the zip |
| `subtitle_format` | string | Format: `srt`, `sub`, `ass`, `ssa`, `vtt`, … |
| `encoding` | string | Detected character encoding before UTF-8 conversion |
| `content` | string | Full subtitle text content (UTF-8) |
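
As an illustration of working with the `content` column, the sketch below splits the text of an `srt` row into timed cues. It assumes the content follows the usual SubRip layout (a counter line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then the text, with blank lines between cues). The dataset card does not guarantee this, and `parse_srt` is a hypothetical helper, not part of the dataset or of any library.

```python
import re
from typing import Dict, List, Union

# Timing line of a SubRip cue, e.g. "00:00:01,000 --> 00:00:04,500".
# Assumption: three-digit milliseconds, separated by "," (or "." in some files).
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert the captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> List[Dict[str, Union[float, str]]]:
    """Hypothetical helper: split SubRip text into cues.

    Blocks without a recognisable timing line are skipped silently.
    """
    # Normalise line endings so blank-line splitting works for CRLF files too.
    text = content.replace("\r\n", "\n").replace("\r", "\n").strip()
    cues: List[Dict[str, Union[float, str]]] = []
    for block in re.split(r"\n\s*\n", text):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                fields = match.groups()
                cues.append(
                    {
                        "start": _to_seconds(*fields[:4]),
                        "end": _to_seconds(*fields[4:]),
                        # Everything after the timing line is the cue text.
                        "text": "\n".join(lines[i + 1:]).strip(),
                    }
                )
                break
    return cues
```

With a loaded row, this could be called as `parse_srt(row["content"])` when `row["subtitle_format"] == "srt"`. Other formats (`sub`, `ass`, `ssa`, `vtt`) use different layouts and would need their own parsers.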
## Usage
```python
from datasets import load_dataset

# Load the dataset from the Hugging Face Hub
ds = load_dataset("Lenylvt/opensubtitles-org")

# Filter by language (ISO 639-2 code)
english = ds["train"].filter(lambda x: x["language"] == "eng")

# Filter by subtitle format
srt_only = ds["train"].filter(lambda x: x["subtitle_format"] == "srt")
```
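
Several conditions can be combined into one predicate, so the data is scanned once instead of once per filter. The following is a minimal sketch: `make_row_filter` and its parameters are hypothetical names introduced here, and the only assumption about the data is the column layout from the table in Dataset Structure, including that `year` may be null.

```python
from typing import Any, Callable, Dict, Optional


def make_row_filter(
    language: Optional[str] = None,
    subtitle_format: Optional[str] = None,
    tv_only: bool = False,
    min_year: Optional[int] = None,
) -> Callable[[Dict[str, Any]], bool]:
    """Hypothetical helper: build a row predicate from optional criteria.

    A criterion left at its default is not checked.
    """

    def keep(row: Dict[str, Any]) -> bool:
        if language is not None and row["language"] != language:
            return False
        if subtitle_format is not None and row["subtitle_format"] != subtitle_format:
            return False
        if tv_only and not row["is_tv_show"]:
            return False
        if min_year is not None:
            year = row["year"]  # null (None) when the release year is unknown
            if year is None or year < min_year:
                return False
        return True

    return keep


# Intended use with the dataset loaded as in the example above:
#   english_srt = ds["train"].filter(make_row_filter(language="eng", subtitle_format="srt"))
```

Rows with an unknown `year` are dropped whenever `min_year` is set. That is a design choice of this sketch, not something the dataset prescribes.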
## Source
Scraped from OpenSubtitles.org using [opensubtitles-scraper](https://github.com/milahu/opensubtitles-scraper).
Provider:
Lenylvt



