
Lenylvt/opensubtitles-org

Source: Hugging Face. Updated 2026-03-29. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Lenylvt/opensubtitles-org
Description:
---
license: cc0-1.0
language:
- multilingual
tags:
- subtitles
- opensubtitles
- nlp
pretty_name: OpenSubtitles.org Dataset
size_categories:
- 1M<n<10M
task_categories:
- text-generation
---

# OpenSubtitles.org Dataset

Subtitle files scraped from [OpenSubtitles.org](https://www.opensubtitles.org), covering **all available languages**.

## Dataset Structure

Each row represents one subtitle file extracted from the OpenSubtitles archive.

| Column | Type | Description |
|---|---|---|
| `subtitle_id` | int64 | OpenSubtitles numeric ID |
| `title` | string | Movie or show title (parsed from filename) |
| `year` | int32 | Release year (null if unknown) |
| `language` | string | ISO 639-2 3-letter code (`eng`, `fra`, `spa`, …) |
| `language_name` | string | Human-readable language name |
| `cd_count` | int32 | Number of CDs (1cd, 2cd, …) |
| `is_tv_show` | bool | True if season/episode detected |
| `season` | int32 | Season number (TV shows only) |
| `episode` | int32 | Episode number (TV shows only) |
| `upload_date` | string | Upload date (YYYY-MM-DD, when available) |
| `imdb_id` | int64 | IMDB title ID (when available) |
| `zip_filename` | string | Original zip filename |
| `subtitle_file` | string | Filename inside the zip |
| `subtitle_format` | string | Format: `srt`, `sub`, `ass`, `ssa`, `vtt`, … |
| `encoding` | string | Detected character encoding before UTF-8 conversion |
| `content` | string | Full subtitle text content (UTF-8) |

## Usage

```python
from datasets import load_dataset

ds = load_dataset("Lenylvt/opensubtitles-org")

# Filter by language
english = ds["train"].filter(lambda x: x["language"] == "eng")

# Filter by format
srt_only = ds["train"].filter(lambda x: x["subtitle_format"] == "srt")
```

## Source

Scraped from OpenSubtitles.org using [opensubtitles-scraper](https://github.com/milahu/opensubtitles-scraper).
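
Because `content` holds the raw subtitle file, text-generation work usually needs the cue numbers and timing lines stripped first. The following is a minimal sketch of that step for `srt` rows only, not part of the dataset's own tooling. The helper names `srt_to_lines` and `dialogue_for` are hypothetical, and the sketch assumes the files follow the common SRT layout (cue number, timing line, text, blank line). It works on plain dictionaries with the columns listed above, so it can be tried without downloading anything.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING_RE = re.compile(
    r"^\d{1,2}:\d{2}:\d{2}[,.]\d{1,3}\s*-->\s*\d{1,2}:\d{2}:\d{2}[,.]\d{1,3}"
)
# Matches simple inline markup such as <i>...</i> or {\an8}.
TAG_RE = re.compile(r"<[^>]+>|\{[^}]*\}")


def srt_to_lines(content):
    """Return the text lines of an SRT document, without cue numbers, timings, or tags.

    Hypothetical helper: assumes the usual SRT layout described in the lead-in.
    """
    lines = []
    in_cue_text = False  # True once a timing line has been seen for the current cue
    for raw in content.splitlines():
        line = raw.strip().lstrip("\ufeff")  # also drop a leading byte-order mark
        if not line:
            in_cue_text = False  # a blank line ends the current cue
            continue
        if TIMING_RE.match(line):
            in_cue_text = True
            continue
        if not in_cue_text:
            continue  # cue number (or stray line) before the timing line
        text = TAG_RE.sub("", line).strip()
        if text:
            lines.append(text)
    return lines


def dialogue_for(rows, language="eng"):
    """Yield (title, lines) for every SRT row in the given language.

    `rows` is any iterable of dicts using the column names from the table above.
    """
    for row in rows:
        if row["language"] == language and row["subtitle_format"] == "srt":
            yield row["title"], srt_to_lines(row["content"])
```

With the real data, `rows` could be the `train` split from the Usage section. Passing `streaming=True` to `load_dataset` is one way to iterate without downloading the whole dataset first. Other formats (`ass`, `sub`, `vtt`) have different layouts and would need their own parsers.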
Provider:
Lenylvt