European Smaller-Language Video Subtitle Sample Set
Source: DataCite Commons. Updated: 2026-05-13. Indexed: 2026-05-17.
Download link:
https://b2share.eudat.eu/doi/10.23728/b2share.btyks-0y657
Resource description:
This dataset contains self-authored subtitle samples for six smaller European languages: Welsh, Irish, Catalan, Basque, Maltese, and Icelandic. The release is structured for repository deposit and multilingual video-localization evaluation. It includes 144 clip-level records, 432 aligned subtitle segments, 288 SRT files, a manifest, a field dictionary, methodology notes, and a machine-readable schema. All distributed text was authored specifically for this release. No source video, source audio, scraped subtitles, or third-party transcripts are included.
The package is intended for subtitle alignment testing, localization workflow review, multilingual file-ingestion checks, and documentation of repository-ready dataset packaging for audiovisual translation scenarios. The distributed files support both human inspection and machine processing: SRT files provide subtitle-like timing structure, while the CSV and JSON files expose clip identifiers, segment alignment, language coverage, and rights metadata in a form suitable for downstream parsing or indexing.
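As an illustration of the machine-processing side, the sketch below reads SRT timing structure and flags basic timing problems of the kind an alignment test might check. It assumes the files follow the common SubRip layout: a numeric index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, one or more text lines, and a blank-line separator. The listing does not document file names, encodings, or validation rules, so `parse_srt`, `find_timing_issues`, and the Welsh sample cues are hypothetical and are not taken from the dataset.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert hours, minutes, seconds, and milliseconds to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(content: str) -> list[Cue]:
    """Hypothetical parser: split SRT text into cues on blank lines."""
    # Strip a UTF-8 BOM if present and normalise Windows line endings.
    content = content.lstrip("\ufeff").replace("\r\n", "\n")
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.split("\n")
        if len(lines) < 2:
            continue  # skip stray fragments without a timing line
        match = TIMING.match(lines[1].strip())
        if match is None:
            raise ValueError(f"bad timing line in block: {block!r}")
        g = match.groups()
        cues.append(
            Cue(int(lines[0].strip()), _to_ms(*g[:4]), _to_ms(*g[4:]), "\n".join(lines[2:]))
        )
    return cues


def find_timing_issues(cues: list[Cue]) -> list[str]:
    """Hypothetical check: report cues that end before they start or overlap."""
    issues = []
    for prev, cur in zip([None] + cues[:-1], cues):
        if cur.end_ms <= cur.start_ms:
            issues.append(f"cue {cur.index}: end is not after start")
        if prev is not None and cur.start_ms < prev.end_ms:
            issues.append(f"cue {cur.index}: overlaps cue {prev.index}")
    return issues


# Hypothetical Welsh sample, not drawn from the dataset.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Bore da.

2
00:00:04,000 --> 00:00:06,250
Diolch yn fawr.
"""

cues = parse_srt(SAMPLE)
for cue in cues:
    print(cue.index, cue.start_ms, cue.end_ms, cue.text)
print(find_timing_issues(cues))
```

Keeping timings as integer milliseconds avoids floating-point drift when comparing cue boundaries across languages for the same clip; joining against the CSV or JSON manifest would additionally require the actual field names from the distributed field dictionary.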
This repository record distributes the dataset files directly. The linked website is provided only as supplementary project context for subtitle translation and multilingual file-processing workflows.
Provider:
B2SHARE
Created:
2026-05-13



