refine-ai/subscene
Source: Hugging Face | Updated: 2026-04-20 | Indexed: 2025-04-26
Download link:
https://hf-mirror.com/datasets/refine-ai/subscene
Description:
Subscene is a large collection of multilingual subtitles for movies, series, and animations, gathered from the Subscene dump. It covers 65 languages and contains more than 30 billion tokens, with a total size of 410.70 GB. The dataset is tagged for text generation, translation, and text classification tasks and falls in the 1B to 10B size category. It is a resource for studying language variation and building multilingual NLP models.
Provided by:
refine-ai
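
Because the dataset is about 410 GB, it may help to inspect the repository layout before downloading anything. The following is a minimal sketch, not an official loading recipe: it assumes the `huggingface_hub` package is installed and that the mirror in the download link honours the `HF_ENDPOINT` environment variable. The helper names `summarize_extensions` and `list_dataset_files` are hypothetical, and nothing here assumes a particular file format or schema for the dataset.

```python
import os
from collections import Counter

# Assumption: route Hub requests through the mirror from the download link.
# This must be set before huggingface_hub is first imported to take effect.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

REPO_ID = "refine-ai/subscene"


def summarize_extensions(paths):
    """Count files by extension so the storage layout is visible at a glance."""
    return Counter(os.path.splitext(p)[1] or "(none)" for p in paths)


def list_dataset_files(repo_id=REPO_ID):
    """Return the file paths in the dataset repository (needs network access)."""
    # Imported lazily so that HF_ENDPOINT above is already set.
    from huggingface_hub import list_repo_files

    return list_repo_files(repo_id, repo_type="dataset")
```

Calling `summarize_extensions(list_dataset_files())` would then show which file types the repository holds, which is a reasonable first step before choosing what to download.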



