Nan-Do/OpenSubtitlesJapanese
Source: Hugging Face. Updated: 2023-04-19. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Nan-Do/OpenSubtitlesJapanese
Resource description:
---
task_categories:
- text-generation
language:
- ja
tags:
- text
- tv
- movies
pretty_name: OpenSubtitles dataset in Japanese
size_categories:
- 100M<n<1B
---
The dataset contains (almost) the entire OpenSubtitles database for Japanese:
- Over 7,000 TV shows and/or movies.
- The subtitles are human-generated.
- The dataset has been parsed, cleaned and converted to UTF-8.
File contents:
- OpenSubtitles.parquet: The text and the time data.
- OpenSubtitles_meta.parquet: The existing metadata for each title.
- OpenSubtitles-OA.parquet: The dataset encoded with two columns, SOURCE (the name of the movie/TV show) and TEXT (the subtitles), following the Open Assistant rules.
Both tables can be joined by the ID column. (The value can be NULL in the meta table).
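
The join described above can be sketched with pandas. This is a minimal illustration, not the dataset author's own code: only the `ID` column name comes from the card, while the `text` and `title` columns and the helper `join_subtitles_with_meta` are hypothetical stand-ins. Small synthetic frames are used in place of the real Parquet files.

```python
import pandas as pd


def join_subtitles_with_meta(subs: pd.DataFrame, meta: pd.DataFrame) -> pd.DataFrame:
    """Left-join subtitle rows to title metadata on the ID column."""
    # The card notes that ID can be NULL in the meta table. Such rows cannot
    # match any subtitle row, so they are dropped before merging.
    meta_clean = meta.dropna(subset=["ID"]).copy()
    # A column that held NULLs is often loaded as float; align it with the
    # key dtype of the subtitle table so both sides compare like with like.
    meta_clean["ID"] = meta_clean["ID"].astype(subs["ID"].dtype)
    # how="left" keeps every subtitle row, even when it has no metadata.
    return subs.merge(meta_clean, on="ID", how="left")


# Synthetic stand-ins for the real tables. Only "ID" is taken from the card;
# "text" and "title" are assumed column names used for illustration.
subs = pd.DataFrame({"ID": [1, 1, 2, 3], "text": ["a", "b", "c", "d"]})
meta = pd.DataFrame({"ID": [1.0, 2.0, None], "title": ["Movie A", "Show B", "Unknown"]})

joined = join_subtitles_with_meta(subs, meta)
print(joined)
```

With the real files, the two frames would typically come from `pd.read_parquet("OpenSubtitles.parquet")` and `pd.read_parquet("OpenSubtitles_meta.parquet")`. A left join is chosen here so that subtitle rows without metadata are kept rather than silently dropped; an inner join would be the alternative if only fully described titles are wanted.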
Provider:
Nan-Do
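Summary of original information
Dataset overview
Basic information
- Task category: text generation
- Language: Japanese
- Tags: text, TV, movies
- Pretty name: OpenSubtitles dataset in Japanese
- Size category: 100M < n < 1B
Dataset contents
- Contents:
  - Japanese subtitles for over 7,000 TV shows and/or movies
  - Subtitles are human-generated
  - Data has been parsed, cleaned, and converted to UTF-8
File details
- OpenSubtitles.parquet: the text and the time data
- OpenSubtitles_meta.parquet: the existing metadata for each title
- OpenSubtitles-OA.parquet: the dataset encoded following the Open Assistant rules, with two columns: SOURCE (the name of the movie/TV show) and TEXT (the subtitles)
Data linkage
- Join method: the two tables are joined on the ID column; the ID value in the metadata table may be NULL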



