asahi417/seamless-align-enA-jaA.tokenized.encodec
Hugging Face | Updated 2024-06-24 | Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/asahi417/seamless-align-enA-jaA.tokenized.encodec
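Summary:
The dataset consists of multiple subsets, each with a config name, a feature list, and statistics such as the number of examples and the size in bytes. Each subset contains a line number, enA.id, enA.laser_score, jaA.id, jaA.laser_score, and audio tokens for English and Japanese. The structure is suited to training on audio and language data, possibly for translation or speech recognition tasks. However, the provided documentation does not explicitly describe the dataset's purpose or background.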
Provider:
asahi417
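The summary above lists the fields each record carries. A minimal sketch of loading one subset and checking a record against that field list might look like the following. It assumes the Hugging Face `datasets` library is installed and that each config name (for example `subset_1`) can be passed as the second argument to `load_dataset`. The helper names `load_subset` and `schema_problems`, and the sample record values, are hypothetical. The listing only says "sequence of int64" for the token fields, so the check accepts flat or nested lists of integers rather than assuming a particular shape.

```python
from typing import Any, List, Mapping

# Repository id taken from the download link on this page.
REPO_ID = "asahi417/seamless-align-enA-jaA.tokenized.encodec"

# Scalar fields and the Python types they are expected to map to,
# following the feature list on this page.
SCALAR_FIELDS = {
    "line_no": int,
    "enA.id": str,
    "enA.laser_score": float,
    "jaA.id": str,
    "jaA.laser_score": float,
}
TOKEN_FIELDS = ("enA.audio.tokens", "jaA.audio.tokens")


def load_subset(config_name: str = "subset_1"):
    """Download the train split of one subset (needs network access)."""
    # Imported lazily so the schema check below also works offline.
    from datasets import load_dataset
    return load_dataset(REPO_ID, config_name, split="train")


def _all_int_leaves(value: Any) -> bool:
    """True if every leaf of a (possibly nested) list is an int."""
    if isinstance(value, (list, tuple)):
        return all(_all_int_leaves(v) for v in value)
    return isinstance(value, int) and not isinstance(value, bool)


def schema_problems(record: Mapping[str, Any]) -> List[str]:
    """Return a list of human-readable mismatches; empty means the record fits."""
    problems = []
    for name, expected in SCALAR_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif isinstance(record[name], bool) or not isinstance(record[name], expected):
            problems.append(f"{name}: expected {expected.__name__}")
    for name in TOKEN_FIELDS:
        value = record.get(name)
        if not isinstance(value, (list, tuple)) or not _all_int_leaves(value):
            problems.append(f"{name}: expected a sequence of ints")
    return problems


# Synthetic record for illustration only; the ids, scores and tokens are made up.
example = {
    "line_no": 0,
    "enA.id": "en-placeholder",
    "enA.laser_score": 1.0,
    "jaA.id": "ja-placeholder",
    "jaA.laser_score": 1.0,
    "enA.audio.tokens": [[1, 2, 3], [4, 5, 6]],
    "jaA.audio.tokens": [[7, 8, 9], [10, 11, 12]],
}
print(schema_problems(example))
```

To check real data, one could call `schema_problems(load_subset("subset_1")[0])`. Each subset is several hundred megabytes, so downloading a single config first is the cheaper way to explore.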
Original information summary
Dataset overview
Subset information
- Subset name: subset_1
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - jaA.audio.tokens: sequence of int64
    - enA.audio.tokens: sequence of int64
  - Splits:
    - train: 2073 examples, 815899569 bytes
  - Download size: 123251574 bytes
  - Dataset size: 815899569 bytes
- Subset name: subset_10
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - jaA.audio.tokens: sequence of int64
    - enA.audio.tokens: sequence of int64
  - Splits:
    - train: 1961 examples, 725774697 bytes
  - Download size: 112592976 bytes
  - Dataset size: 725774697 bytes
- Subset name: subset_100
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - enA.audio.tokens: sequence of int64
    - jaA.audio.tokens: sequence of int64
  - Splits:
    - train: 1757 examples, 706792003 bytes
  - Download size: 110029180 bytes
  - Dataset size: 706792003 bytes
- Subset name: subset_101
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - jaA.audio.tokens: sequence of int64
    - enA.audio.tokens: sequence of int64
  - Splits:
    - train: 1873 examples, 749981266 bytes
  - Download size: 116751186 bytes
  - Dataset size: 749981266 bytes
- Subset name: subset_102
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - enA.audio.tokens: sequence of int64
    - jaA.audio.tokens: sequence of int64
  - Splits:
    - train: 1868 examples, 771676360 bytes
  - Download size: 120065239 bytes
  - Dataset size: 771676360 bytes
- Subset name: subset_103
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - jaA.audio.tokens: sequence of int64
    - enA.audio.tokens: sequence of int64
  - Splits:
    - train: 1879 examples, 774430246 bytes
  - Download size: 120566093 bytes
  - Dataset size: 774430246 bytes
- Subset name: subset_104
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - enA.audio.tokens: sequence of int64
    - jaA.audio.tokens: sequence of int64
  - Splits:
    - train: 1901 examples, 769865245 bytes
  - Download size: 119788722 bytes
  - Dataset size: 769865245 bytes
- Subset name: subset_105
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - enA.audio.tokens: sequence of int64
    - jaA.audio.tokens: sequence of int64
  - Splits:
    - train: 1875 examples, 759539188 bytes
  - Download size: 118068973 bytes
  - Dataset size: 759539188 bytes
- Subset name: subset_106
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - enA.audio.tokens: sequence of int64
    - jaA.audio.tokens: sequence of int64
  - Splits:
    - train: 1880 examples, 763974790 bytes
  - Download size: 118929460 bytes
  - Dataset size: 763974790 bytes
- Subset name: subset_107
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - jaA.audio.tokens: sequence of int64
    - enA.audio.tokens: sequence of int64
  - Splits:
    - train: 1854 examples, 746027597 bytes
  - Download size: 116110020 bytes
  - Dataset size: 746027597 bytes
- Subset name: subset_108
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - enA.audio.tokens: sequence of int64
    - jaA.audio.tokens: sequence of int64
  - Splits:
    - train: 1834 examples, 759311040 bytes
  - Download size: 118222328 bytes
  - Dataset size: 759311040 bytes
- Subset name: subset_109
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - enA.audio.tokens: sequence of int64
    - jaA.audio.tokens: sequence of int64
  - Splits:
    - train: 1770 examples, 723605374 bytes
  - Download size: 112557449 bytes
  - Dataset size: 723605374 bytes
- Subset name: subset_11
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - jaA.audio.tokens: sequence of int64
    - enA.audio.tokens: sequence of int64
  - Splits:
    - train: 1779 examples, 655428175 bytes
  - Download size: 101716009 bytes
  - Dataset size: 655428175 bytes
- Subset name: subset_110
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - enA.audio.tokens: sequence of int64
    - jaA.audio.tokens: sequence of int64
  - Splits:
    - train: 1908 examples, 780899707 bytes
  - Download size: 121507409 bytes
  - Dataset size: 780899707 bytes
- Subset name: subset_111
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - enA.audio.tokens: sequence of int64
    - jaA.audio.tokens: sequence of int64
  - Splits:
    - train: 1877 examples, 764232585 bytes
  - Download size: 118965916 bytes
  - Dataset size: 764232585 bytes
- Subset name: subset_112
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - jaA.audio.tokens: sequence of int64
    - enA.audio.tokens: sequence of int64
  - Splits:
    - train: 1924 examples, 786541214 bytes
  - Download size: 122390776 bytes
  - Dataset size: 786541214 bytes
- Subset name: subset_114
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - enA.audio.tokens: sequence of int64
    - jaA.audio.tokens: sequence of int64
  - Splits:
    - train: 1940 examples, 794451880 bytes
  - Download size: 123504504 bytes
  - Dataset size: 794451880 bytes
- Subset name: subset_115
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - jaA.audio.tokens: sequence of int64
    - enA.audio.tokens: sequence of int64
  - Splits:
    - train: 1902 examples, 775732127 bytes
  - Download size: 120677183 bytes
  - Dataset size: 775732127 bytes
- Subset name: subset_116
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - enA.audio.tokens: sequence of int64
    - jaA.audio.tokens: sequence of int64
  - Splits:
    - train: 1910 examples, 789840942 bytes
  - Download size: 122958974 bytes
  - Dataset size: 789840942 bytes
- Subset name: subset_117
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - enA.audio.tokens: sequence of int64
    - jaA.audio.tokens: sequence of int64
  - Splits:
    - train: 1901 examples, 772429884 bytes
  - Download size: 120284993 bytes
  - Dataset size: 772429884 bytes
- Subset name: subset_118
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - enA.audio.tokens: sequence of int64
    - jaA.audio.tokens: sequence of int64
  - Splits:
    - train: 1911 examples, 790151773 bytes
  - Download size: 123064897 bytes
  - Dataset size: 790151773 bytes
- Subset name: subset_119
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - jaA.audio.tokens: sequence of int64
    - enA.audio.tokens: sequence of int64
  - Splits:
    - train: 1867 examples, 768516369 bytes
  - Download size: 119606209 bytes
  - Dataset size: 768516369 bytes
- Subset name: subset_12
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - jaA.audio.tokens: sequence of int64
    - enA.audio.tokens: sequence of int64
  - Splits:
    - train: 1916 examples, 730035963 bytes
  - Download size: 113236146 bytes
  - Dataset size: 730035963 bytes
- Subset name: subset_120
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - jaA.id: string
    - jaA.laser_score: float64
    - jaA.audio.tokens: sequence of int64
    - enA.audio.tokens: sequence of int64
  - Splits:
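The per-subset figures listed above lend themselves to a little arithmetic, such as the average record size and how the download size compares with the dataset size. The sketch below hard-codes the numbers for the first three subsets exactly as listed. The `SplitStats` type and the helper names are hypothetical. It also assumes the usual Hugging Face convention that the download size is the size of the files fetched and the dataset size is the size of the prepared data on disk; this page does not state that explicitly.

```python
from typing import NamedTuple


class SplitStats(NamedTuple):
    examples: int        # number of train examples
    dataset_bytes: int   # "Dataset size" in the listing
    download_bytes: int  # "Download size" in the listing


# Figures copied from the first three subsets in the listing above.
LISTED = {
    "subset_1": SplitStats(2073, 815899569, 123251574),
    "subset_10": SplitStats(1961, 725774697, 112592976),
    "subset_100": SplitStats(1757, 706792003, 110029180),
}


def bytes_per_example(stats: SplitStats) -> float:
    """Average size of one record in the prepared dataset."""
    return stats.dataset_bytes / stats.examples


def download_ratio(stats: SplitStats) -> float:
    """Download size as a fraction of the dataset size."""
    return stats.download_bytes / stats.dataset_bytes


total_examples = sum(s.examples for s in LISTED.values())
total_download = sum(s.download_bytes for s in LISTED.values())

for name, stats in LISTED.items():
    print(
        f"{name}: {bytes_per_example(stats) / 1e6:.2f} MB per example, "
        f"download is {download_ratio(stats):.1%} of dataset size"
    )
print(f"{total_examples} examples, {total_download / 1e6:.0f} MB to download")
```

For these three subsets each record averages a few hundred kilobytes, which is consistent with every row holding two full token sequences rather than short text fields.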



