asahi417/seamless-align-enA-koA.tokenized
Source: Hugging Face | Updated 2024-06-07 | Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/asahi417/seamless-align-enA-koA.tokenized
Resource summary:
The dataset consists of multiple subsets that share the same structure: a line number, English (enA) and Korean (koA) IDs, a LASER score for each side, and English and Korean audio token sequences. Each subset has a single 'train' split, and the card states its size in bytes and its number of examples; both vary from subset to subset. The dataset centers on the audio token sequences and their associated metadata.
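As a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the columns are exposed under the flat dotted names listed in the subset information (for example `enA.laser_score`), one subset can be streamed from the Hub and filtered on its LASER scores. The helper names `load_subset` and `filter_by_laser` and any threshold passed to them are hypothetical illustrations, not something defined by the dataset card; requiring both scores to clear the threshold is also just one possible design choice.

```python
from typing import Any, Dict, Iterable, Iterator

# Repository ID taken from the download link above.
REPO_ID = "asahi417/seamless-align-enA-koA.tokenized"


def load_subset(config_name: str, streaming: bool = True):
    """Return the 'train' split of one subset, e.g. "subset_1" (hypothetical helper).

    `datasets` is imported lazily so the filter below works without it.
    With streaming=True, rows are fetched on demand instead of
    downloading the whole subset first.
    """
    from datasets import load_dataset  # requires: pip install datasets

    return load_dataset(REPO_ID, config_name, split="train", streaming=streaming)


def filter_by_laser(
    rows: Iterable[Dict[str, Any]], min_score: float
) -> Iterator[Dict[str, Any]]:
    """Yield only rows whose enA and koA LASER scores are both >= min_score.

    Assumes flat dotted column names as listed in the dataset card; the
    threshold is an illustrative parameter, not a recommended value.
    """
    for row in rows:
        if row["enA.laser_score"] >= min_score and row["koA.laser_score"] >= min_score:
            yield row
```

For example, `filter_by_laser(load_subset("subset_1"), 1.1)` would lazily iterate over the pairs in subset_1 whose two scores both reach an arbitrary threshold of 1.1; the `enA.audio.tokens` and `koA.audio.tokens` lists of each kept row can then be passed on to downstream processing.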
Provider:
asahi417
Summary of original information
Dataset overview
Subset information
- Subset name: subset_1
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 2246 examples, 896278188 bytes
  - Download size: 135762610 bytes
  - Dataset size: 896278188 bytes
- Subset name: subset_10
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - enA.audio.tokens: sequence(int64)
    - koA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1967 examples, 664755500 bytes
  - Download size: 103095424 bytes
  - Dataset size: 664755500 bytes
- Subset name: subset_11
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1859 examples, 614904782 bytes
  - Download size: 95382847 bytes
  - Dataset size: 614904782 bytes
- Subset name: subset_12
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1881 examples, 628177593 bytes
  - Download size: 97547163 bytes
  - Dataset size: 628177593 bytes
- Subset name: subset_13
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1963 examples, 667716027 bytes
  - Download size: 103665506 bytes
  - Dataset size: 667716027 bytes
- Subset name: subset_14
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - enA.audio.tokens: sequence(int64)
    - koA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1924 examples, 638960181 bytes
  - Download size: 99302966 bytes
  - Dataset size: 638960181 bytes
- Subset name: subset_15
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1928 examples, 654097354 bytes
  - Download size: 101603506 bytes
  - Dataset size: 654097354 bytes
- Subset name: subset_16
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - enA.audio.tokens: sequence(int64)
    - koA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1844 examples, 608816866 bytes
  - Download size: 94610300 bytes
  - Dataset size: 608816866 bytes
- Subset name: subset_17
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1947 examples, 654173438 bytes
  - Download size: 101713307 bytes
  - Dataset size: 654173438 bytes
- Subset name: subset_18
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1903 examples, 631032489 bytes
  - Download size: 98052921 bytes
  - Dataset size: 631032489 bytes
- Subset name: subset_19
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1936 examples, 652237917 bytes
  - Download size: 101464580 bytes
  - Dataset size: 652237917 bytes
- Subset name: subset_2
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 2190 examples, 862445739 bytes
  - Download size: 131249944 bytes
  - Dataset size: 862445739 bytes
- Subset name: subset_20
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1938 examples, 636361987 bytes
  - Download size: 98973260 bytes
  - Dataset size: 636361987 bytes
- Subset name: subset_21
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1927 examples, 645970558 bytes
  - Download size: 100411389 bytes
  - Dataset size: 645970558 bytes
- Subset name: subset_22
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1917 examples, 640550517 bytes
  - Download size: 99640948 bytes
  - Dataset size: 640550517 bytes
- Subset name: subset_23
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1913 examples, 637838380 bytes
  - Download size: 99169379 bytes
  - Dataset size: 637838380 bytes
- Subset name: subset_24
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1928 examples, 648787138 bytes
  - Download size: 100980345 bytes
  - Dataset size: 648787138 bytes
- Subset name: subset_25
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1931 examples, 652463038 bytes
  - Download size: 101501446 bytes
  - Dataset size: 652463038 bytes
- Subset name: subset_26
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - enA.audio.tokens: sequence(int64)
    - koA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1934 examples, 651441317 bytes
  - Download size: 101396088 bytes
  - Dataset size: 651441317 bytes
- Subset name: subset_27
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - enA.audio.tokens: sequence(int64)
    - koA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1902 examples, 647445880 bytes
  - Download size: 100647805 bytes
  - Dataset size: 647445880 bytes
- Subset name: subset_28
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1924 examples, 646014873 bytes
  - Download size: 100523143 bytes
  - Dataset size: 646014873 bytes
- Subset name: subset_29
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.audio.tokens: sequence(int64)
  - Splits:
    - train: 1916 examples, 646891300 bytes
  - Download size: 100611535 bytes
  - Dataset size: 646891300 bytes
- Subset name: subset_3
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - enA.audio.tokens: sequence(int64)
    - koA.audio.tokens: sequence(int64)
  - Splits:
    - train: 2056 examples, 780543841 bytes
  - Download size: 119202416 bytes
  - Dataset size: 780543841 bytes
- Subset name: subset_30
  - Features:
    - line_no: int64
    - enA.id: string
    - enA.laser_score: float64
    - koA.id: string
    - koA.laser_score: float64
    - koA.audio.tokens: sequence(int64)
    - enA.
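The per-subset figures listed above can be turned into a few derived quantities. This sketch copies the listed numbers for four subsets and computes, for each, the average bytes per example, the ratio of dataset size to download size, and an upper bound on the mean token count per example. The ratio is only meaningful under the assumption, usual for Hugging Face dataset metadata, that the dataset size is the prepared uncompressed data and the download size is the compressed files. The token bound follows from the schema: audio tokens are stored as int64 (8 bytes each) and share the bytes with every other column, so the mean number of tokens per example cannot exceed bytes_per_example / 8. The names `SUBSETS` and `derived_stats` are illustrative, not part of the dataset.

```python
from typing import Dict, Tuple

# (examples, dataset size in bytes, download size in bytes), copied from the listing.
SUBSETS: Dict[str, Tuple[int, int, int]] = {
    "subset_1": (2246, 896278188, 135762610),
    "subset_2": (2190, 862445739, 131249944),
    "subset_3": (2056, 780543841, 119202416),
    "subset_10": (1967, 664755500, 103095424),
}

INT64_BYTES = 8  # audio tokens are sequence(int64)


def derived_stats(examples: int, dataset_bytes: int, download_bytes: int) -> Dict[str, float]:
    bytes_per_example = dataset_bytes / examples
    return {
        "bytes_per_example": bytes_per_example,
        # How many times larger the prepared data is than the download.
        "size_ratio": dataset_bytes / download_bytes,
        # Upper bound on the mean token count per example: every column,
        # not just the token sequences, shares these bytes.
        "max_mean_tokens": bytes_per_example / INT64_BYTES,
    }


if __name__ == "__main__":
    for name, figures in SUBSETS.items():
        s = derived_stats(*figures)
        print(
            f"{name}: {s['bytes_per_example']:,.0f} B/example, "
            f"ratio {s['size_ratio']:.2f}, <= {s['max_mean_tokens']:,.0f} tokens/example"
        )
```

For subset_1 this gives about 399,055 bytes per example and a size ratio of about 6.60, so the prepared data is several times larger than what is downloaded.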



