speechocean762 in WebDataset Format
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14725290
Description:
This is the speechocean762 dataset repackaged in the WebDataset format. WebDataset files are plain tar archives in which each example is stored as a pair of files sharing the same base name: a WAV audio file and a JSON metadata file. The JSON file holds the annotations for that audio sample: sentence-level scores (accuracy, completeness, fluency, prosodic, total), the transcript, and per-word scores together with each word's phones and their phone-level accuracy.
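The pairing rule can be made concrete with a small standard-library reader. This is a minimal sketch, not the webdataset library's implementation: it assumes, as WebDataset does, that a sample's files sit next to each other in the archive and share a name up to the first dot. The function name `iter_samples` is a hypothetical, illustrative choice; the `__key__` field mirrors the webdataset naming convention.

```python
import json
import tarfile
from typing import Iterator


def iter_samples(tar_path: str) -> Iterator[dict]:
    """Yield one dict per sample from a WebDataset-style tar archive.

    Consecutive members sharing a base name up to the first dot
    (e.g. 000010011.json and 000010011.wav) are grouped into one sample.
    """
    current_key = None
    sample: dict = {}
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            base = member.name.rsplit("/", 1)[-1]
            key, _, ext = base.partition(".")
            # A new key means the previous sample is complete.
            if key != current_key and sample:
                yield sample
                sample = {}
            current_key = key
            data = tar.extractfile(member).read()
            # Parse JSON metadata; keep WAV audio as raw bytes.
            sample[ext] = json.loads(data) if ext == "json" else data
            sample["__key__"] = key
    if sample:
        yield sample  # emit the last sample in the archive
```

Iterating `iter_samples("speechocean762_train.tar")` should yield dicts whose `"json"` entry is the parsed metadata and whose `"wav"` entry holds the raw audio bytes, ready for an audio decoder of your choice. For training pipelines, the `webdataset` package's `WebDataset` class performs this grouping itself, along with shuffling and decoding.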
$ tar tvf speechocean762_train.tar | head
-r--r--r-- bigdata/bigdata 607 2025-01-13 14:49 000010011.json
-r--r--r-- bigdata/bigdata 82604 2025-01-13 14:49 000010011.wav
-r--r--r-- bigdata/bigdata 646 2025-01-13 14:49 000010035.json
-r--r--r-- bigdata/bigdata 109804 2025-01-13 14:49 000010035.wav
-r--r--r-- bigdata/bigdata 630 2025-01-13 14:49 000010053.json
-r--r--r-- bigdata/bigdata 107244 2025-01-13 14:49 000010053.wav
-r--r--r-- bigdata/bigdata 561 2025-01-13 14:49 000010063.json
-r--r--r-- bigdata/bigdata 106252 2025-01-13 14:49 000010063.wav
-r--r--r-- bigdata/bigdata 671 2025-01-13 14:49 000010069.json
-r--r--r-- bigdata/bigdata 96364 2025-01-13 14:49 000010069.wav
$ cat 000010011.json
{"id": 10011, "accuracy": 8, "completeness": 10.0, "fluency": 9, "prosodic": 9, "words": [{"accuracy": 10, "stress": 10, "phones": ["W", "IY0"], "total": 10, "text": "WE", "phones-accuracy": [2.0, 2.0]}, {"accuracy": 10, "stress": 10, "phones": ["K", "AO0", "L"], "total": 10, "text": "CALL", "phones-accuracy": [2.0, 1.8, 1.8]}, {"accuracy": 10, "stress": 10, "phones": ["IH0", "T"], "total": 10, "text": "IT", "phones-accuracy": [2.0, 2.0]}, {"accuracy": 6, "stress": 10, "phones": ["B", "EH0", "R"], "total": 6, "text": "BEAR", "phones-accuracy": [2.0, 1.0, 1.0]}], "total": 8, "text": "WE CALL IT BEAR"}
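The nested metadata lends itself to simple per-phone analysis. The sketch below walks the `words` list and flags every phone whose `phones-accuracy` score falls below a threshold. It assumes each phone has exactly one score and that a lower score means a less accurate pronunciation; the function name `weakest_phones` and the default threshold of 2.0 (the highest phone score in the record above) are hypothetical, illustrative choices.

```python
from typing import List, Tuple


def weakest_phones(metadata: dict, threshold: float = 2.0) -> List[Tuple[str, str, float]]:
    """Return (word, phone, score) for every phone scored below `threshold`."""
    flagged = []
    for word in metadata["words"]:
        phones = word["phones"]
        scores = word["phones-accuracy"]
        # Each phone is expected to have exactly one accuracy score.
        if len(phones) != len(scores):
            raise ValueError(f"phone/score length mismatch in {word['text']!r}")
        for phone, score in zip(phones, scores):
            if score < threshold:
                flagged.append((word["text"], phone, score))
    return flagged
```

Applied to the record for sample 000010011, it flags AO0 and L in CALL (1.8) and EH0 and R in BEAR (1.0); BEAR is also the word whose word-level accuracy is 6.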
Created: 2025-01-23



