
wenet-e2e/Speech-Dataset-Analyze

Source: Hugging Face · Updated: 2022-12-12 · Indexed: 2024-07-06
Download link:
https://hf-mirror.com/datasets/wenet-e2e/Speech-Dataset-Analyze
Resource description:
---
license: apache-2.0
---

# Dataset Analyze

Brief results (example):

```sh
==================
max dur: 14.531 s (wav_id: BAC009S0658W0472)
P99 dur: 8.510 s (wav_id: BAC009S0156W0340)
P75 dur: 5.326 s (wav_id: BAC009S0128W0161)
P50 dur: 4.262 s (wav_id: BAC009S0134W0187)
P25 dur: 3.494 s (wav_id: BAC009S0710W0419)
min dur: 1.230 s (wav_id: BAC009S0141W0423)
avg dur: 4.522 s
==================
max txt_length: 44.000 (wav_id: BAC009S0107W0142)
P99 txt_length: 24.000 (wav_id: BAC009S0234W0207)
P75 txt_length: 18.000 (wav_id: BAC009S0091W0334)
P50 txt_length: 14.000 (wav_id: BAC009S0125W0419)
P25 txt_length: 11.000 (wav_id: BAC009S0167W0302)
min txt_length: 1.000 (wav_id: BAC009S0094W0358)
avg txt_length: 14.406
==================
max speed: 5.496 char/s (wav_id: BAC009S0135W0430)
P99 speed: 4.360 char/s (wav_id: BAC009S0708W0486)
P75 speed: 3.520 char/s (wav_id: BAC009S0088W0358)
P50 speed: 3.204 char/s (wav_id: BAC009S0422W0164)
P25 speed: 2.894 char/s (wav_id: BAC009S0340W0436)
min speed: 0.606 char/s (wav_id: BAC009S0094W0364)
avg speed: 3.186 char/s
==================
max leading_sil: 5120.000 ms (wav_id: BAC009S0118W0473)
P99 leading_sil: 576.000 ms (wav_id: BAC009S0070W0283)
P75 leading_sil: 416.000 ms (wav_id: BAC009S0421W0428)
P50 leading_sil: 32.000 ms (wav_id: BAC009S0244W0443)
P25 leading_sil: 0.000 ms (wav_id: BAC009S0209W0423)
min leading_sil: 0.000 ms (wav_id: BAC009S0168W0257)
avg leading_sil: 166.765 ms
==================
max trailing_sil: 1486.000 ms (wav_id: BAC009S0007W0174)
P99 trailing_sil: 567.125 ms (wav_id: BAC009S0122W0365)
P75 trailing_sil: 270.062 ms (wav_id: BAC009S0363W0374)
P50 trailing_sil: 0.000 ms (wav_id: BAC009S0196W0457)
P25 trailing_sil: 0.000 ms (wav_id: BAC009S0038W0148)
min trailing_sil: 0.000 ms (wav_id: BAC009S0168W0257)
avg trailing_sil: 128.904 ms
```

Detailed results (example):

```sh
{"txt": "娱乐频道", "wav": "None", "sample_rate": 16000, "key": "BAC009S0141W0423", "dur": 1.23, "txt_length": 4, "speed": 3.252032520325203, "leading_sil": 128.0, "trailing_sil": 174.0}
{"txt": "适用税率", "wav": "None", "sample_rate": 16000, "key": "BAC009S0124W0224", "dur": 1.3050625, "txt_length": 4, "speed": 3.0649873090369235, "leading_sil": 0, "trailing_sil": 0}
{"txt": "一", "wav": "None", "sample_rate": 16000, "key": "BAC009S0094W0358", "dur": 1.3420625, "txt_length": 1, "speed": 0.7451217808410563, "leading_sil": 416.0, "trailing_sil": 318.0625}
{"txt": "周群", "wav": "None", "sample_rate": 16000, "key": "BAC009S0002W0272", "dur": 1.344, "txt_length": 2, "speed": 1.488095238095238, "leading_sil": 0, "trailing_sil": 0}
{"txt": "你有苹果吗", "wav": "None", "sample_rate": 16000, "key": "BAC009S0144W0217", "dur": 1.3470625, "txt_length": 5, "speed": 3.7117802626084533, "leading_sil": 64.0, "trailing_sil": 0}
{"txt": "虽然只是背影", "wav": "None", "sample_rate": 16000, "key": "BAC009S0128W0418", "dur": 1.357875, "txt_length": 6, "speed": 4.418668876001105, "leading_sil": 32.0, "trailing_sil": 0}
{"txt": "六", "wav": "None", "sample_rate": 16000, "key": "BAC009S0094W0363", "dur": 1.3610625, "txt_length": 1, "speed": 0.7347201175552188, "leading_sil": 416.0, "trailing_sil": 465.0625}
{"txt": "八", "wav": "None", "sample_rate": 16000, "key": "BAC009S0094W0365", "dur": 1.396, "txt_length": 1, "speed": 0.7163323782234957, "leading_sil": 480.0, "trailing_sil": 436.0}
{"txt": "六十万人", "wav": "None", "sample_rate": 16000, "key": "BAC009S0004W0433", "dur": 1.4029375, "txt_length": 4, "speed": 2.851160511426917, "leading_sil": 160.0, "trailing_sil": 154.9375}
{"txt": "博士", "wav": "None", "sample_rate": 16000, "key": "BAC009S0002W0302", "dur": 1.4749375, "txt_length": 2, "speed": 1.3559896605788382, "leading_sil": 0, "trailing_sil": 0}
```

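In the detailed records, the `speed` field appears to be `txt_length / dur` in characters per second: for `BAC009S0141W0423`, 4 / 1.23 ≈ 3.252. The Python sketch below checks that relation on three records copied from the detailed results. The helper names `parse_records` and `speed_matches` are illustrative and not part of the dataset's own tooling, and the relation itself is inferred from the sample values rather than stated by the authors.

```python
import json
import math

# Three records copied from the detailed results (JSON Lines: one object per line).
SAMPLE = """\
{"txt": "娱乐频道", "wav": "None", "sample_rate": 16000, "key": "BAC009S0141W0423", "dur": 1.23, "txt_length": 4, "speed": 3.252032520325203, "leading_sil": 128.0, "trailing_sil": 174.0}
{"txt": "一", "wav": "None", "sample_rate": 16000, "key": "BAC009S0094W0358", "dur": 1.3420625, "txt_length": 1, "speed": 0.7451217808410563, "leading_sil": 416.0, "trailing_sil": 318.0625}
{"txt": "你有苹果吗", "wav": "None", "sample_rate": 16000, "key": "BAC009S0144W0217", "dur": 1.3470625, "txt_length": 5, "speed": 3.7117802626084533, "leading_sil": 64.0, "trailing_sil": 0}
"""


def parse_records(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def speed_matches(record, rel_tol=1e-6):
    """Return True if the stored speed equals txt_length / dur (assumed relation)."""
    expected = record["txt_length"] / record["dur"]
    return math.isclose(record["speed"], expected, rel_tol=rel_tol)


records = parse_records(SAMPLE)
all_consistent = all(speed_matches(r) for r in records)
print("speed == txt_length / dur for all sample records:", all_consistent)
```

If the relation holds, it also explains the slowest entries in the summary: single-character utterances such as "一" last over a second, which gives speeds below 1 char/s.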
This dataset contains per-utterance statistics for a set of audio files: audio duration, text length, speaking speed, and leading and trailing silence. Each record also carries the transcript text, the audio identifier (`key`), and the sample rate. For each statistic, the summary reports the maximum, minimum, and average values along with the P99, P75, P50, and P25 percentiles.
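A summary of this shape can be sketched in Python as follows. Because each percentile in the brief results is reported together with a `wav_id`, the sketch assumes a nearest-rank percentile that selects an actual record rather than interpolating between two. That choice, and the names `nearest_rank` and `summarize`, are assumptions for illustration; the dataset's own script may compute percentiles differently.

```python
import math

# Four (key, dur) pairs taken from the detailed results; other fields omitted.
records = [
    {"key": "BAC009S0141W0423", "dur": 1.23},
    {"key": "BAC009S0124W0224", "dur": 1.3050625},
    {"key": "BAC009S0094W0358", "dur": 1.3420625},
    {"key": "BAC009S0002W0272", "dur": 1.344},
]


def nearest_rank(sorted_items, pct):
    """Return the item at the nearest-rank percentile, for pct in (0, 100]."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_items)))
    return sorted_items[rank - 1]


def summarize(items, field):
    """Return (summary, avg): the records at max/P99/P75/P50/P25/min, and the mean."""
    ordered = sorted(items, key=lambda r: r[field])
    summary = {"max": ordered[-1]}
    for pct in (99, 75, 50, 25):
        summary[f"P{pct}"] = nearest_rank(ordered, pct)
    summary["min"] = ordered[0]
    avg = sum(r[field] for r in items) / len(items)
    return summary, avg


summary, avg = summarize(records, "dur")
for name, rec in summary.items():
    # Mirrors the layout of the brief results, e.g. "max dur: ... s (wav_id: ...)".
    print(f"{name} dur: {rec['dur']:.3f} s (wav_id: {rec['key']})")
print(f"avg dur: {avg:.3f} s")
```

The same function can be applied to `txt_length`, `speed`, `leading_sil`, or `trailing_sil` by changing the `field` argument and the unit in the print statement.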
Provider:
wenet-e2e