mesolitica/pseudostreaming-malaysian-youtube-whisper-large-v3
Pseudostreaming Malaysian Youtube videos using Whisper Large V3
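Dataset Overview
- License: MIT
- Task category: Automatic speech recognition
- Language: Malay
Dataset Details
- Total duration: 40486.589364839296 hours
- Data format: extracted from the processed.jsonl file
Data Example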
```json
[
  {
    "text": "dalam sukan olimpik dan paralimpik tokyo dua ribu dua puluh",
    "start": 3.52,
    "end": 6.46,
    "audio_filename": "processed-audio/1-225586-0.mp3",
    "original_audio_filename": "output-audio/3-1084-10.mp3"
  },
  {
    "text": "to azizul has",
    "start": 7.12,
    "end": 8.179999999999998,
    "audio_filename": "processed-audio/1-225586-1.mp3",
    "original_audio_filename": "output-audio/3-1084-10.mp3"
  },
  {
    "text": "awang meraih kilauan perak untuk malaysia dalam sukan olimpik tokyo dua ribu dua puluh tampil sebagai satu satunya wakil asia bagaimanapun beliau terpaksa akur di tangan pelumba great britain jason",
    "start": 8.4,
    "end": 22.98,
    "audio_filename": "processed-audio/1-225586-2.mp3",
    "original_audio_filename": "output-audio/3-1084-10.mp3"
  },
  {
    "text": "y yang meraih pingat emas",
    "start": 23.28,
    "end": 25.060000000000002,
    "audio_filename": "processed-audio/1-225586-3.mp3",
    "original_audio_filename": "output-audio/3-1084-10.mp3"
  }
]
```
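The records above can be read with a few lines of Python. This is a minimal sketch, not the dataset authors' tooling. It assumes that each line of processed.jsonl holds either a single segment object or a JSON array of segment objects shaped like the example, which the card does not state explicitly. The helper names `iter_segments` and `seconds_per_source` are hypothetical. The sketch sums segment durations (`end - start`, presumably in seconds) per `original_audio_filename`.

```python
import json
from collections import defaultdict


def iter_segments(lines):
    """Yield segment dicts from an iterable of JSONL lines.

    Assumption: a line is either one segment object or a JSON array
    of segment objects, as in the data example.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parsed = json.loads(line)
        records = parsed if isinstance(parsed, list) else [parsed]
        for record in records:
            yield record


def seconds_per_source(segments):
    """Sum (end - start) for each original audio file."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["original_audio_filename"]] += seg["end"] - seg["start"]
    return dict(totals)


# Two records copied from the data example, serialized as one JSONL line.
sample_line = json.dumps([
    {
        "text": "dalam sukan olimpik dan paralimpik tokyo dua ribu dua puluh",
        "start": 3.52,
        "end": 6.46,
        "audio_filename": "processed-audio/1-225586-0.mp3",
        "original_audio_filename": "output-audio/3-1084-10.mp3",
    },
    {
        "text": "to azizul has",
        "start": 7.12,
        "end": 8.179999999999998,
        "audio_filename": "processed-audio/1-225586-1.mp3",
        "original_audio_filename": "output-audio/3-1084-10.mp3",
    },
])

segments = list(iter_segments([sample_line]))
totals = seconds_per_source(segments)
```

To run the same logic over the real file, pass an open file handle instead of the sample list, for example `iter_segments(open("processed.jsonl", encoding="utf-8"))`. The local path is an assumption about where the file was downloaded.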



