typhoon-ai/gigaspeech2-typhoon

Name: typhoon-ai/gigaspeech2-typhoon
Creator: typhoon-ai
Published: 2026-01-23 02:22:45
License: No description provided

Source: Hugging Face, updated 2026-01-23, indexed 2026-02-07

Download link:

https://hf-mirror.com/datasets/typhoon-ai/gigaspeech2-typhoon

Description:

Gigaspeech2 Typhoon is a metadata-only reference dataset for benchmarking Thai automatic speech recognition (ASR), designed as the Accuracy Track for evaluating ASR models. It contains 1,000 test samples, each consisting of an audio ID and a human-transcribed Thai sentence derived from the Gigaspeech2 Thai corpus. Audio files are not included: they must be downloaded from the original Gigaspeech2 dataset and matched to the transcriptions by audio_id. The dataset is released under the CC-BY 4.0 license. It has a single test split with two fields: audio_id (a unique identifier) and sentence (the human transcription). It is intended only for evaluating ASR accuracy, not for training.
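The card does not state which metric the Accuracy Track reports. As a minimal sketch, assuming you have already fetched the Gigaspeech2 audio, run your own ASR model on it, and stored the outputs in a dict keyed by audio_id (a hypothetical format), the code below joins those hypotheses to the reference `sentence` field and computes a corpus-level character error rate (CER). CER is one common choice for Thai because the script does not put spaces between words. The `normalize` helper is a hypothetical, deliberately simple normalization (Unicode NFC plus whitespace removal), not an official scoring rule.

```python
import unicodedata
from typing import Iterable, List, Mapping, Tuple


def normalize(text: str) -> str:
    # Hypothetical normalization: NFC-normalize, then drop all whitespace,
    # since Thai text is written without spaces between words.
    return "".join(unicodedata.normalize("NFC", text).split())


def edit_distance(ref: str, hyp: str) -> int:
    # Character-level Levenshtein distance (substitution, deletion and
    # insertion each cost 1), computed with a single rolling row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (or match when equal)
            ))
        prev = curr
    return prev[-1]


def corpus_cer(rows: Iterable[Mapping[str, str]],
               hypotheses: Mapping[str, str]) -> Tuple[float, List[str]]:
    """Corpus-level CER = total character edits / total reference characters.

    rows: records with the card's fields `audio_id` and `sentence`.
    hypotheses: ASR output keyed by audio_id (hypothetical input format).
    Returns the CER and the audio_ids that had no hypothesis; a missing
    hypothesis is scored as an empty string (every character deleted).
    """
    edits = 0
    ref_chars = 0
    missing: List[str] = []
    for row in rows:
        ref = normalize(row["sentence"])
        hyp = hypotheses.get(row["audio_id"])
        if hyp is None:
            missing.append(row["audio_id"])
            hyp = ""
        edits += edit_distance(ref, normalize(hyp))
        ref_chars += len(ref)
    cer = edits / ref_chars if ref_chars else 0.0
    return cer, missing
```

In practice, `rows` could come from the Hugging Face `datasets` library, e.g. `load_dataset("typhoon-ai/gigaspeech2-typhoon", split="test")`, assuming the repository loads with the standard loader. Scoring missing hypotheses as empty strings (rather than skipping them) keeps results comparable across models that fail on different clips.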

Provider:

typhoon-ai