openpecha/garchen_rinpoche_benchmark

Name: openpecha/garchen_rinpoche_benchmark
Creator: openpecha
Published: 2025-06-30 06:46:29
License: not specified

Source: Hugging Face. Updated 2025-06-30; indexed 2025-08-09.

Download link:

https://hf-mirror.com/datasets/openpecha/garchen_rinpoche_benchmark

Description:

The Garchen Rinpoche STT Benchmark dataset is a speech-to-text benchmark built from 38 original audio files, which yield 13,786 audio segments in total. It is split into a valid set and a benchmark set: the valid set contains all 13,786 segments, and the benchmark set contains 978 segments. A stratified sampling strategy is used to ensure representation across age group, duration category, and content type. The recordings cover two age groups (70-80 and 80-90), three duration categories (long, 20-30 seconds; medium, 10-20 seconds; short, 0.5-10 seconds), and four content types (teaching, practice, Q&A, and prayer). Among the valid segments, the largest category is long-duration practice recordings from the 80-90 age group, at 63.5%.
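The description names a stratified sampling strategy but does not say how it was carried out. The following is only a minimal sketch of proportional stratified sampling over the three stated dimensions, not the creators' actual procedure. The field names `age_group`, `duration`, and `content_type` are hypothetical, and the handling of the 10-second and 20-second boundaries is an assumption, since the stated ranges overlap at those points.

```python
import random
from collections import defaultdict


def duration_category(seconds):
    """Map a segment length in seconds to the duration classes in the description.

    Assumption: boundary values (10 s, 20 s) fall into the longer class.
    """
    if 0.5 <= seconds < 10:
        return "short"
    if 10 <= seconds < 20:
        return "medium"
    if 20 <= seconds <= 30:
        return "long"
    return None  # outside the documented 0.5-30 s range


def stratified_sample(segments, fraction, seed=0):
    """Draw roughly `fraction` of the segments from every stratum.

    A stratum is one (age group, duration category, content type) combination.
    `fraction` is expected to lie in (0, 1].
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for seg in segments:
        key = (
            seg["age_group"],                    # hypothetical field name
            duration_category(seg["duration"]),  # hypothetical field name
            seg["content_type"],                 # hypothetical field name
        )
        strata[key].append(seg)

    sample = []
    for members in strata.values():
        # Take at least one segment so that small strata stay represented.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Ratio of benchmark to valid segments taken from the description (about 7.1%).
BENCHMARK_FRACTION = 978 / 13786
```

Calling `stratified_sample(segments, BENCHMARK_FRACTION)` on a list of segment dictionaries would return a subset of comparable relative size to the benchmark set. Because each stratum is rounded separately and small strata are topped up to one segment, the total will not necessarily equal 978.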

Provided by:

openpecha
