TCST: A dataset of Tibetan-Chinese speech translation

Name: TCST: A dataset of Tibetan-Chinese speech translation
Creator: Science Data Bank
Published: 2025-04-27 22:11:20
License: No description available

DataCite Commons; updated 2025-04-27; indexed 2025-05-18

Download link:

https://www.scidb.cn/detail?dataSetId=381faff3e8cf4991a0ab2ca669b2444d

Resource description:

The dataset is drawn from the WeChat public platform and from publicly available Tibetan speech recognition data. The data were collected with web crawlers and machine translation assistance, manually segmented and annotated, and then reviewed and corrected by experts to produce a high-quality Tibetan-Chinese speech translation dataset. It contains 7,270 samples and is 965 MB in size. The dataset provides a data foundation for research on low-resource Tibetan-Chinese speech translation, helps advance related techniques and algorithms, and supports the application of speech translation systems in minority-language settings.

Provider:

Science Data Bank

Created:

2024-05-15

Collection summary

Dataset introduction