
Mbochi speech corpus

Source: DataCite Commons. Updated: 2022-06-01. Indexed: 2024-07-13.
Download link:
https://live.european-language-grid.eu/catalogue/corpus/1419
Resource description:
The Mbochi speech corpus was developed within the framework of the ANR-DFG BULB project. The project aims to provide field linguists (e.g. those working on morphology) with tools for languages that have little or no written tradition. The corpus provided here is a subset of the corpus developed in that project.

The corpus consists of 5131 sentences recorded in Mbochi, together with their transcriptions and French translations. It also includes results of work carried out during the JSALT workshop, under the topic "The Speaking Rosetta Stone - Discovering Grounded Linguistic Units for Languages without Orthography": alignments at the phonetic level and various results of unsupervised word segmentation from audio. The audio amounts to 4.5 hours, downsampled to 16 kHz, 16-bit, with linear PCM encoding. The data is split into two parts: a training set of 4617 sentences and a development set of 514 sentences.
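The audio format stated above (16 kHz, 16-bit, linear PCM) can be checked on a downloaded file with a short script. The following is a minimal sketch using Python's standard `wave` module; it assumes the recordings are distributed as WAV files, which the description does not state, and the function name `check_wav_format` is hypothetical.

```python
import wave

# Values taken from the resource description.
EXPECTED_RATE_HZ = 16_000      # 16 kHz sampling rate
EXPECTED_SAMPLE_BYTES = 2      # 16 bits per sample


def check_wav_format(path):
    """Return (matches_description, duration_in_seconds) for one WAV file.

    Assumption: the file is a RIFF/WAV container. The channel count is
    not checked because the description does not specify it.
    """
    with wave.open(path, "rb") as wav:
        matches = (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_BYTES
            and wav.getcomptype() == "NONE"  # uncompressed linear PCM
        )
        duration = wav.getnframes() / wav.getframerate()
    return matches, duration
```

Summing the returned durations over all files gives a quick comparison against the stated total of roughly 4.5 hours.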
Provider:
ELG
Created:
2022-06-01