Althingi Parliamentary Speech

Source: DataCite Commons | Updated: 2021-02-17 | Indexed: 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2021S01
Resource description:
Introduction

Althingi Parliamentary Speech consists of approximately 542 hours of recorded speech from Althingi, the Icelandic Parliament, along with corresponding transcripts, a pronunciation dictionary and two language models. Speeches date from 2005-2016.

This dataset was collected in 2016 by the ASR for Althingi project at Reykjavik University in collaboration with the Althingi speech department. The purpose of that project was to develop an ASR (automatic speech recognition) system for parliamentary speech to replace the procedure of manually transcribing performed speeches.

Data

The mean speech length is six minutes, with speeches ranging from under one minute to around thirty minutes. The corpus features 197 speakers (105 male, 92 female) and is split into training, development and evaluation sets. The language models are of two types: a pruned trigram model, used in decoding, and an unpruned constant ARPA 5-gram model, used for re-scoring decoding results.

Audio data is presented as single channel 16-bit mp3 files; the majority of these files have a sample rate of 44.1 kHz. Transcripts and other text data are plain text encoded in UTF-8.

Samples

Please view this audio sample and transcript sample.

Updates

None at this time.

Copyright

Portions © 2021 Reykjavik University, © 2021 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2021-02-10