Transcribed Corpus of the LIBE Committee

Source: arXiv. Updated: 2023-04-17. Indexed: 2024-06-21.
Download link:
https://github.com/hdvos/EUParliamentASRDataAndCode
Description:
This study introduces the transcribed corpus of the LIBE Committee (the European Parliament's Committee on Civil Liberties, Justice and Home Affairs), created at Leiden University and totalling 3.6 million running words. The corpus is derived from audio recordings of European Parliament committee meetings and was transcribed with automatic speech recognition (ASR). It covers detailed political debates and discussions, which makes it useful research material for political scientists. To build it, the research team used the Transformer-based Wav2vec2.0 model with domain-specific optimization, which is reported to have significantly improved transcription accuracy. Beyond supporting the study of political dynamics within the European Union, the corpus gives linguists material for studying political discourse and the role of interpreters.
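The description refers to transcription accuracy without naming a metric. ASR output is commonly scored with word error rate (WER), so the following is a minimal sketch of that computation, assuming WER is the measure of interest; this entry does not confirm which metric the authors used. The function name `word_error_rate` and the example sentences are hypothetical and are not taken from the corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of five reference words.
reference_text = "the committee adopts the report"
asr_output = "the committee adopt the report"
print(word_error_rate(reference_text, asr_output))
```

Lower values indicate closer agreement with the reference transcript. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.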
Provider:
Leiden University
Created:
2023-04-17