DISCO

Name: DISCO
Creator: Indian Institute of Technology Bombay
Published: 2023-10-26 00:32:02
License: No description available

Source: arXiv. Updated 2023-10-26; indexed 2024-06-21.

Download link:

https://github.com/vineet2104/DISCO

Resource description:

The DISCO dataset was created by researchers at the Indian Institute of Technology Bombay to address disfluency in Indo-European languages. It contains over 12,000 pairs of disfluent and fluent text utterances in four languages: English, Hindi, German, and French. Language experts took part in its creation to ensure the quality and diversity of the data. Its main applications are post-processing the text output of automatic speech recognition (ASR) systems and improving machine translation systems, particularly on spoken dialogue.

Provider:

Indian Institute of Technology Bombay

Created:

2023-10-26
