asr-alignment

Name: asr-alignment
Creator: maas
Published: 2025-05-05 16:52:13
License: No description available

ModelScope Community. Updated 2025-05-05; indexed 2025-03-15.

Download link:

https://modelscope.cn/datasets/pengzhendong/asr-alignment

Description:

# Speech Recognition Alignment Dataset

This dataset is a variation of several widely used ASR datasets: Librispeech, MuST-C, TED-LIUM, VoxPopuli, Common Voice, and GigaSpeech. Unlike the originals, it includes:

- Precise alignment between audio and text.
- Text that has been punctuated and made case-sensitive.
- Identification of named entities in the text.

# Usage

First, install the latest version of the 🤗 Datasets package:

```bash
pip install --upgrade pip
pip install --upgrade datasets[audio]
```

The dataset can be downloaded and pre-processed on disk using the [`load_dataset`](https://huggingface.co/docs/datasets/v2.14.5/en/package_reference/loading_methods#datasets.load_dataset) function:

```python
from datasets import load_dataset

# Available datasets: 'libris', 'mustc', 'tedlium', 'voxpopuli', 'commonvoice', 'gigaspeech'
dataset = load_dataset("nguyenvulebinh/asr-alignment", "libris")

# Take the first sample of the training split
sample = dataset["train"][0]
```

It can also be streamed directly from the Hub using Datasets' [streaming mode](https://huggingface.co/blog/audio-datasets#streaming-mode-the-silver-bullet). Streaming mode loads individual samples one at a time, rather than downloading the entire dataset to disk:

```python
from datasets import load_dataset

dataset = load_dataset("nguyenvulebinh/asr-alignment", "libris", streaming=True)

# Take the first sample of the training split
sample = next(iter(dataset["train"]))
```

## Citation

If you use this data, please consider citing the ICASSP 2024 paper "Synthetic Conversations Improve Multi-Talker ASR":

```
@INPROCEEDINGS{synthetic-multi-asr-nguyen,
  author={Nguyen, Thai-Binh and Waibel, Alexander},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={SYNTHETIC CONVERSATIONS IMPROVE MULTI-TALKER ASR},
  year={2024},
  volume={},
  number={},
}
```

## License

This dataset is licensed in accordance with the terms of the original datasets.
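The README does not list the fields each sample carries, so when following the streaming usage above it can help to look at a few samples before writing any processing code. The following is a minimal sketch, not part of the original dataset card: the helper names `peek`, `describe`, and `show_streamed_fields` are hypothetical, and the only dataset-specific assumption is the `"train"` split name, taken from the usage snippets.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def peek(samples: Iterable[Dict[str, Any]], n: int = 3) -> List[Dict[str, Any]]:
    """Return the first n samples without consuming the rest of the iterable."""
    return list(islice(samples, n))


def describe(sample: Dict[str, Any]) -> Dict[str, str]:
    """Map each field name to the Python type name of its value."""
    return {key: type(value).__name__ for key, value in sample.items()}


def show_streamed_fields(config: str = "libris", n: int = 2) -> None:
    """Stream n samples from the Hub and print their field names and types.

    Hypothetical convenience wrapper; requires network access and the
    `datasets` package, which is imported lazily so the helpers above
    can be used without it.
    """
    from datasets import load_dataset

    dataset = load_dataset("nguyenvulebinh/asr-alignment", config, streaming=True)
    for sample in peek(dataset["train"], n=n):
        print(describe(sample))
```

Calling `show_streamed_fields("libris")` fetches only the first few samples, so the field layout can be checked without downloading a full subset to disk.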


Provider:

maas

Created:

2025-03-12

Summary

Dataset introduction