
asr-alignment

ModelScope community · Updated 2025-05-05 · Indexed 2025-03-15
Download link:
https://modelscope.cn/datasets/pengzhendong/asr-alignment
Resource description:
# Speech Recognition Alignment Dataset

This dataset is a variation of several widely used ASR datasets: Librispeech, MuST-C, TED-LIUM, VoxPopuli, Common Voice, and GigaSpeech. Unlike the original datasets, it includes:

- Precise alignment between audio and text.
- Text that has been punctuated and made case-sensitive.
- Identification of named entities in the text.

# Usage

First, install the latest version of the 🤗 Datasets package:

```bash
pip install --upgrade pip
pip install --upgrade "datasets[audio]"
```

The dataset can be downloaded and pre-processed on disk using the [`load_dataset`](https://huggingface.co/docs/datasets/v2.14.5/en/package_reference/loading_methods#datasets.load_dataset) function:

```python
from datasets import load_dataset

# Available configurations: 'libris', 'mustc', 'tedlium', 'voxpopuli', 'commonvoice', 'gigaspeech'
dataset = load_dataset("nguyenvulebinh/asr-alignment", "libris")

# Take the first sample of the train split
sample = dataset["train"][0]
```

It can also be streamed directly from the Hub using Datasets' [streaming mode](https://huggingface.co/blog/audio-datasets#streaming-mode-the-silver-bullet). Streaming mode loads individual samples one at a time instead of downloading the entire dataset to disk:

```python
from datasets import load_dataset

dataset = load_dataset("nguyenvulebinh/asr-alignment", "libris", streaming=True)

# Take the first sample of the train split
sample = next(iter(dataset["train"]))
```

## Citation

If you use this data, please consider citing the ICASSP 2024 paper "SYNTHETIC CONVERSATIONS IMPROVE MULTI-TALKER ASR":

```
@INPROCEEDINGS{synthetic-multi-asr-nguyen,
  author={Nguyen, Thai-Binh and Waibel, Alexander},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={SYNTHETIC CONVERSATIONS IMPROVE MULTI-TALKER ASR},
  year={2024},
  volume={},
  number={},
}
```

## License

This dataset is licensed in accordance with the terms of the original dataset.
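
As a follow-up to the streaming example in the Usage section, here is a minimal sketch for inspecting the first few samples without iterating over the whole stream. The helper names `peek` and `describe` are hypothetical and not part of the dataset or the Datasets library. The sketch assumes only that each sample is a dictionary. It does not assume any particular field names, because the description above does not list them.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def peek(samples: Iterable[Dict[str, Any]], n: int = 3) -> List[Dict[str, Any]]:
    """Return at most the first n samples from any iterable of samples.

    (Hypothetical helper.) islice stops pulling items after n, so a
    streamed dataset is not read beyond what is requested.
    """
    return list(islice(samples, n))


def describe(sample: Dict[str, Any]) -> Dict[str, str]:
    """Map each field name of a sample to the type name of its value.

    (Hypothetical helper.) Useful for discovering the schema of a sample
    without printing large values such as audio arrays.
    """
    return {key: type(value).__name__ for key, value in sample.items()}
```

With the streamed `dataset` object from the Usage section, `[describe(s) for s in peek(dataset["train"], n=2)]` would list the field names and value types of the first two samples. That call requires network access to the Hub.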

Provider:
maas
Created:
2025-03-12