
LS-100

ModelScope community · Updated 2026-04-27 · Indexed 2024-05-15
Download link:
https://modelscope.cn/datasets/pp199124903/LS-100
Description:
```bash
git clone https://www.modelscope.cn/datasets/pp199124903/LS-100.git
```

# LS-100 Introduction

LibriSpeech is a corpus of read speech derived from public-domain LibriVox audiobooks. Its main components are:

- dev-clean, test-clean: development and test sets containing "clean" speech.
- train-clean-100: a training set containing about 100 hours of clean speech.
- train-clean-360: a training set containing about 360 hours of clean speech.
- dev-other, test-other: more challenging development and test sets.
- train-other-500: a training set containing about 500 hours of non-clean speech.
- mp3: raw audio in MP3 format.
- texts: text of the speech transcripts.
- raw_metadata: metadata recording various information about the source text/corpus.

We found that the train-clean-100 subset of LibriSpeech is sufficient for our study, so we built the LS-100 dataset from its samples. Specifically, we constructed it as follows:

1. Join the speech segments of each speaker into one long recording.
2. Compute the total speech duration of each speaker.
3. Select the 100 speakers with the longest total speech duration and slice their speech into two-second segments.

Please cite the following papers when you use the dataset in your work.

[1] Y. Li, W. Cao, W. Xie, J. Li and E. Benetos, "Few-Shot Class-Incremental Audio Classification Using Dynamically Expanded Classifier With Self-Attention Modified Prototypes," in IEEE Transactions on Multimedia, vol. 26, pp. 1346-1360, 2024, doi: 10.1109/TMM.2023.3280011.

[2] W. Xie, Y. Li, Q. He, W. Cao, "Few-shot class-incremental audio classification via discriminative prototype learning," Expert Systems With Applications, vol. 225, 120044, pp. 1-13, 2023.

[3] W. Xie, Y. Li, Q. He, W. Cao, T. Virtanen, "Few-shot class-incremental audio classification using adaptively-refined prototypes," INTERSPEECH, 2023, pp. 301-305. Online: https://www.isca-speech.org/archive/interspeech_2023/xie23b_interspeech.html

[4] Y. Li, W. Cao, J. Li, W. Xie, Q. He, "Few-shot class-incremental audio classification using stochastic classifier," INTERSPEECH, 2023, pp. 4174-4178. Online: https://www.isca-speech.org/archive/interspeech_2023/li23w_interspeech.html

[5] Y. Li, J. Li, Y. Si, J. Tan and Q. He, "Few-Shot Class-Incremental Audio Classification With Adaptive Mitigation of Forgetting and Overfitting," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 2297-2311, 2024, doi: 10.1109/TASLP.2024.3385287.

The statistics of the sliced dataset are as follows:

| Dataset | LS-100 |
| --- | --- |
| Type of audio | Speech |
| Num. of classes | 100 (60 base classes, 40 novel classes) |
| Num. of training / validation / testing samples per base class | 500 / 150 / 100 |
| Num. of training / validation / testing samples per novel class | 500 / 150 / 100 |
| Duration of each sample | 2 seconds |

LS-100 directory structure:

<pre>
.
├── 100spks_segments             # Segmented audio folder
├── librispeech_fscil_test.csv   # Test samples and labels
├── librispeech_fscil_train.csv  # Training samples and labels
├── librispeech_fscil_val.csv    # Validation samples and labels
├── SPEAKERS.TXT                 # Mapping of speaker IDs to speaker names
├── spk_mapping.json             # Mapping of speaker IDs to label values
└── spk_total_duration.json      # Total speech duration per speaker
</pre>

To use this dataset, download all the metadata files, then download LS-100.rar and extract it.

Each of the CSV files above uses the following format:

```
filename,speaker_id,label
1069_684.wav,1069,0
1069_685.wav,1069,0
1069_686.wav,1069,0
1069_687.wav,1069,0
```
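As a minimal sketch of how the CSV format above might be consumed, the Python snippet below groups segment filenames by class label using only the standard library. The helper name `load_index` is hypothetical and not part of the dataset; the sample rows are the ones shown in the format example, and in practice you would presumably pass an opened `librispeech_fscil_train.csv` (or the validation/test file) instead.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the format example; replace with an opened CSV file in practice.
SAMPLE_CSV = """filename,speaker_id,label
1069_684.wav,1069,0
1069_685.wav,1069,0
1069_686.wav,1069,0
1069_687.wav,1069,0
"""


def load_index(fileobj):
    """Group segment filenames by integer class label (hypothetical helper)."""
    by_label = defaultdict(list)
    for row in csv.DictReader(fileobj):
        by_label[int(row["label"])].append(row["filename"])
    return dict(by_label)


index = load_index(io.StringIO(SAMPLE_CSV))
print(index)  # all four sample rows fall under label 0
```

Grouping by label rather than by speaker ID keeps the index aligned with the class labels used in the split files; `spk_mapping.json` can be consulted when the original speaker ID is needed.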

Provider:
maas
Created:
2023-12-30
Summary
Dataset Introduction
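Background and Challenges
Background Overview
LS-100 is a speech dataset built from the LibriSpeech train-clean-100 subset. It contains 2-second speech segments from 100 speakers and is intended for few-shot class-incremental audio classification. The dataset has a clear structure comprising audio files, label files, and metadata files, and it provides training, validation, and test splits.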
The content above was collected and summarized by 遇见数据集.
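The selection and slicing steps of the construction procedure (rank speakers by total duration, keep the longest, cut their audio into 2-second segments) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' actual script: the function names, the example durations, and the choice to drop a trailing remainder shorter than two seconds are all hypothetical, and the 16 kHz rate is LibriSpeech's native sampling rate, which the segments are only assumed to keep.

```python
def top_speakers(durations, n):
    """Return the n speaker IDs with the longest total duration in seconds."""
    return sorted(durations, key=durations.get, reverse=True)[:n]


def slice_bounds(num_samples, sample_rate, seconds=2.0):
    """Yield (start, end) sample indices of consecutive fixed-length segments.

    Assumption: a trailing remainder shorter than one segment is dropped.
    """
    seg = int(sample_rate * seconds)
    for start in range(0, num_samples - seg + 1, seg):
        yield start, start + seg


# Hypothetical per-speaker totals in seconds, for illustration only.
durations = {"spk_a": 1500.0, "spk_b": 900.0, "spk_c": 1520.5}
selected = top_speakers(durations, n=2)

# A 5-second recording at an assumed 16 kHz yields two full 2-second segments.
bounds = list(slice_bounds(num_samples=5 * 16000, sample_rate=16000))
print(selected, bounds)
```

For the real dataset, `n` would be 100 and the totals would presumably come from `spk_total_duration.json`.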