libritts-all-kaldi-data
ModelScope community. Updated 2025-12-04; indexed 2024-05-15.
Download link:
https://modelscope.cn/datasets/CantabileKwok/libritts-all-kaldi-data
Description:
# Data Description
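This dataset contains the Kaldi directory for the full LibriTTS data, together with the PPE features and VQ-wav2vec features extracted from it.
Because these data are large and hard to extract locally, they are released as a dataset. For usage instructions, see https://github.com/cantabile-kwok/CTX-vec2wav.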
### `data/` directory: data manifests
The `data` directory contains the manifests for all of the LibriTTS data. To set it up:
1. Download `data.zip` from [here](https://huggingface.co/datasets/cantabile-kwok/libritts-all-kaldi-data/resolve/main/data.zip) (about 5MB; or [here](https://www.modelscope.cn/api/v1/datasets/CantabileKwok/libritts-all-kaldi-data/repo?Revision=master&FilePath=data.zip) for users in mainland China), and unzip it to `data` in the project root. Every sub-directory contains `utt2spk`, `spk2utt` and `wav.scp` files. They are all plain text, with one `<key> <value>` pair per line.
2. We use the 16kHz version of LibriTTS, so down-sample the speech data if you do not already have it at that sampling rate.
3. Then change the paths in `wav.scp` to the correct ones on your machine.
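Step 3 can be scripted instead of edited by hand. The following is a minimal sketch, assuming that every path in `wav.scp` starts with a common prefix that differs from your local one. The function names, the `data/eval_all` location, and both prefixes are hypothetical and only serve as illustration.

```python
from pathlib import Path


def read_scp(path):
    """Read a Kaldi-style `<key> <value>` text file into a dict (file order is kept)."""
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split on the first run of whitespace only: the value may contain spaces.
            key, value = line.split(maxsplit=1)
            table[key] = value
    return table


def reroot_wav_scp(src, dst, old_prefix, new_prefix):
    """Copy `src` to `dst`, replacing `old_prefix` with `new_prefix` at the start of each path.

    Entries that do not start with `old_prefix` are written unchanged.
    Returns the number of entries whose path was rewritten.
    """
    changed = 0
    entries = read_scp(src)
    with open(dst, "w", encoding="utf-8") as f:
        for key, wav in entries.items():
            if wav.startswith(old_prefix):
                wav = new_prefix + wav[len(old_prefix):]
                changed += 1
            f.write(f"{key} {wav}\n")
    return changed


if __name__ == "__main__":
    # Hypothetical locations: adjust the subset name and both prefixes to your setup.
    scp = Path("data/eval_all/wav.scp")
    if scp.is_file():
        n = reroot_wav_scp(scp, scp.with_name("wav.scp.new"),
                           "/original/LibriTTS", "/my/LibriTTS_16k")
        print(f"rewrote {n} paths")
```

Writing to a separate output file keeps the downloaded manifest intact, so the result can be inspected before it replaces `wav.scp`.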
### `feats/` directory: speech features
CTX-vec2wav uses three types of speech features. They should all be stored in the `feats/` directory in the project root.
* **VQ index (together with codebook) from vq-wav2vec**. We extracted it with [fairseq](https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec#vq-wav2vec), and we provide the extracted VQ index sequences with the codebook online.
  1. Download from [here](https://huggingface.co/datasets/cantabile-kwok/libritts-all-kaldi-data/resolve/main/vqidx.zip) (460MB; or [here](https://www.modelscope.cn/api/v1/datasets/CantabileKwok/libritts-all-kaldi-data/repo?Revision=master&FilePath=vqidx.zip) for users in mainland China).
  2. Unzip it to `feats/vqidx`, and change the corresponding paths in `feats.scp`.
  3. You can check the feature shapes with `feat-to-shape.py scp:feats/vqidx/eval_all/feats.scp | head`. The shapes should be `(frames, 2)`.
* **PPE auxiliary features**. PPE stands for probability of voice, pitch and energy (all in log scale). We extracted them using Kaldi and, to spare you from installing Kaldi, we provide the extracted and normalized features online.
  1. Download from [here](https://huggingface.co/datasets/cantabile-kwok/libritts-all-kaldi-data/resolve/main/normed_ppe.zip) (570MB; or [here](https://www.modelscope.cn/api/v1/datasets/CantabileKwok/libritts-all-kaldi-data/repo?Revision=master&FilePath=normed_ppe.zip) for users in mainland China).
  2. Similarly, unzip it to `feats/normed_ppe`, and change the corresponding paths in `feats.scp`.
  3. Check: the shapes of these features should be `(frames, 3)`.
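After editing the paths in `feats.scp`, it can help to confirm that every referenced archive actually exists before running the shape check. The sketch below assumes the common Kaldi layout in which each line reads `<utt-id> <path-to-ark>:<byte-offset>`; whether the provided files follow exactly this layout is an assumption, as are the helper names and the `feats/normed_ppe/eval_all` location.

```python
import os


def archive_path(scp_value):
    """Strip an optional trailing `:<byte-offset>` from a feats.scp value."""
    path, sep, offset = scp_value.rpartition(":")
    if sep and offset.isdigit():
        return path
    # No numeric offset suffix: treat the whole value as the archive path.
    return scp_value


def find_missing_archives(scp_path):
    """Return the sorted archive paths referenced in `scp_path` that are not on disk."""
    missing = set()
    with open(scp_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            _key, value = line.split(maxsplit=1)
            ark = archive_path(value)
            if not os.path.isfile(ark):
                missing.add(ark)
    return sorted(missing)


if __name__ == "__main__":
    # Assumed locations, following the directory names used above; adjust as needed.
    for scp in ("feats/vqidx/eval_all/feats.scp",
                "feats/normed_ppe/eval_all/feats.scp"):
        if not os.path.isfile(scp):
            print(f"{scp}: not found")
            continue
        missing = find_missing_archives(scp)
        print(f"{scp}: {len(missing)} missing archive file(s)")
        for ark in missing[:5]:
            print("  ", ark)
```

An empty result only shows that the paths resolve; the feature shapes still need to be checked as described in the steps above.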
### Citation
Please cite this work using the following BibTeX entry:
```
@article{du2023unicats,
title={UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding},
author={Du, Chenpeng and Guo, Yiwei and Shen, Feiyu and Liu, Zhijun and Liang, Zheng and Chen, Xie and Wang, Shuai and Zhang, Hui and Yu, Kai},
journal={arXiv preprint arXiv:2306.07547},
year={2023}
}
```
Provider:
maas
Created:
2023-09-27



