kontextox/uk_UA-ASMR
Source: Hugging Face. Updated 2026-04-03; indexed 2026-04-12.
Download link: https://hf-mirror.com/datasets/kontextox/uk_UA-ASMR
---
language:
- uk
license: cc0-1.0
datasets:
- kontextox/uk_UA-ASMR
tags:
- tts
- text-to-speech
- ukrainian
- asmr
- piper
configs:
  - config_name: default
    data_files: "metadata.csv"
    sep: "|"
    column_names: ["file_name", "text"]
---
# Ukrainian ASMR TTS Dataset
A Ukrainian text-to-speech dataset for training single-speaker ASMR-style voice models using [Piper](https://github.com/OHF-voice/piper1-gpl).
## Dataset Details
| Property | Value |
| ------------ | -------------------------- |
| Language | Ukrainian (uk_UA) |
| Speakers | 1 |
| Segments | 7,318 |
| Audio Format | 16-bit WAV, 22050 Hz, Mono |
| License | CC0 |
## Dataset Structure
### Prerequisites
```bash
# Install Piper training dependencies
git clone https://github.com/kontextox/piper1-gpl.git
cd piper1-gpl
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -e '.[train]'
./build_monotonic_align.sh
# If you use `OHF-voice/piper1-gpl` instead (already fixed in `kontextox/piper1-gpl`), first run:
# pip install scikit-build
python3 setup.py build_ext --inplace
# Critical fix for custom text phonemes when using `OHF-voice/piper1-gpl` (already applied in `kontextox/piper1-gpl`).
# It patches dataset.py so that it uses the custom phoneme map loaded via --data.phonemes_path:
# sed -i 's/phonemes_to_ids(sentence_phonemes)/phonemes_to_ids(sentence_phonemes, id_map=self.piper_config.phoneme_id_map)/g' src/piper/train/vits/dataset.py
```
```bash
# 1. Download the dataset
hf download kontextox/uk_UA-ASMR \
  --repo-type dataset --local-dir uk_UA-ASMR
# tar -C requires the target directory to exist
mkdir -p uk_UA-ASMR/audio
tar -xzf uk_UA-ASMR/clear_audio.tar.gz -C uk_UA-ASMR/audio
# 2. Download the base checkpoint AND its configuration
hf download rhasspy/piper-checkpoints uk/uk_UA/ukrainian_tts/medium/epoch=2090-step=1166778.ckpt \
--repo-type dataset --local-dir uk_UA-ASMR/checkpoints
hf download rhasspy/piper-checkpoints uk/uk_UA/ukrainian_tts/medium/config.json \
--repo-type dataset --local-dir uk_UA-ASMR/checkpoints
# 3. Extract the exact phoneme map from the base config to use for training
python3 -c "import json; d=json.load(open('uk_UA-ASMR/checkpoints/uk/uk_UA/ukrainian_tts/medium/config.json')); json.dump(d['phoneme_id_map'], open('uk_UA-ASMR/phonemes.json','w'), ensure_ascii=False, indent=2)"
```
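Step 3 can also be written as a small script, which is easier to read and reuse than the one-liner. This is a minimal sketch: the function name `extract_phoneme_map` is illustrative and not part of Piper, and the only assumption about the config is that it has a top-level `phoneme_id_map` key, as the one-liner above already implies.

```python
import json
from pathlib import Path


def extract_phoneme_map(config_path: str, output_path: str) -> dict:
    """Copy the `phoneme_id_map` entry of a Piper config into its own JSON file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    phoneme_id_map = config["phoneme_id_map"]
    # ensure_ascii=False keeps Cyrillic keys human-readable in the output file.
    Path(output_path).write_text(
        json.dumps(phoneme_id_map, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return phoneme_id_map


if __name__ == "__main__":
    config = Path("uk_UA-ASMR/checkpoints/uk/uk_UA/ukrainian_tts/medium/config.json")
    if config.exists():
        mapping = extract_phoneme_map(str(config), "uk_UA-ASMR/phonemes.json")
        print(f"Wrote {len(mapping)} symbols to uk_UA-ASMR/phonemes.json")
    else:
        print(f"{config} not found; run the download steps first")
```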
```text
uk_UA-ASMR/
├── README.md
├── metadata.csv # Metadata
├── phonemes.json # Automatically extracted Ukrainian phoneme map
├── audio/ # Audio files (22050 Hz, mono, 16-bit)
│ ├── utt_0001.wav
│ ├── utt_0002.wav
│ └── ...
└── checkpoints/uk/uk_UA/ukrainian_tts/medium/
├── config.json
└── epoch=2090-step=1166778.ckpt
```
## Audio Specifications
- **Sample Rate**: 22050 Hz
- **Channels**: Mono
- **Bit Depth**: 16-bit
- **Format**: WAV
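If you add or replace recordings, it can help to confirm that each file still matches these specifications before training. The sketch below uses only Python's standard `wave` module; the helper name `check_wav` is illustrative.

```python
import wave
from pathlib import Path
from typing import List


def check_wav(path: str, rate: int = 22050, channels: int = 1,
              sample_width: int = 2) -> List[str]:
    """Return a list of mismatches against the expected audio format."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate {wav.getframerate()} != {rate}")
        if wav.getnchannels() != channels:
            problems.append(f"channels {wav.getnchannels()} != {channels}")
        # Sample width is reported in bytes: 2 bytes == 16-bit.
        if wav.getsampwidth() != sample_width:
            problems.append(f"sample width {wav.getsampwidth()} bytes != {sample_width}")
    return problems


if __name__ == "__main__":
    audio_dir = Path("uk_UA-ASMR/audio")
    if audio_dir.is_dir():
        for wav_path in sorted(audio_dir.glob("*.wav")):
            issues = check_wav(str(wav_path))
            if issues:
                print(wav_path.name, "; ".join(issues))
```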
## Training
### Training Command
```bash
python3 -m piper.train fit \
--data.voice_name "uk_asmr" \
--data.csv_path uk_UA-ASMR/metadata.csv \
--data.audio_dir uk_UA-ASMR/audio \
--data.espeak_voice "uk" \
--model.sample_rate 22050 \
--data.phoneme_type "text" \
--data.dataset_type "text" \
--data.phonemes_path uk_UA-ASMR/phonemes.json \
--data.cache_dir uk_UA-ASMR/cache \
--data.config_path uk_UA-ASMR/output/uk_UA-asmr-medium.onnx.json \
--data.batch_size 32 \
--data.num_workers 8 \
--model.vocoder_warmstart_ckpt uk_UA-ASMR/checkpoints/uk/uk_UA/ukrainian_tts/medium/epoch=2090-step=1166778.ckpt \
--trainer.max_epochs 500 \
--trainer.check_val_every_n_epoch 1 \
--trainer.default_root_dir uk_UA-ASMR/output
```
_**Note**: `--trainer.default_root_dir` ensures PyTorch Lightning saves logs and checkpoints cleanly to `uk_UA-ASMR/output/lightning_logs/`_
_**Note**: If PyTorch reports that the NVIDIA driver on your system is too old (found version 12080), or that the NVIDIA GeForce RTX 5090 with CUDA capability `sm_120` is not compatible with the current PyTorch installation:_
- Check the installed build: `python -c "import torch; print(torch.__version__); print(torch.cuda.get_arch_list()); print(torch.randn(1).cuda())"`
- Install a CUDA 12.8 build: `pip install --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128`
_**Note**: To check the progress of CPU preprocessing, count the cached `.pt` files: `find uk_UA-ASMR/cache -name "*.pt" | wc -l`_
### Hardware Configuration
| GPU | VRAM | Batch Size | Num Workers | Speed | Epoch Time |
| ------------ | ----- | ---------- | ----------- | --------- | ---------- |
| **H100 NVL** | 93 GB | 128 | 16 | ~1.0 it/s | ~60s |
| **L40S** | 46 GB | 32 | 8 | ~2.6 it/s | ~90s |
| **RTX 3090** | 24 GB | 24 | 8 | ~2.0 it/s | ~120s |
| **RTX 3060** | 12 GB | 16 | 4 | ~1.5 it/s | ~180s |
| **A100** | 80 GB | 96 | 16 | ~0.9 it/s | ~70s |
#### Continue from latest checkpoint
```bash
python3 -m piper.train fit \
--data.voice_name "uk_asmr" \
--data.csv_path uk_UA-ASMR/metadata.csv \
--data.audio_dir uk_UA-ASMR/audio \
--data.espeak_voice "uk" \
--model.sample_rate 22050 \
--data.phoneme_type "text" \
--data.dataset_type "text" \
--data.phonemes_path uk_UA-ASMR/phonemes.json \
--data.cache_dir uk_UA-ASMR/cache \
--data.config_path uk_UA-ASMR/output/uk_UA-asmr-medium.onnx.json \
--data.batch_size 32 \
--model.vocoder_warmstart_ckpt uk_UA-ASMR/checkpoints/uk/uk_UA/ukrainian_tts/medium/epoch=2090-step=1166778.ckpt \
--trainer.max_epochs 500 \
--trainer.check_val_every_n_epoch 1 \
--trainer.default_root_dir uk_UA-ASMR/output \
--ckpt_path uk_UA-ASMR/output/lightning_logs/version_0/checkpoints/epoch=35-step=14832.ckpt
```
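_(The checkpoint filename above is an example; check your `uk_UA-ASMR/output/lightning_logs/` folder for the exact `.ckpt` filename.)_
_**Note**: To locate checkpoints, run `find /workspace -name "*.ckpt" 2>/dev/null | head -5` (replace `/workspace` with your working directory)._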
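As an alternative to searching by hand, the sketch below picks the checkpoint with the highest step count. It assumes that checkpoint files follow the `epoch=<N>-step=<M>.ckpt` naming seen in the commands above; the helper names are illustrative.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Matches names such as "epoch=35-step=14832.ckpt".
CKPT_PATTERN = re.compile(r"epoch=(\d+)-step=(\d+)\.ckpt$")


def parse_checkpoint_name(name: str) -> Optional[Tuple[int, int]]:
    """Return (epoch, step) parsed from a checkpoint filename, or None."""
    match = CKPT_PATTERN.search(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def latest_checkpoint(root: str) -> Optional[Path]:
    """Return the checkpoint under `root` with the highest step count."""
    best_path, best_step = None, -1
    for path in Path(root).rglob("*.ckpt"):
        parsed = parse_checkpoint_name(path.name)
        if parsed is not None and parsed[1] > best_step:
            best_path, best_step = path, parsed[1]
    return best_path


if __name__ == "__main__":
    print(latest_checkpoint("uk_UA-ASMR/output/lightning_logs"))
```

Point `latest_checkpoint` at the `lightning_logs` directory rather than at `uk_UA-ASMR/` as a whole. Otherwise the base checkpoint (`step=1166778`) would always win.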
### Exporting
```bash
# 1. Export the ONNX model from your best/latest checkpoint
python3 -m piper.train.export_onnx \
--checkpoint uk_UA-ASMR/output/lightning_logs/version_0/checkpoints/epoch=14-step=6180.ckpt \
--output-file uk_UA-ASMR/output/uk_UA-asmr-medium.onnx
```
## Model Output
After training and export, you will have:
| File | Description |
| ----------------------------- | ------------------------ |
| `uk_UA-asmr-medium.onnx` | ONNX model for inference |
| `uk_UA-asmr-medium.onnx.json` | Model configuration |
## Usage with Piper
```bash
# Install piper
pip install piper-tts
# Generate speech
# (Pipe the text using 'echo' to avoid CLI parsing errors with raw text modes)
echo "привіт, як справи?" | python3 -m piper \
--model uk_UA-ASMR/output/uk_UA-asmr-medium.onnx \
--output_file audio.wav
```
## Phoneme Type
This dataset uses `phoneme_type: "text"`, meaning raw Ukrainian characters are used directly without espeak-ng phonemization. The model uses a character-based phoneme map with Ukrainian Cyrillic characters.
Valid characters:
```
а б в г ґ д е є ж з и і ї й к л м н о п р с т у ф х ц ч ш щ ь ю я
```
Punctuation and special symbols: `space ! ' , - . : ; ? _ ^ $ — `, plus diacritics.
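Because text outside this character set has no entry in the phoneme map, it can be useful to screen input before training or inference. The sketch below hard-codes the letters and symbols listed above. The diacritics are not enumerated in this README, so they are left out of the allowed set and would be flagged; that is an assumption of this example, and the authoritative list is the keys of `phonemes.json`.

```python
# Letters and symbols as listed in this README (lowercase only).
LETTERS = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя"
SYMBOLS = " !',-.:;?_^$—"
ALLOWED = set(LETTERS) | set(SYMBOLS)


def unsupported_chars(text: str) -> set:
    """Return the characters in `text` that are not in the listed character set."""
    return {ch for ch in text if ch not in ALLOWED}


if __name__ == "__main__":
    for sample in ["привіт, як справи?", "Привіт, world!"]:
        print(repr(sample), "->", sorted(unsupported_chars(sample)))
```

Uppercase letters are flagged because only lowercase letters are listed, so lowercasing input text first is a reasonable preprocessing step.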
Example rows from `metadata.csv` (pipe-delimited `file_name|text`, no header row):
```
utt_4197.wav|про що ти хочеш мене попросити?
utt_4198.wav|запитала вона підозріло.
```
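The rows above can be read with Python's standard `csv` module. This is a minimal sketch: `read_metadata` is an illustrative helper, and quoting is disabled so that apostrophes and quotation marks in the text are kept verbatim.

```python
import csv
from typing import Iterator, Tuple


def read_metadata(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (file_name, text) pairs from a pipe-delimited metadata file."""
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
        for line_number, row in enumerate(reader, start=1):
            if not row:
                continue  # skip blank lines
            if len(row) != 2:
                raise ValueError(f"line {line_number}: expected 2 fields, got {len(row)}")
            yield row[0], row[1]


if __name__ == "__main__":
    import os

    if os.path.exists("uk_UA-ASMR/metadata.csv"):
        rows = list(read_metadata("uk_UA-ASMR/metadata.csv"))
        print(f"{len(rows)} segments")
```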
## Citation
If you use this dataset, please cite:
```bibtex
@misc{uk_ua_asmr,
title={Ukrainian ASMR TTS Dataset},
author={Kontextox},
year={2026},
url={https://huggingface.co/datasets/kontextox/uk_UA-ASMR}
}
```
## License
CC0 - Public Domain
## Acknowledgments
- Base Ukrainian model: [OHF-Voice/voice-datasets](https://github.com/OHF-Voice/voice-datasets)
- Training framework: [Piper](https://github.com/OHF-voice/piper1-gpl)
Provider: kontextox



