
kontextox/uk_UA-ASMR

Hugging Face · Updated 2026-04-03 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/kontextox/uk_UA-ASMR
Resource description:
---
language:
  - uk
license: cc0-1.0
datasets:
  - kontextox/uk_UA-ASMR
tags:
  - tts
  - text-to-speech
  - ukrainian
  - asmr
  - piper
configs:
  - config_name: default
    data_files: "metadata.csv"
    sep: "|"
    column_names: ["file_name", "text"]
---

# Ukrainian ASMR TTS Dataset

A Ukrainian text-to-speech dataset for training single-speaker ASMR-style voice models using [Piper](https://github.com/OHF-voice/piper1-gpl).

## Dataset Details

| Property     | Value                      |
| ------------ | -------------------------- |
| Language     | Ukrainian (uk_UA)          |
| Speakers     | 1                          |
| Segments     | 7,318                      |
| Audio Format | 16-bit WAV, 22050 Hz, Mono |
| License      | CC0                        |

## Dataset Structure

### Prerequisites

```bash
# Install Piper training dependencies
git clone https://github.com/kontextox/piper1-gpl.git
cd piper1-gpl
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -e '.[train]'
./build_monotonic_align.sh

# If using `OHF-voice/piper1-gpl` (already fixed in `kontextox/piper1-gpl`):
# pip install scikit-build
python3 setup.py build_ext --inplace

# CRITICAL FIX for custom text phonemes in `OHF-voice/piper1-gpl` (already fixed in `kontextox/piper1-gpl`):
# This patches dataset.py so that it uses the custom phoneme map loaded via --data.phonemes_path
# sed -i 's/phonemes_to_ids(sentence_phonemes)/phonemes_to_ids(sentence_phonemes, id_map=self.piper_config.phoneme_id_map)/g' src/piper/train/vits/dataset.py
```

```bash
# 1. Download the dataset and unpack the audio
hf download kontextox/uk_UA-ASMR \
  --repo-type dataset --local-dir uk_UA-ASMR
mkdir -p uk_UA-ASMR/audio  # tar -C needs the target directory to exist
tar -xzf uk_UA-ASMR/clear_audio.tar.gz -C uk_UA-ASMR/audio

# 2. Download the base checkpoint AND its configuration
hf download rhasspy/piper-checkpoints uk/uk_UA/ukrainian_tts/medium/epoch=2090-step=1166778.ckpt \
  --repo-type dataset --local-dir uk_UA-ASMR/checkpoints
hf download rhasspy/piper-checkpoints uk/uk_UA/ukrainian_tts/medium/config.json \
  --repo-type dataset --local-dir uk_UA-ASMR/checkpoints

# 3. Extract the exact phoneme map from the base config to use for training
python3 -c "import json; d=json.load(open('uk_UA-ASMR/checkpoints/uk/uk_UA/ukrainian_tts/medium/config.json')); json.dump(d['phoneme_id_map'], open('uk_UA-ASMR/phonemes.json','w'), ensure_ascii=False, indent=2)"
```

After these steps the working directory looks like this:

```text
uk_UA-ASMR/
├── README.md
├── metadata.csv            # Metadata
├── phonemes.json           # Automatically extracted Ukrainian phoneme map
├── audio/                  # Audio files (22050 Hz, mono, 16-bit)
│   ├── utt_0001.wav
│   ├── utt_0002.wav
│   └── ...
└── checkpoints/uk/uk_UA/ukrainian_tts/medium/
    ├── config.json
    └── epoch=2090-step=1166778.ckpt
```

## Audio Specifications

- **Sample Rate**: 22050 Hz
- **Channels**: Mono
- **Bit Depth**: 16-bit
- **Format**: WAV

## Training

### Training Command

```bash
python3 -m piper.train fit \
  --data.voice_name "uk_asmr" \
  --data.csv_path uk_UA-ASMR/metadata.csv \
  --data.audio_dir uk_UA-ASMR/audio \
  --data.espeak_voice "uk" \
  --model.sample_rate 22050 \
  --data.phoneme_type "text" \
  --data.dataset_type "text" \
  --data.phonemes_path uk_UA-ASMR/phonemes.json \
  --data.cache_dir uk_UA-ASMR/cache \
  --data.config_path uk_UA-ASMR/output/uk_UA-asmr-medium.onnx.json \
  --data.batch_size 32 \
  --data.num_workers 8 \
  --model.vocoder_warmstart_ckpt uk_UA-ASMR/checkpoints/uk/uk_UA/ukrainian_tts/medium/epoch=2090-step=1166778.ckpt \
  --trainer.max_epochs 500 \
  --trainer.check_val_every_n_epoch 1 \
  --trainer.default_root_dir uk_UA-ASMR/output
```

_**Note**: `--trainer.default_root_dir` makes PyTorch Lightning save logs and checkpoints under `uk_UA-ASMR/output/lightning_logs/`._

_**Note**: If PyTorch reports that the NVIDIA driver on your system is too old (for example, "found version 12080"), or that a GPU such as the NVIDIA GeForce RTX 5090 with CUDA capability `sm_120` is not compatible with the current PyTorch installation:_

- Check: `python -c "import torch; print(torch.__version__); print(torch.cuda.get_arch_list()); print(torch.randn(1).cuda())"`
- Run: `pip install --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128`

_**Note**: To check the progress of CPU-side preprocessing, count the cached files: `find uk_UA-ASMR/cache -name "*.pt" | wc -l`_

### Hardware Configuration

Batch Size and Num Workers correspond to `--data.batch_size` and `--data.num_workers`.

| GPU          | VRAM  | Batch Size | Num Workers | Speed     | Epoch Time |
| ------------ | ----- | ---------- | ----------- | --------- | ---------- |
| **H100 NVL** | 93 GB | 128        | 16          | ~1.0 it/s | ~60s       |
| **L40S**     | 46 GB | 32         | 8           | ~2.6 it/s | ~90s       |
| **RTX 3090** | 24 GB | 24         | 8           | ~2.0 it/s | ~120s      |
| **RTX 3060** | 12 GB | 16         | 4           | ~1.5 it/s | ~180s      |
| **A100**     | 80 GB | 96         | 16          | ~0.9 it/s | ~70s       |

#### Continue from latest checkpoint

```bash
python3 -m piper.train fit \
  --data.voice_name "uk_asmr" \
  --data.csv_path uk_UA-ASMR/metadata.csv \
  --data.audio_dir uk_UA-ASMR/audio \
  --data.espeak_voice "uk" \
  --model.sample_rate 22050 \
  --data.phoneme_type "text" \
  --data.dataset_type "text" \
  --data.phonemes_path uk_UA-ASMR/phonemes.json \
  --data.cache_dir uk_UA-ASMR/cache \
  --data.config_path uk_UA-ASMR/output/uk_UA-asmr-medium.onnx.json \
  --data.batch_size 32 \
  --model.vocoder_warmstart_ckpt uk_UA-ASMR/checkpoints/uk/uk_UA/ukrainian_tts/medium/epoch=2090-step=1166778.ckpt \
  --trainer.max_epochs 500 \
  --trainer.check_val_every_n_epoch 1 \
  --trainer.default_root_dir uk_UA-ASMR/output \
  --ckpt_path uk_UA-ASMR/output/lightning_logs/version_0/checkpoints/epoch=35-step=14832.ckpt
```

_(Check your `uk_UA-ASMR/output/lightning_logs/` folder for the exact `.ckpt` filename.)_

_**Note**: To locate checkpoints, run `find /workspace -name "*.ckpt" 2>/dev/null | head -5`_

### Exporting

```bash
# Export the ONNX model from your best or latest checkpoint
python3 -m piper.train.export_onnx \
  --checkpoint uk_UA-ASMR/output/lightning_logs/version_0/checkpoints/epoch=14-step=6180.ckpt \
  --output-file uk_UA-ASMR/output/uk_UA-asmr-medium.onnx
```

## Model Output

After training and export, you will have:

| File                          | Description              |
| ----------------------------- | ------------------------ |
| `uk_UA-asmr-medium.onnx`      | ONNX model for inference |
| `uk_UA-asmr-medium.onnx.json` | Model configuration      |

## Usage with Piper

```bash
# Install piper
pip install piper-tts

# Generate speech
# (Pipe the text using 'echo' to avoid CLI parsing errors with raw text modes)
echo "привіт, як справи?" | python3 -m piper \
  --model uk_UA-ASMR/output/uk_UA-asmr-medium.onnx \
  --output_file audio.wav
```

## Phoneme Type

This dataset uses `phoneme_type: "text"`, meaning raw Ukrainian characters are fed to the model directly, without espeak-ng phonemization. The model uses a character-based phoneme map of Ukrainian Cyrillic characters.

Valid characters:

```
а б в г ґ д е є ж з и і ї й к л м н о п р с т у ф х ц ч ш щ ь ю я
```

Plus punctuation (`space ! ' , - . : ; ? _ ^ $ —`) and diacritics.

Example `metadata.csv` rows:

```
utt_4197.wav|про що ти хочеш мене попросити?
utt_4198.wav|запитала вона підозріло.
```

## Citation

If you use this dataset, please cite:

```bibtex
@misc{uk_ua_asmr,
  title={Ukrainian ASMR TTS Dataset},
  author={Kontextox},
  year={2026},
  url={https://huggingface.co/datasets/kontextox/uk_UA-ASMR}
}
```

## License

CC0 - Public Domain

## Acknowledgments

- Base Ukrainian model: [OHF-Voice/voice-datasets](https://github.com/OHF-Voice/voice-datasets)
- Training framework: [Piper](https://github.com/OHF-voice/piper1-gpl)
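
As a supplement to the Phoneme Type section: because the model only knows the characters in its phoneme map, it can help to scan `metadata.csv` for unexpected characters before training. The following is a minimal sketch, not part of the dataset or of Piper. The helper names `load_allowed_chars` and `find_unknown_chars`, the row `utt_9999.wav`, and the hard-coded `ALLOWED` set (copied from the valid-character list, without diacritics) are illustrative assumptions. It also assumes `phonemes.json` is a JSON object keyed by single characters, as produced by the extraction step.

```python
import json


def load_allowed_chars(phonemes_path):
    """Read the keys of a phoneme map (assumed: JSON object keyed by character)."""
    with open(phonemes_path, encoding="utf-8") as f:
        return set(json.load(f))


def find_unknown_chars(lines, allowed):
    """Map each offending file name to the sorted characters not in `allowed`."""
    problems = {}
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # Rows follow the `file_name|text` layout; split on the first '|'.
        file_name, sep, text = line.partition("|")
        if not sep:
            problems[f"line {line_no}"] = ["<missing '|' separator>"]
            continue
        unknown = sorted(set(text) - allowed)
        if unknown:
            problems[file_name] = unknown
    return problems


# Characters copied from the valid-character list (diacritics omitted here).
ALLOWED = set("абвгґдеєжзиіїйклмнопрстуфхцчшщьюя" + " !',-.:;?_^$—")

# Two rows from the dataset card plus one hypothetical row with Latin letters.
SAMPLE_ROWS = [
    "utt_4197.wav|про що ти хочеш мене попросити?",
    "utt_4198.wav|запитала вона підозріло.",
    "utt_9999.wav|Hello, світ!",
]

for name, chars in find_unknown_chars(SAMPLE_ROWS, ALLOWED).items():
    print(name, chars)
```

On real data, one would pass `open("uk_UA-ASMR/metadata.csv", encoding="utf-8")` as `lines` and `load_allowed_chars("uk_UA-ASMR/phonemes.json")` as `allowed`, so that the check uses the same map the training command receives via `--data.phonemes_path`.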

Provider:
kontextox