
TTS-AGI/voice-taxonomy-pretrain

Hugging Face | Updated 2026-04-08 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/TTS-AGI/voice-taxonomy-pretrain
Resource description:
---
license: cc-by-4.0
task_categories:
- audio-classification
tags:
- voice
- speech
- taxonomy
- whisper
- tts
- voice-attributes
size_categories:
- 100K<n<1M
---

# Voice Taxonomy Pre-training Dataset

**318,729 speech samples** annotated with **57 voice taxonomy dimensions** (0-6 ordinal scale) by a Whisper ensemble (4 models voting). Designed as pre-training data for voice attribute classifiers.

## Related Datasets

| Dataset | Purpose | Link |
|---------|---------|------|
| **This dataset** | Pre-training (large, noisy labels) | - |
| Fine-tuning (balanced, Gemini Flash) | Fine-tuning | [TTS-AGI/voice-taxonomy-flash-train](https://huggingface.co/datasets/TTS-AGI/voice-taxonomy-flash-train) |
| Validation (Gemini 3.1 Pro gold) | Evaluation | [TTS-AGI/voice-taxonomy-val](https://huggingface.co/datasets/TTS-AGI/voice-taxonomy-val) |

## Format

WebDataset TAR with MP3+JSON pairs:

```
{stem}.mp3   # Audio (mono, 44.1kHz, 64kbps, ≤30s)
{stem}.json  # 57-dim taxonomy annotation
```

Each JSON:

```json
{
  "AGEV": {"value": 3, "name": "Perceived Age", "label": "young adult"},
  "GEND": {"value": 5, "name": "Gender Presentation", "label": "standard masculine"},
  ...
}
```

## Training Plan

See [TRAINING_PLAN.md](TRAINING_PLAN.md) for the full training strategy (pre-train → fine-tune → evaluate) and `train_voice_taxonomy.py` for a self-contained training script.

## Quick Start

```bash
# Download
huggingface-cli download TTS-AGI/voice-taxonomy-pretrain --local-dir .

# Pre-train
python train_voice_taxonomy.py --phase pretrain --encoder laion/BUD-E-Whisper --gpu 0
```

## Taxonomy

57 dimensions covering: speaker identity, timbral quality, resonance placement, prosody, articulation, emotion, and speaking style. Each rated 0-6. See `taxonomy_labels.json` for full definitions.

## Labels

Labels were generated by a **Whisper ensemble** (4 BUD-E-Whisper variants voting). These are noisier than the Gemini-annotated fine-tuning and validation sets, but the 10x larger dataset size compensates.
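
## Reading a Shard (illustrative sketch)

The MP3+JSON pairing described under Format can be read with the Python standard library alone. The sketch below is not part of the dataset's own tooling. It assumes each shard is an ordinary TAR archive in which a sample's `.mp3` and `.json` members share the same name up to the final extension. The helper names `iter_samples` and `label_vector` are hypothetical, and the only dimension codes used are the two shown in the example above (`AGEV`, `GEND`).

```python
import json
import tarfile
from pathlib import PurePosixPath


def iter_samples(tar_path):
    """Yield (stem, mp3_bytes, annotation) for each MP3+JSON pair in one TAR shard.

    Assumption: both members of a pair share the same path minus the final
    extension. Members with other extensions are ignored, and a member whose
    partner never appears is silently dropped.
    """
    pending = {}  # stem -> {".mp3": bytes, ".json": dict}
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            path = PurePosixPath(member.name)
            suffix = path.suffix.lower()
            if suffix not in (".mp3", ".json"):
                continue
            stem = str(path.with_suffix(""))
            data = tar.extractfile(member).read()
            entry = pending.setdefault(stem, {})
            entry[suffix] = json.loads(data) if suffix == ".json" else data
            if len(entry) == 2:  # both halves of the pair have been seen
                del pending[stem]
                yield stem, entry[".mp3"], entry[".json"]


def label_vector(annotation, codes):
    """Return the ordinal "value" (0-6) for each code in `codes`, in order.

    A code missing from the annotation maps to None, so the caller decides
    how to mask or impute it.
    """
    return [annotation[c]["value"] if c in annotation else None for c in codes]
```

With a locally downloaded shard (the file name here is a placeholder), `for stem, audio, ann in iter_samples("shard.tar")` yields the raw MP3 bytes together with the parsed annotation, and `label_vector(ann, ["AGEV", "GEND"])` extracts the two example dimensions. The full list of 57 codes would come from `taxonomy_labels.json`, whose layout is not shown here.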
Provider:
TTS-AGI