aigc-x/Pronunciation-boldvoice

Name: aigc-x/Pronunciation-boldvoice
Creator: aigc-x
Published: 2026-04-07 11:00:45
License: No description available

Source: Hugging Face. Updated 2026-04-07. Indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/aigc-x/Pronunciation-boldvoice


Description:

---
license: apache-2.0
task_categories:
- audio-classification
language:
- en
size_categories:
- 10K<n<100K
tags:
- pronunciation-assessment
- phoneme
- speech
dataset_info:
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: reference_text
    dtype: string
  - name: response
    dtype: string
  - name: source
    dtype: string
  - name: duration
    dtype: float64
  - name: score
    dtype: int64
  splits:
  - name: train
    num_bytes: 36605701980
    num_examples: 43182
  download_size: 36476677163
  dataset_size: 36605701980
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Pronunciation Assessment Dataset (BoldVoice + speechocean762)

Dataset for fine-tuning multimodal models on English pronunciation assessment.

## Overview

| Source | Samples | Audio Duration | Description |
|--------|---------|----------------|-------------|
| BoldVoice | 38,182 | 10-20s | Non-native English learners, BoldVoice API annotations |
| speechocean762 | 5,000 | 1.6-20s | Public dataset, 5-expert scored, Mandarin speakers |
| **Total** | **43,182** | | |

## Schema

| Column | Type | Description |
|--------|------|-------------|
| `audio` | Audio (16kHz mono) | Speech recording |
| `reference_text` | string | Text the speaker intended to read |
| `response` | string | JSON annotation (see below) |
| `source` | string | `boldvoice` or `speechocean762` |
| `duration` | float | Audio duration in seconds |
| `score` | int | Overall pronunciation score (0-100) |

## Annotation Format (response JSON)

```json
{
  "words": [
    {
      "word": "bear",
      "expected": ["B", "EH", "R"],
      "actual": ["B", "AH", "R"],
      "is_correct": false,
      "errors": [
        {"index": 1, "expected": "EH", "actual": "AH", "type": "substitution"}
      ]
    }
  ],
  "summary": {
    "total_phonemes": 3,
    "correct_phonemes": 2,
    "error_count": 1,
    "score": 67
  }
}
```

- Phonemes in **ARPAbet** notation (no stress markers)
- Error types: `substitution`, `deletion`, `insertion`, `mispronounced`

## Fine-tuning

```bash
pip install -r requirements.txt

# Fine-tune Gemma 4 E2B-it with LoRA
python finetune_gemma4_e2b.py --model google/gemma-4-E2B-it

# Custom settings
python finetune_gemma4_e2b.py --model /path/to/local/model --lr 1e-4 --epochs 2 --batch-size 2
```

## Token Budget

| Metric | Value |
|--------|-------|
| Median tokens/sample | 1,197 |
| p95 tokens/sample | 2,674 |
| Max tokens/sample | 6,143 |
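
Because the `response` column stores its annotation as a JSON string, each row has to be decoded before the per-word errors can be used. The sketch below is one possible way to do that with the standard library, using the example annotation from the card. The helper names `error_counts` and `recomputed_score` are hypothetical, and the formula `round(100 * correct_phonemes / total_phonemes)` is an assumption that merely agrees with the single example shown (2 of 3 phonemes gives 67); the card does not state how `score` is actually derived.

```python
import json
from collections import Counter

# Example `response` value, copied from the annotation format in the card.
EXAMPLE_RESPONSE = """
{
  "words": [
    {
      "word": "bear",
      "expected": ["B", "EH", "R"],
      "actual": ["B", "AH", "R"],
      "is_correct": false,
      "errors": [
        {"index": 1, "expected": "EH", "actual": "AH", "type": "substitution"}
      ]
    }
  ],
  "summary": {
    "total_phonemes": 3,
    "correct_phonemes": 2,
    "error_count": 1,
    "score": 67
  }
}
"""


def error_counts(response: str) -> Counter:
    """Count annotated errors by type across all words in one response."""
    annotation = json.loads(response)
    return Counter(
        error["type"]
        for word in annotation["words"]
        for error in word["errors"]
    )


def recomputed_score(summary: dict) -> int:
    """Recompute a 0-100 score from the summary counts.

    Assumed formula (not confirmed by the card): percentage of correct
    phonemes, rounded to the nearest integer.
    """
    total = summary["total_phonemes"]
    if total == 0:
        return 0  # assumption: treat an empty utterance as score 0
    return round(100 * summary["correct_phonemes"] / total)


summary = json.loads(EXAMPLE_RESPONSE)["summary"]
print(error_counts(EXAMPLE_RESPONSE))
print(recomputed_score(summary), summary["score"])
```

Comparing the recomputed value against the stored `summary.score` is a cheap sanity check to run over a sample of rows before fine-tuning; any mismatch would indicate that the assumed formula does not hold for that source.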

Provider:

aigc-x
