ajd12342/paraspeechclap-eval-intrinsic

Name: ajd12342/paraspeechclap-eval-intrinsic
Creator: ajd12342
Published: 2026-03-27 17:42:49
License: cc-by-nc-sa-4.0

Hugging Face · Updated 2026-03-27 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/ajd12342/paraspeechclap-eval-intrinsic

Resource description:

---
language:
- en
license: cc-by-nc-sa-4.0
tags:
- speech
- audio
- style
- CLAP
- dual-encoder
- evaluation
- benchmark
- intrinsic
- speaker-level
- classification
source_datasets:
- ajd12342/paraspeechcaps
task_categories:
- audio-classification
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: source
    dtype: string
  - name: relative_audio_path
    dtype: string
  - name: text_description
    dtype: string
  - name: classification_tag
    dtype: string
  - name: transcription
    dtype: string
  - name: all_tags
    sequence: string
  - name: speakerid
    dtype: string
  - name: name
    dtype: string
  - name: duration
    dtype: float64
  - name: gender
    dtype: string
  - name: accent
    dtype: string
  - name: pitch
    dtype: string
  - name: speaking_rate
    dtype: string
  - name: noise
    dtype: string
  - name: utterance_pitch_mean
    dtype: float64
  - name: snr
    dtype: float64
  - name: phonemes
    dtype: string
  splits:
  - name: test
    num_bytes: 1821680
    num_examples: 2819
  - name: classification_clarity
    num_bytes: 482928
    num_examples: 875
  - name: classification_pitch
    num_bytes: 811642
    num_examples: 1438
  - name: classification_rhythm
    num_bytes: 1117149
    num_examples: 1956
  - name: classification_texture
    num_bytes: 598245
    num_examples: 1093
  - name: classification_volume
    num_bytes: 693780
    num_examples: 1216
  download_size: 1924265
  dataset_size: 5525424
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
  - split: classification_clarity
    path: data/classification_clarity-*
  - split: classification_pitch
    path: data/classification_pitch-*
  - split: classification_rhythm
    path: data/classification_rhythm-*
  - split: classification_texture
    path: data/classification_texture-*
  - split: classification_volume
    path: data/classification_volume-*
---

# Intrinsic Evaluation Dataset for ParaSpeechCLAP Models

Evaluation dataset for **intrinsic (speaker-level)** style attributes, from the paper:

*ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining*

Anuj Diwan, Eunsol Choi, David Harwath

## Overview

This dataset is the **intrinsic evaluation dataset** for the ParaSpeechCLAP model family. It contains speech clips paired with intrinsic-tag style captions, drawn from the **VoxCeleb** portion of the [ParaSpeechCaps](https://huggingface.co/datasets/ajd12342/paraspeechcaps) holdout set. The captions are generated solely from the intrinsic and basic tags.

The dataset has **6 splits**:

- `test` for **retrieval** evaluation
- `classification_clarity`, `classification_pitch`, `classification_rhythm`, `classification_texture`, and `classification_volume` for per-attribute **classification** evaluation

The key column for retrieval is `text_description`, and the key column for classification is `classification_tag`. Classification is evaluated per attribute using the template *"A person is speaking in a {label} style"*.

## Installation

Install the `datasets` package to load the dataset:

```bash
pip install datasets
```

To run retrieval and classification evaluation with ParaSpeechCLAP, install the [ParaSpeechCLAP GitHub repository](https://github.com/ajd12342/paraspeechclap):

```bash
git clone https://github.com/ajd12342/paraspeechclap.git
cd paraspeechclap
pip install -r requirements.txt
```

### Setting up audio files

The dataset contains a `relative_audio_path` column but not the audio files themselves. Resolving audio paths requires specifying `data.audio_root`, a common root directory organized as `${audio_root}/{source}/`, where `{source}` matches the value of the `source` column in the dataset.

This dataset uses the **VoxCeleb** source. Follow the [ParaSpeechCaps audio setup instructions](https://github.com/ajd12342/paraspeechcaps/tree/main/dataset#22-processing-dataset-audio) for VoxCeleb, with the following adjustment: instead of placing VoxCeleb at its own `${voxceleb_root}`, place it under a common root:

- `${audio_root}/voxceleb/` (instead of `${voxceleb_root}`)

Then pass `data.audio_root=${audio_root}` when running any ParaSpeechCLAP script.
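
The directory layout described above suggests how a dataset row maps to a file on disk. The following is a minimal sketch, not code from the ParaSpeechCLAP repository. It assumes that each clip lives at `${audio_root}/{source}/{relative_audio_path}`, which is inferred from the description rather than confirmed. The helper name `resolve_audio_path` and the sample row values are hypothetical.

```python
from pathlib import Path


def resolve_audio_path(audio_root: str, source: str, relative_audio_path: str) -> Path:
    """Join the common root, the row's source, and its relative path.

    Assumption: files live at <audio_root>/<source>/<relative_audio_path>.
    """
    return Path(audio_root) / source / relative_audio_path


# Hypothetical row shaped like a dataset record (values are made up).
row = {"source": "voxceleb", "relative_audio_path": "some_speaker/some_clip.wav"}

path = resolve_audio_path("/path/to/audio_root", row["source"], row["relative_audio_path"])
print(path.as_posix())
```

Calling `path.exists()` on a handful of rows before launching an evaluation script may be a quick way to confirm that the audio root is laid out as expected.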

## Usage with ParaSpeechCLAP

### Retrieval evaluation

```bash
python scripts/evaluate_retrieval.py \
  --config-name eval/retrieval \
  checkpoint_path=./checkpoints/paraspeechclap-intrinsic.pth.tar \
  data.dataset_name=ajd12342/paraspeechclap-eval-intrinsic \
  data.audio_root=/path/to/audio_root \
  meta.results=./results_retrieval/paraspeechclap-eval-intrinsic/ajd12342-paraspeechclap-intrinsic
```

### Classification evaluation

```bash
# Evaluate all attributes
for attr in clarity pitch rhythm texture volume; do
  python scripts/evaluate_classification.py \
    --config-name eval/classification/${attr} \
    checkpoint_path=./checkpoints/paraspeechclap-intrinsic.pth.tar \
    data.audio_root=/path/to/audio_root \
    meta.results=./results_classification/paraspeechclap-eval-intrinsic/ajd12342-paraspeechclap-intrinsic/${attr}
done
```

### Loading the dataset

```python
from datasets import load_dataset

# Retrieval split
retrieval = load_dataset("ajd12342/paraspeechclap-eval-intrinsic", split="test")
print(f"Retrieval clips: {len(retrieval)}")
print(f"Unique prompts: {len(set(retrieval['text_description']))}")

# Classification split (e.g., pitch)
cls_pitch = load_dataset("ajd12342/paraspeechclap-eval-intrinsic", split="classification_pitch")
print(f"Classification pitch clips: {len(cls_pitch)}")
print(f"Labels: {set(cls_pitch['classification_tag'])}")
```

## Related Resources

- **GitHub Repository:** [https://github.com/ajd12342/paraspeechclap](https://github.com/ajd12342/paraspeechclap)
- **Models:** [ajd12342/paraspeechclap-intrinsic](https://huggingface.co/ajd12342/paraspeechclap-intrinsic), [ajd12342/paraspeechclap-combined](https://huggingface.co/ajd12342/paraspeechclap-combined), and [ajd12342/paraspeechclap-situational](https://huggingface.co/ajd12342/paraspeechclap-situational)
- **Parent Dataset:** [https://huggingface.co/datasets/ajd12342/paraspeechcaps](https://huggingface.co/datasets/ajd12342/paraspeechcaps)
- **Training Dataset:** [https://huggingface.co/datasets/ajd12342/paraspeechcaps-intrinsic-train](https://huggingface.co/datasets/ajd12342/paraspeechcaps-intrinsic-train)
- **Evaluation Datasets:** [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-combined](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-combined) and [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-situational](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-situational)

## Citation

```bibtex
@inproceedings{diwan2026paraspeechclap,
  title={{ParaSpeechCLAP}: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining},
  author={Diwan, Anuj and Choi, Eunsol and Harwath, David},
  journal={Under Review},
  year={2026}
}
```

Provider:

ajd12342
