ajd12342/paraspeechclap-eval-combined

Name: ajd12342/paraspeechclap-eval-combined
Creator: ajd12342
Published: 2026-03-27 17:42:51
License: cc-by-nc-sa-4.0

Source: Hugging Face. Updated 2026-03-27. Indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/ajd12342/paraspeechclap-eval-combined


Resource description:

---
language:
- en
license: cc-by-nc-sa-4.0
tags:
- speech
- audio
- style
- CLAP
- dual-encoder
- evaluation
- benchmark
- compositional
- intrinsic
- situational
source_datasets:
- ajd12342/paraspeechcaps
task_categories:
- audio-classification
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: source
    dtype: string
  - name: relative_audio_path
    dtype: string
  - name: text_description
    dtype: string
  - name: transcription
    dtype: string
  - name: intrinsic_tags
    sequence: string
  - name: situational_tags
    dtype: string
  - name: basic_tags
    sequence: string
  - name: all_tags
    sequence: string
  - name: speakerid
    dtype: string
  - name: name
    dtype: string
  - name: duration
    dtype: float64
  - name: gender
    dtype: string
  - name: accent
    dtype: string
  - name: pitch
    dtype: string
  - name: speaking_rate
    dtype: string
  - name: noise
    dtype: string
  - name: utterance_pitch_mean
    dtype: float32
  - name: snr
    dtype: float64
  - name: phonemes
    dtype: string
  splits:
  - name: test
    num_bytes: 1326439
    num_examples: 1432
  download_size: 352784
  dataset_size: 1326439
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
---

# Combined Evaluation Dataset for ParaSpeechCLAP Models

Evaluation dataset for **combined (compositional)** style attributes, used in the paper:

*ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining*

Anuj Diwan, Eunsol Choi, David Harwath

## Overview

This dataset is the **combined evaluation set** for the ParaSpeechCLAP model family. It contains speech clips paired with **compositional style captions that include both intrinsic and situational tags**, drawn from the **Expresso-EARS** portion of the [ParaSpeechCaps](https://huggingface.co/datasets/ajd12342/paraspeechcaps) holdout set. It is used to perform **retrieval** evaluation.

This version uses the **original captions** containing both intrinsic and situational tags from the [ParaSpeechCaps](https://huggingface.co/datasets/ajd12342/paraspeechcaps) holdout set.
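For readers new to retrieval evaluation, the following is a minimal sketch of one common metric, caption-to-clip recall@k, computed from a similarity matrix. It is not taken from the ParaSpeechCLAP evaluation script: the choice of metric, the function name `recall_at_k`, and the toy scores are assumptions made for illustration. Because several clips may share the same caption, the sketch counts a query as a hit when any clip carrying that caption appears in the top k.

```python
def recall_at_k(similarity, clip_captions, k):
    """Fraction of caption queries with a matching clip among the top k.

    similarity:    list of rows, one per caption; similarity[i][j] is the
                   score between caption i and clip j.
    clip_captions: clip_captions[j] is the index of the caption of clip j.
    (Hypothetical helper, not part of the ParaSpeechCLAP repository.)
    """
    hits = 0
    for caption_idx, scores in enumerate(similarity):
        # Clip indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        # A hit if any of the top-k clips carries this caption.
        if any(clip_captions[j] == caption_idx for j in ranked[:k]):
            hits += 1
    return hits / len(similarity)


# Toy data (made up): 3 captions, 4 clips; clips 2 and 3 share caption 2.
clip_captions = [0, 1, 2, 2]
similarity = [
    [0.9, 0.1, 0.2, 0.3],  # caption 0: matching clip ranked first
    [0.8, 0.5, 0.1, 0.2],  # caption 1: matching clip ranked second
    [0.1, 0.2, 0.3, 0.7],  # caption 2: a matching clip ranked first
]

print(recall_at_k(similarity, clip_captions, k=1))
print(recall_at_k(similarity, clip_captions, k=2))
```

In practice the similarity scores would come from the model's speech and text embeddings; the repository's own script should be treated as the reference for the metrics actually reported.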
## Installation

Install the `datasets` package to load the dataset:

```bash
pip install datasets
```

To run retrieval evaluation with ParaSpeechCLAP, install the [ParaSpeechCLAP GitHub repository](https://github.com/ajd12342/paraspeechclap):

```bash
git clone https://github.com/ajd12342/paraspeechclap.git
cd paraspeechclap
pip install -r requirements.txt
```

### Setting up audio files

The dataset contains a `relative_audio_path` column but not the audio files themselves. Resolving audio paths requires specifying `data.audio_root`, a common root directory organized as `${audio_root}/{source}/`, where `{source}` matches the value of the `source` column in the dataset.

This dataset uses the **Expresso** and **EARS** sources. Follow the [ParaSpeechCaps audio setup instructions](https://github.com/ajd12342/paraspeechcaps/tree/main/dataset#22-processing-dataset-audio) for those sources, with the following adjustment: instead of placing each source at its own root directory, place them under a common root:

- `${audio_root}/expresso/` (instead of `${expresso_root}`)
- `${audio_root}/ears/` (instead of `${ears_root}`)

Then pass `data.audio_root=${audio_root}` when running any ParaSpeechCLAP script.
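The directory layout described above can be checked before running an evaluation. The following is a minimal sketch, assuming that a clip's full path is `{audio_root}/{source}/{relative_audio_path}`; the card states only the `${audio_root}/{source}/` part, so the final join is an assumption. `resolve_audio_path` and `find_missing` are hypothetical helpers, not functions from the ParaSpeechCLAP repository.

```python
from pathlib import Path


def resolve_audio_path(audio_root, row):
    """Join audio_root, the row's source, and its relative audio path.

    Assumption: clips live at {audio_root}/{source}/{relative_audio_path}.
    """
    return Path(audio_root) / row["source"] / row["relative_audio_path"]


def find_missing(audio_root, rows):
    """Return the resolved paths that do not exist on disk."""
    resolved = (resolve_audio_path(audio_root, row) for row in rows)
    return [path for path in resolved if not path.is_file()]


# Illustrative row; the path value is made up, not taken from the dataset.
example_row = {"source": "ears", "relative_audio_path": "p001/example.wav"}
print(resolve_audio_path("/path/to/audio_root", example_row))
```

Passing the rows of the loaded dataset to `find_missing` would list any clips that are not where this assumed layout expects them.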
## Usage with ParaSpeechCLAP

### Retrieval evaluation

```bash
python scripts/evaluate_retrieval.py \
  --config-name eval/retrieval \
  checkpoint_path=./checkpoints/paraspeechclap-combined.pth.tar \
  data.dataset_name=ajd12342/paraspeechclap-eval-combined \
  data.audio_root=/path/to/audio_root \
  meta.results=./results_retrieval/paraspeechclap-eval-combined/ajd12342-paraspeechclap-combined
```

### Loading the dataset

```python
from datasets import load_dataset

dataset = load_dataset("ajd12342/paraspeechclap-eval-combined", split="test")
print(f"Number of clips: {len(dataset)}")
print(f"Number of unique prompts: {len(set(dataset['text_description']))}")
print(dataset[0])
```

## Related Resources

- **GitHub Repository:** [https://github.com/ajd12342/paraspeechclap](https://github.com/ajd12342/paraspeechclap)
- **Models:** [ajd12342/paraspeechclap-intrinsic](https://huggingface.co/ajd12342/paraspeechclap-intrinsic), [ajd12342/paraspeechclap-situational](https://huggingface.co/ajd12342/paraspeechclap-situational) and [ajd12342/paraspeechclap-combined](https://huggingface.co/ajd12342/paraspeechclap-combined)
- **Parent Dataset:** [https://huggingface.co/datasets/ajd12342/paraspeechcaps](https://huggingface.co/datasets/ajd12342/paraspeechcaps)
- **Training Datasets:** [https://huggingface.co/datasets/ajd12342/paraspeechcaps-intrinsic-train](https://huggingface.co/datasets/ajd12342/paraspeechcaps-intrinsic-train) and [https://huggingface.co/datasets/ajd12342/paraspeechcaps-situational-train](https://huggingface.co/datasets/ajd12342/paraspeechcaps-situational-train)
- **Evaluation Datasets:** [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-intrinsic](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-intrinsic), [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-situational](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-situational) and [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-combined](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-combined)

## Citation

```bibtex
@inproceedings{diwan2026paraspeechclap,
  title={{ParaSpeechCLAP}: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining},
  author={Diwan, Anuj and Choi, Eunsol and Harwath, David},
  journal={Under Review},
  year={2026}
}
```

Provided by:

ajd12342
