ajd12342/paraspeechcaps-situational-train

Name: ajd12342/paraspeechcaps-situational-train
Creator: ajd12342
Published: 2026-03-27 17:44:54
License: cc-by-nc-sa-4.0

Source: Hugging Face | Updated: 2026-03-27 | Indexed: 2026-03-29

Download link:

https://hf-mirror.com/datasets/ajd12342/paraspeechcaps-situational-train


Resource description:

---
language:
- en
license: cc-by-nc-sa-4.0
tags:
- speech
- audio
- style
- CLAP
- dual-encoder
- contrastive-learning
- situational
- emotion
- speaking-style
source_datasets:
- ajd12342/paraspeechcaps
task_categories:
- audio-classification
size_categories:
- 10K<n<100K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: source
    dtype: string
  - name: relative_audio_path
    dtype: string
  - name: text_description
    sequence: string
  - name: transcription
    dtype: string
  - name: intrinsic_tags
    sequence: string
  - name: situational_tags
    sequence: string
  - name: basic_tags
    sequence: string
  - name: all_tags
    sequence: string
  - name: speakerid
    dtype: string
  - name: name
    dtype: string
  - name: duration
    dtype: float64
  - name: gender
    dtype: string
  - name: accent
    dtype: string
  - name: pitch
    dtype: string
  - name: speaking_rate
    dtype: string
  - name: noise
    dtype: string
  - name: utterance_pitch_mean
    dtype: float64
  - name: snr
    dtype: float64
  - name: phonemes
    dtype: string
  splits:
  - name: train
    num_bytes: 93959347
    num_examples: 96195
  download_size: 32365908
  dataset_size: 93959347
---

# ParaSpeechCaps Situational Training Dataset

Training dataset for the **ParaSpeechCLAP-Situational** and **ParaSpeechCLAP-Combined** models, from the paper:

*ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining*
Anuj Diwan, Eunsol Choi, David Harwath

## Overview

This dataset contains the **situational-tag subset** of [ParaSpeechCaps](https://huggingface.co/datasets/ajd12342/paraspeechcaps), filtered to include only examples annotated with **situational (utterance-level)** style tags. It is used to train the ParaSpeechCLAP-Situational and ParaSpeechCLAP-Combined models with a contrastive loss.

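As a rough illustration of the filtering described in the overview, the sketch below keeps only rows that carry at least one entry in `situational_tags`. It is not the authors' preprocessing code: the helper `has_situational_tags` and the sample rows (including their tag values) are hypothetical, and only the column names come from the dataset card.

```python
from typing import Any


def has_situational_tags(row: dict[str, Any]) -> bool:
    """Return True when the row carries at least one situational tag."""
    return len(row.get("situational_tags") or []) > 0


# Hypothetical rows that reuse column names from the dataset card.
# The tag values are made up for illustration only.
rows = [
    {"source": "expresso", "intrinsic_tags": ["deep"], "situational_tags": ["whispered"]},
    {"source": "ears", "intrinsic_tags": ["shrill"], "situational_tags": []},
    {"source": "ears", "intrinsic_tags": [], "situational_tags": ["sad", "slow"]},
]

# Keep only the rows annotated with situational (utterance-level) tags.
situational_rows = [row for row in rows if has_situational_tags(row)]
print(len(situational_rows))
```

The same predicate could presumably be passed to `Dataset.filter` from the `datasets` package when working with the parent dataset, although this published subset is already filtered.
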
## Installation

Install the `datasets` package to load the dataset:

```bash
pip install datasets
```

To train ParaSpeechCLAP models using this dataset, install the [ParaSpeechCLAP GitHub repository](https://github.com/ajd12342/paraspeechclap):

```bash
git clone https://github.com/ajd12342/paraspeechclap.git
cd paraspeechclap
pip install -r requirements.txt
```

### Setting up audio files

The dataset contains a `relative_audio_path` column but not the audio files themselves. Resolving audio paths requires specifying `data.audio_root`, a common root directory organized as `${audio_root}/{source}/`, where `{source}` matches the value of the `source` column in the dataset.

This dataset uses the **Expresso** and **EARS** sources. Follow the [ParaSpeechCaps audio setup instructions](https://github.com/ajd12342/paraspeechcaps/tree/main/dataset#22-processing-dataset-audio) for those sources, with the following adjustment: instead of placing each source at its own root directory, place them under a common root:

- `${audio_root}/expresso/` (instead of `${expresso_root}`)
- `${audio_root}/ears/` (instead of `${ears_root}`)

Then pass `data.audio_root=${audio_root}` when running any ParaSpeechCLAP script.

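The directory layout above suggests how a row maps to a file on disk. The following is a minimal sketch, assuming that `relative_audio_path` is relative to the per-source directory `${audio_root}/{source}/`; the helper name `resolve_audio_path` and the example row are hypothetical, and the ParaSpeechCLAP scripts may resolve paths differently.

```python
from pathlib import Path


def resolve_audio_path(audio_root: str, row: dict) -> Path:
    """Join the common root, the row's source, and its relative audio path.

    Assumption: relative_audio_path is relative to ${audio_root}/{source}/.
    """
    return Path(audio_root) / row["source"] / row["relative_audio_path"]


# Hypothetical row: the column names come from the dataset card,
# but the values are placeholders rather than real entries.
example_row = {"source": "ears", "relative_audio_path": "p001/example.wav"}

audio_path = resolve_audio_path("/path/to/audio_root", example_row)
print(audio_path)
```

Calling `audio_path.exists()` on a handful of resolved paths is a quick way to confirm that the audio was placed under the expected root before starting a training run.
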
## Usage with ParaSpeechCLAP

### Training

```bash
torchrun --nproc_per_node=4 scripts/train.py \
  --config-name train/situational \
  data.audio_root=/path/to/audio_root \
  meta.results=./experiments
```

### Loading the dataset

```python
from datasets import load_dataset

dataset = load_dataset("ajd12342/paraspeechcaps-situational-train", split="train")
print(f"Number of examples: {len(dataset)}")
print(dataset[0])
```

## Related Resources

- **GitHub Repository:** [https://github.com/ajd12342/paraspeechclap](https://github.com/ajd12342/paraspeechclap)
- **Models:** [ajd12342/paraspeechclap-intrinsic](https://huggingface.co/ajd12342/paraspeechclap-intrinsic), [ajd12342/paraspeechclap-situational](https://huggingface.co/ajd12342/paraspeechclap-situational) and [ajd12342/paraspeechclap-combined](https://huggingface.co/ajd12342/paraspeechclap-combined)
- **Parent Dataset:** [https://huggingface.co/datasets/ajd12342/paraspeechcaps](https://huggingface.co/datasets/ajd12342/paraspeechcaps)
- **Training Datasets:** [https://huggingface.co/datasets/ajd12342/paraspeechcaps-intrinsic-train](https://huggingface.co/datasets/ajd12342/paraspeechcaps-intrinsic-train) and [https://huggingface.co/datasets/ajd12342/paraspeechcaps-situational-train](https://huggingface.co/datasets/ajd12342/paraspeechcaps-situational-train)
- **Evaluation Datasets:** [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-intrinsic](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-intrinsic), [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-situational](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-situational) and [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-combined](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-combined)

## Citation

```bibtex
@inproceedings{diwan2026paraspeechclap,
  title={{ParaSpeechCLAP}: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining},
  author={Diwan, Anuj and Choi, Eunsol and Harwath, David},
  journal={Under Review},
  year={2026}
}
```

Provided by:

ajd12342
