ajd12342/paraspeechcaps-situational-train
Hugging Face · Updated 2026-03-27 · Indexed 2026-03-29
Download link: https://hf-mirror.com/datasets/ajd12342/paraspeechcaps-situational-train
---
language:
- en
license: cc-by-nc-sa-4.0
tags:
- speech
- audio
- style
- CLAP
- dual-encoder
- contrastive-learning
- situational
- emotion
- speaking-style
source_datasets:
- ajd12342/paraspeechcaps
task_categories:
- audio-classification
size_categories:
- 10K<n<100K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: source
    dtype: string
  - name: relative_audio_path
    dtype: string
  - name: text_description
    sequence: string
  - name: transcription
    dtype: string
  - name: intrinsic_tags
    sequence: string
  - name: situational_tags
    sequence: string
  - name: basic_tags
    sequence: string
  - name: all_tags
    sequence: string
  - name: speakerid
    dtype: string
  - name: name
    dtype: string
  - name: duration
    dtype: float64
  - name: gender
    dtype: string
  - name: accent
    dtype: string
  - name: pitch
    dtype: string
  - name: speaking_rate
    dtype: string
  - name: noise
    dtype: string
  - name: utterance_pitch_mean
    dtype: float64
  - name: snr
    dtype: float64
  - name: phonemes
    dtype: string
  splits:
  - name: train
    num_bytes: 93959347
    num_examples: 96195
  download_size: 32365908
  dataset_size: 93959347
---
# ParaSpeechCaps Situational Training Dataset
Training dataset for the **ParaSpeechCLAP-Situational** and **ParaSpeechCLAP-Combined** models, from the paper:
*ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining*
Anuj Diwan, Eunsol Choi, David Harwath
## Overview
This dataset contains the **situational-tag subset** of [ParaSpeechCaps](https://huggingface.co/datasets/ajd12342/paraspeechcaps), filtered to include only examples annotated with **situational (utterance-level)** style tags. It is used to train the ParaSpeechCLAP-Situational and ParaSpeechCLAP-Combined models with a contrastive loss.
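The filtering step described above can be sketched in a few lines. This is a minimal illustration, not the authors' preprocessing code: it assumes that "annotated with situational tags" means a non-empty `situational_tags` list (a column from this card's schema), and the helper names `has_situational_tags` and `keep_situational`, as well as the toy records, are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def has_situational_tags(record: Dict[str, Any]) -> bool:
    """Hypothetical predicate: True if the record carries at least one situational tag."""
    # A missing column, None, or an empty list all count as "not annotated".
    return bool(record.get("situational_tags"))


def keep_situational(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the records that pass `has_situational_tags`."""
    return [r for r in records if has_situational_tags(r)]


# Toy records with made-up paths and tag values, for illustration only.
toy = [
    {"relative_audio_path": "a.wav", "situational_tags": ["whispered"]},
    {"relative_audio_path": "b.wav", "situational_tags": []},
    {"relative_audio_path": "c.wav", "situational_tags": None},
]
print([r["relative_audio_path"] for r in keep_situational(toy)])  # ['a.wav']
```

On a Hugging Face `datasets.Dataset`, the same predicate could be passed to `Dataset.filter(has_situational_tags)`; whether the parent dataset exposes the column under exactly this name is an assumption.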
## Installation
Install the `datasets` package to load the dataset:
```bash
pip install datasets
```
To train ParaSpeechCLAP models with this dataset, clone the [ParaSpeechCLAP GitHub repository](https://github.com/ajd12342/paraspeechclap) and install its requirements:
```bash
git clone https://github.com/ajd12342/paraspeechclap.git
cd paraspeechclap
pip install -r requirements.txt
```
### Setting up audio files
The dataset contains a `relative_audio_path` column but not the audio files themselves. Resolving audio paths requires specifying `data.audio_root`, a common root directory organized as `${audio_root}/{source}/`, where `{source}` matches the value of the `source` column in the dataset.
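Under this layout, a record's audio file can be located by joining the root, the `source` value, and the `relative_audio_path` value. The sketch below assumes that `relative_audio_path` is relative to the per-source directory (the card does not spell out the full join rule), and `resolve_audio_path` is a hypothetical helper, not part of the ParaSpeechCLAP code.

```python
import os
from typing import Any, Mapping


def resolve_audio_path(audio_root: str, record: Mapping[str, Any]) -> str:
    """Hypothetical helper: map a dataset record to an on-disk audio path.

    Assumed layout: ${audio_root}/{source}/{relative_audio_path}
    """
    return os.path.join(audio_root, record["source"], record["relative_audio_path"])


# Made-up record; real `source` and `relative_audio_path` values come from the dataset.
example = {"source": "expresso", "relative_audio_path": "some/dir/utt.wav"}
print(resolve_audio_path("/data/audio_root", example))
# /data/audio_root/expresso/some/dir/utt.wav on POSIX systems
```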
This dataset uses the **Expresso** and **EARS** sources. Follow the [ParaSpeechCaps audio setup instructions](https://github.com/ajd12342/paraspeechcaps/tree/main/dataset#22-processing-dataset-audio) for those sources, with the following adjustment: instead of placing each source at its own root directory, place them under a common root:
- `${audio_root}/expresso/` (instead of `${expresso_root}`)
- `${audio_root}/ears/` (instead of `${ears_root}`)
Then pass `data.audio_root=${audio_root}` when running any ParaSpeechCLAP script.
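Before launching a run, it can help to confirm that both source directories exist under the chosen root. A minimal sketch, assuming the lowercase directory names `expresso` and `ears` listed above; `missing_sources` is a hypothetical helper.

```python
import os
import tempfile
from typing import Iterable, List


def missing_sources(audio_root: str, sources: Iterable[str] = ("expresso", "ears")) -> List[str]:
    """Return the source subdirectories that do not exist under `audio_root`."""
    return [s for s in sources if not os.path.isdir(os.path.join(audio_root, s))]


# Demo on a throwaway directory that only contains `expresso/`.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "expresso"))
    print(missing_sources(root))  # ['ears']
```

Running `missing_sources("/path/to/audio_root")` on a real setup should return an empty list before training starts.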
## Usage with ParaSpeechCLAP
### Training
```bash
torchrun --nproc_per_node=4 scripts/train.py \
--config-name train/situational \
data.audio_root=/path/to/audio_root \
meta.results=./experiments
```
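Training optimizes a contrastive objective between speech and text (style description) embeddings. For readers unfamiliar with the idea, here is a generic CLIP/CLAP-style symmetric InfoNCE loss in plain Python. It is an illustrative sketch, not the ParaSpeechCLAP implementation: the function names and the fixed `temperature=0.07` are assumptions (CLIP, for example, learns its temperature), and the repository's actual loss may differ.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _l2_normalize(v: Vector) -> List[float]:
    # Guard against the zero vector by leaving it unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _log_softmax(row: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def symmetric_contrastive_loss(
    audio: Sequence[Vector], text: Sequence[Vector], temperature: float = 0.07
) -> float:
    """Symmetric InfoNCE: audio[i] and text[i] are positives, all other pairs negatives."""
    if not audio or len(audio) != len(text):
        raise ValueError("audio and text batches must be non-empty and equally sized")
    a = [_l2_normalize(v) for v in audio]
    t = [_l2_normalize(v) for v in text]
    n = len(a)
    # logits[i][j] = cosine similarity(audio_i, text_j) / temperature
    logits = [
        [sum(x * y for x, y in zip(a[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Audio-to-text direction: each row should put its mass on the diagonal.
    loss_a2t = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-audio direction: the same, column-wise.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2a = -sum(_log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_a2t + loss_t2a) / 2


matched = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(matched < swapped)  # True
```

The loss is low when each audio clip is most similar to its own description and high when pairs are mismatched; a real training loop would compute the same quantity with batched tensor operations.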
### Loading the dataset
```python
from datasets import load_dataset
dataset = load_dataset("ajd12342/paraspeechcaps-situational-train", split="train")
print(f"Number of examples: {len(dataset)}")
print(dataset[0])
```
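Beyond printing a single record, a quick pass over the split can summarize the situational tags and total audio duration. This is a sketch: it assumes `duration` is stored in seconds (the card gives only its `float64` type), and `summarize` is a hypothetical helper. Because iterating a `datasets.Dataset` yields one dict per row, it can be called as `summarize(dataset)` on the object loaded above.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def summarize(
    records: Iterable[Dict[str, Any]], top_k: int = 5
) -> Tuple[List[Tuple[str, int]], float]:
    """Return the `top_k` most frequent situational tags and the total duration in hours.

    Assumes `duration` is in seconds; missing values are counted as zero.
    """
    tag_counts: Counter = Counter()
    total_seconds = 0.0
    for record in records:
        # `situational_tags` is a list of strings per the card's schema.
        tag_counts.update(record.get("situational_tags") or [])
        total_seconds += float(record.get("duration") or 0.0)
    return tag_counts.most_common(top_k), total_seconds / 3600.0


# Toy records with made-up tags and durations.
toy = [
    {"situational_tags": ["happy", "loud"], "duration": 1800.0},
    {"situational_tags": ["happy"], "duration": 1800.0},
]
print(summarize(toy, top_k=1))  # ([('happy', 2)], 1.0)
```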
## Related Resources
- **GitHub Repository:** [https://github.com/ajd12342/paraspeechclap](https://github.com/ajd12342/paraspeechclap)
- **Models:** [ajd12342/paraspeechclap-intrinsic](https://huggingface.co/ajd12342/paraspeechclap-intrinsic), [ajd12342/paraspeechclap-situational](https://huggingface.co/ajd12342/paraspeechclap-situational) and [ajd12342/paraspeechclap-combined](https://huggingface.co/ajd12342/paraspeechclap-combined)
- **Parent Dataset:** [https://huggingface.co/datasets/ajd12342/paraspeechcaps](https://huggingface.co/datasets/ajd12342/paraspeechcaps)
- **Training Datasets:** [https://huggingface.co/datasets/ajd12342/paraspeechcaps-intrinsic-train](https://huggingface.co/datasets/ajd12342/paraspeechcaps-intrinsic-train) and [https://huggingface.co/datasets/ajd12342/paraspeechcaps-situational-train](https://huggingface.co/datasets/ajd12342/paraspeechcaps-situational-train)
- **Evaluation Datasets:** [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-intrinsic](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-intrinsic), [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-situational](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-situational) and [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-combined](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-combined)
## Citation
```bibtex
@inproceedings{diwan2026paraspeechclap,
title={{ParaSpeechCLAP}: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining},
author={Diwan, Anuj and Choi, Eunsol and Harwath, David},
booktitle={Under Review},
year={2026}
}
```
Provided by: ajd12342



