ajd12342/paraspeechclap-eval-situational
Source: Hugging Face. Updated 2026-03-27; indexed 2026-03-29.
---
language:
- en
license: cc-by-nc-sa-4.0
tags:
- speech
- audio
- style
- CLAP
- dual-encoder
- evaluation
- benchmark
- situational
- emotion
- speaking-style
source_datasets:
- ajd12342/paraspeechcaps
task_categories:
- audio-classification
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: source
    dtype: string
  - name: relative_audio_path
    dtype: string
  - name: text_description
    dtype: string
  - name: classification_tag
    dtype: string
  - name: transcription
    dtype: string
  - name: situational_tags
    dtype: string
  - name: basic_tags
    sequence: string
  - name: all_tags
    sequence: string
  - name: speakerid
    dtype: string
  - name: name
    dtype: string
  - name: duration
    dtype: float64
  - name: gender
    dtype: string
  - name: accent
    dtype: string
  - name: pitch
    dtype: string
  - name: speaking_rate
    dtype: string
  - name: noise
    dtype: string
  - name: utterance_pitch_mean
    dtype: float32
  - name: snr
    dtype: float64
  - name: phonemes
    dtype: string
  splits:
  - name: test
    num_bytes: 1122633
    num_examples: 1432
  download_size: 261562
  dataset_size: 1122633
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
---
# Situational Evaluation Dataset for ParaSpeechCLAP Models
Evaluation dataset for **situational (utterance-level)** style attributes, used in the paper:
*ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining*
Anuj Diwan, Eunsol Choi, David Harwath
## Overview
This dataset is the **situational evaluation dataset** for the ParaSpeechCLAP model family. It contains speech clips paired with style captions generated solely from situational tags, drawn from the **Expresso-EARS** portion of the [ParaSpeechCaps](https://huggingface.co/datasets/ajd12342/paraspeechcaps) holdout set. It is used for both **retrieval** and **classification** evaluation.
## Installation
Install the `datasets` package to load the dataset:
```bash
pip install datasets
```
To run retrieval and classification evaluation with ParaSpeechCLAP, install the [ParaSpeechCLAP GitHub repository](https://github.com/ajd12342/paraspeechclap):
```bash
git clone https://github.com/ajd12342/paraspeechclap.git
cd paraspeechclap
pip install -r requirements.txt
```
### Setting up audio files
The dataset contains a `relative_audio_path` column but not the audio files themselves. Resolving audio paths requires specifying `data.audio_root`, a common root directory organized as `${audio_root}/{source}/`, where `{source}` matches the value of the `source` column in the dataset.
This dataset uses the **Expresso** and **EARS** sources. Follow the [ParaSpeechCaps audio setup instructions](https://github.com/ajd12342/paraspeechcaps/tree/main/dataset#22-processing-dataset-audio) for those sources, with the following adjustment: instead of placing each source at its own root directory, place them under a common root:
- `${audio_root}/expresso/` (instead of `${expresso_root}`)
- `${audio_root}/ears/` (instead of `${ears_root}`)
Then pass `data.audio_root=${audio_root}` when running any ParaSpeechCLAP script.
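Before launching an evaluation, it can help to check that the audio layout described above resolves correctly. The following is a minimal sketch, not part of the ParaSpeechCLAP repository: the helper names `resolve_audio_path` and `missing_audio` are hypothetical, and it assumes that `relative_audio_path` is relative to `${audio_root}/{source}/`.

```python
from pathlib import Path


def resolve_audio_path(audio_root, row):
    """Join the common root, the row's `source`, and its `relative_audio_path`.

    Assumption: `relative_audio_path` is relative to `${audio_root}/{source}/`.
    """
    return Path(audio_root) / row["source"] / row["relative_audio_path"]


def missing_audio(audio_root, rows):
    """Return the resolved paths that do not exist as files on disk."""
    resolved = (resolve_audio_path(audio_root, row) for row in rows)
    return [path for path in resolved if not path.is_file()]
```

Calling `missing_audio("/path/to/audio_root", dataset)` on the loaded dataset should return an empty list once both sources are in place, provided the path assumption above holds.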
## Usage with ParaSpeechCLAP
### Retrieval evaluation
```bash
python scripts/evaluate_retrieval.py \
--config-name eval/retrieval \
checkpoint_path=./checkpoints/paraspeechclap-situational.pth.tar \
data.dataset_name=ajd12342/paraspeechclap-eval-situational \
data.audio_root=/path/to/audio_root \
meta.results=./results_retrieval/paraspeechclap-eval-situational/ajd12342-paraspeechclap-situational
```
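This card does not state which metrics `scripts/evaluate_retrieval.py` reports. As a rough illustration of what retrieval evaluation with a dual encoder usually involves, the sketch below computes audio-to-text recall@k from precomputed embeddings using cosine similarity. The function names, the choice of recall@k, and the plain-list embedding format are all assumptions for illustration, not the repository's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_k(audio_embs, text_embs, target_idx, k=1):
    """Fraction of clips whose target caption is among the k most similar captions.

    audio_embs: one embedding per clip.
    text_embs:  one embedding per unique caption.
    target_idx: for each clip, the index of its caption in text_embs.
    """
    hits = 0
    for emb, target in zip(audio_embs, target_idx):
        scores = [cosine(emb, text) for text in text_embs]
        ranked = sorted(range(len(text_embs)), key=lambda j: scores[j], reverse=True)
        hits += target in ranked[:k]
    return hits / len(audio_embs)
```

Because several clips can share one caption (the loading example below counts unique prompts), the sketch indexes captions by unique text rather than assuming a one-to-one pairing.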
### Classification evaluation
```bash
python scripts/evaluate_classification.py \
--config-name eval/classification/situational \
checkpoint_path=./checkpoints/paraspeechclap-situational.pth.tar \
data.audio_root=/path/to/audio_root \
meta.results=./results_classification/paraspeechclap-eval-situational/ajd12342-paraspeechclap-situational/
```
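The metrics written by `scripts/evaluate_classification.py` are not described in this card. For readers who want to score their own predictions against the `classification_tag` column, the sketch below computes plain accuracy and unweighted average recall (the mean of per-class recall), two common choices for style classification. The function name and the choice of metrics are assumptions, not the repository's method.

```python
from collections import defaultdict


def accuracy_and_uar(gold, pred):
    """Return (accuracy, unweighted average recall) for two equal-length label lists."""
    correct = defaultdict(int)  # per-class count of correct predictions
    total = defaultdict(int)    # per-class count of gold labels
    for g, p in zip(gold, pred):
        total[g] += 1
        correct[g] += g == p
    accuracy = sum(correct.values()) / len(gold)
    uar = sum(correct[c] / total[c] for c in total) / len(total)
    return accuracy, uar
```

Unweighted average recall gives each tag equal weight regardless of how many clips carry it, which matters when tags are unevenly represented.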
### Loading the dataset
```python
from datasets import load_dataset
dataset = load_dataset("ajd12342/paraspeechclap-eval-situational", split="test")
print(f"Number of clips: {len(dataset)}")
print(f"Number of unique prompts: {len(set(dataset['text_description']))}")
print(dataset[0])
```
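Beyond printing a single row, a quick per-tag and per-source summary is often useful when sanity-checking the split. The sketch below is illustrative only: `summarize` is a hypothetical helper that works on any iterable of row dictionaries, and it reads only the `classification_tag`, `source`, and `duration` columns listed in the metadata above (the unit of `duration` is not stated in this card).

```python
from collections import Counter


def summarize(rows):
    """Count clips per classification tag and per source, and sum the duration column."""
    rows = list(rows)
    return {
        "per_tag": Counter(row["classification_tag"] for row in rows),
        "per_source": Counter(row["source"] for row in rows),
        "total_duration": sum(row["duration"] for row in rows),
    }
```

Passing the loaded `dataset` object directly, as in `summarize(dataset)`, works because iterating over it yields one dictionary per row.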
## Related Resources
- **GitHub Repository:** [https://github.com/ajd12342/paraspeechclap](https://github.com/ajd12342/paraspeechclap)
- **Models:** [ajd12342/paraspeechclap-intrinsic](https://huggingface.co/ajd12342/paraspeechclap-intrinsic), [ajd12342/paraspeechclap-situational](https://huggingface.co/ajd12342/paraspeechclap-situational) and [ajd12342/paraspeechclap-combined](https://huggingface.co/ajd12342/paraspeechclap-combined)
- **Parent Dataset:** [https://huggingface.co/datasets/ajd12342/paraspeechcaps](https://huggingface.co/datasets/ajd12342/paraspeechcaps)
- **Training Datasets:** [https://huggingface.co/datasets/ajd12342/paraspeechcaps-intrinsic-train](https://huggingface.co/datasets/ajd12342/paraspeechcaps-intrinsic-train) and [https://huggingface.co/datasets/ajd12342/paraspeechcaps-situational-train](https://huggingface.co/datasets/ajd12342/paraspeechcaps-situational-train)
- **Evaluation Datasets:** [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-intrinsic](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-intrinsic), [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-situational](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-situational) and [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-combined](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-combined)
## Citation
```bibtex
@inproceedings{diwan2026paraspeechclap,
title={{ParaSpeechCLAP}: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining},
author={Diwan, Anuj and Choi, Eunsol and Harwath, David},
journal={Under Review},
year={2026}
}
```
Provider: ajd12342