
ajd12342/paraspeechclap-eval-situational

Hugging Face | Updated 2026-03-27 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/ajd12342/paraspeechclap-eval-situational
Resource description:
---
language:
- en
license: cc-by-nc-sa-4.0
tags:
- speech
- audio
- style
- CLAP
- dual-encoder
- evaluation
- benchmark
- situational
- emotion
- speaking-style
source_datasets:
- ajd12342/paraspeechcaps
task_categories:
- audio-classification
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: source
    dtype: string
  - name: relative_audio_path
    dtype: string
  - name: text_description
    dtype: string
  - name: classification_tag
    dtype: string
  - name: transcription
    dtype: string
  - name: situational_tags
    dtype: string
  - name: basic_tags
    sequence: string
  - name: all_tags
    sequence: string
  - name: speakerid
    dtype: string
  - name: name
    dtype: string
  - name: duration
    dtype: float64
  - name: gender
    dtype: string
  - name: accent
    dtype: string
  - name: pitch
    dtype: string
  - name: speaking_rate
    dtype: string
  - name: noise
    dtype: string
  - name: utterance_pitch_mean
    dtype: float32
  - name: snr
    dtype: float64
  - name: phonemes
    dtype: string
  splits:
  - name: test
    num_bytes: 1122633
    num_examples: 1432
  download_size: 261562
  dataset_size: 1122633
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
---

# Situational Evaluation Dataset for ParaSpeechCLAP Models

Evaluation dataset for **situational (utterance-level)** style attributes, used in the paper:

*ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining*

Anuj Diwan, Eunsol Choi, David Harwath

## Overview

This dataset is the **situational evaluation dataset** for the ParaSpeechCLAP model family. It contains speech clips paired with situational-tag-only style captions, drawn from the **Expresso-EARS** portion of the [ParaSpeechCaps](https://huggingface.co/datasets/ajd12342/paraspeechcaps) holdout set. It is used for both **retrieval** and **classification** evaluation. The captions are generated solely from the situational tags.

## Installation

Install the `datasets` package to load the dataset:

```bash
pip install datasets
```

To run retrieval and classification evaluation with ParaSpeechCLAP, install the [ParaSpeechCLAP GitHub repository](https://github.com/ajd12342/paraspeechclap):

```bash
git clone https://github.com/ajd12342/paraspeechclap.git
cd paraspeechclap
pip install -r requirements.txt
```

### Setting up audio files

The dataset contains a `relative_audio_path` column but not the audio files themselves. Resolving audio paths requires specifying `data.audio_root`, a common root directory organized as `${audio_root}/{source}/`, where `{source}` matches the value of the `source` column in the dataset.

This dataset uses the **Expresso** and **EARS** sources. Follow the [ParaSpeechCaps audio setup instructions](https://github.com/ajd12342/paraspeechcaps/tree/main/dataset#22-processing-dataset-audio) for those sources, with the following adjustment: instead of placing each source at its own root directory, place them under a common root:

- `${audio_root}/expresso/` (instead of `${expresso_root}`)
- `${audio_root}/ears/` (instead of `${ears_root}`)

Then pass `data.audio_root=${audio_root}` when running any ParaSpeechCLAP script.

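The directory layout described above can be checked before running any evaluation script. The following is a minimal sketch, not code from the ParaSpeechCLAP repository. It assumes that the full path of a clip is `${audio_root}/{source}/{relative_audio_path}`, which the layout implies but this README does not state outright. The helper names `resolve_audio_path` and `find_missing_audio`, and the sample rows, are hypothetical.

```python
from pathlib import Path


def resolve_audio_path(audio_root, source, relative_audio_path):
    """Build the on-disk path of one clip.

    Assumption: the path is audio_root / source / relative_audio_path,
    following the `${audio_root}/{source}/` layout described above.
    """
    return Path(audio_root) / source / relative_audio_path


def find_missing_audio(rows, audio_root):
    """Return the resolved paths that do not exist as files on disk."""
    missing = []
    for row in rows:
        path = resolve_audio_path(
            audio_root, row["source"], row["relative_audio_path"]
        )
        if not path.is_file():
            missing.append(path)
    return missing


# Hypothetical rows for illustration; real rows come from the dataset's
# `source` and `relative_audio_path` columns.
example_rows = [
    {"source": "expresso", "relative_audio_path": "clips/example_a.wav"},
    {"source": "ears", "relative_audio_path": "clips/example_b.wav"},
]

print(resolve_audio_path("/path/to/audio_root", "expresso", "clips/example_a.wav"))
print(len(find_missing_audio(example_rows, "/path/to/audio_root")))
```

Because iterating over a loaded `datasets.Dataset` yields one dictionary per row, the loaded test split can be passed as `rows` in place of `example_rows`. An empty result from `find_missing_audio` suggests that the audio root is set up as the scripts expect, under the path assumption stated above.
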
## Usage with ParaSpeechCLAP

### Retrieval evaluation

```bash
python scripts/evaluate_retrieval.py \
  --config-name eval/retrieval \
  checkpoint_path=./checkpoints/paraspeechclap-situational.pth.tar \
  data.dataset_name=ajd12342/paraspeechclap-eval-situational \
  data.audio_root=/path/to/audio_root \
  meta.results=./results_retrieval/paraspeechclap-eval-situational/ajd12342-paraspeechclap-situational
```

### Classification evaluation

```bash
python scripts/evaluate_classification.py \
  --config-name eval/classification/situational \
  checkpoint_path=./checkpoints/paraspeechclap-situational.pth.tar \
  data.audio_root=/path/to/audio_root \
  meta.results=./results_classification/paraspeechclap-eval-situational/ajd12342-paraspeechclap-situational/
```

### Loading the dataset

```python
from datasets import load_dataset

dataset = load_dataset("ajd12342/paraspeechclap-eval-situational", split="test")

print(f"Number of clips: {len(dataset)}")
print(f"Number of unique prompts: {len(set(dataset['text_description']))}")
print(dataset[0])
```

## Related Resources

- **GitHub Repository:** [https://github.com/ajd12342/paraspeechclap](https://github.com/ajd12342/paraspeechclap)
- **Models:** [ajd12342/paraspeechclap-intrinsic](https://huggingface.co/ajd12342/paraspeechclap-intrinsic), [ajd12342/paraspeechclap-situational](https://huggingface.co/ajd12342/paraspeechclap-situational) and [ajd12342/paraspeechclap-combined](https://huggingface.co/ajd12342/paraspeechclap-combined)
- **Parent Dataset:** [https://huggingface.co/datasets/ajd12342/paraspeechcaps](https://huggingface.co/datasets/ajd12342/paraspeechcaps)
- **Training Datasets:** [https://huggingface.co/datasets/ajd12342/paraspeechcaps-intrinsic-train](https://huggingface.co/datasets/ajd12342/paraspeechcaps-intrinsic-train) and [https://huggingface.co/datasets/ajd12342/paraspeechcaps-situational-train](https://huggingface.co/datasets/ajd12342/paraspeechcaps-situational-train)
- **Evaluation Datasets:** [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-intrinsic](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-intrinsic), [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-situational](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-situational) and [https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-combined](https://huggingface.co/datasets/ajd12342/paraspeechclap-eval-combined)

## Citation

```bibtex
@inproceedings{diwan2026paraspeechclap,
  title={{ParaSpeechCLAP}: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining},
  author={Diwan, Anuj and Choi, Eunsol and Harwath, David},
  journal={Under Review},
  year={2026}
}
```
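
As a follow-up to the snippet under "Loading the dataset", the sketch below summarizes rows by tag and groups clips that share a caption. It is an illustrative sketch only and is not part of the ParaSpeechCLAP evaluation scripts. The column names `classification_tag`, `text_description`, and `relative_audio_path` come from the dataset schema, while the helper names and the sample rows (including the tag values) are hypothetical.

```python
from collections import Counter, defaultdict


def count_by_tag(rows, column="classification_tag"):
    """Count how many clips carry each value of `column`."""
    return Counter(row[column] for row in rows)


def clips_per_caption(rows):
    """Map each caption to the relative audio paths of the clips that share it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["text_description"]].append(row["relative_audio_path"])
    return dict(groups)


# Hypothetical rows for illustration; the tag values and captions are made up.
example_rows = [
    {"classification_tag": "tag_a", "text_description": "caption one",
     "relative_audio_path": "clips/1.wav"},
    {"classification_tag": "tag_a", "text_description": "caption one",
     "relative_audio_path": "clips/2.wav"},
    {"classification_tag": "tag_b", "text_description": "caption two",
     "relative_audio_path": "clips/3.wav"},
]

print(count_by_tag(example_rows))
print(clips_per_caption(example_rows))
```

Both helpers accept any iterable of row dictionaries, so the `dataset` object returned by `load_dataset(..., split="test")` can be passed directly. Grouping by caption is one way to see how many clips correspond to each unique prompt counted in the loading snippet.
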

Provided by:
ajd12342