
slprl/StressPresso

Hugging Face | Updated 2025-11-11 | Indexed 2026-01-03
Download link:
https://hf-mirror.com/datasets/slprl/StressPresso
Resource description:
---
language:
- en
license: cc-by-nc-4.0
task_categories:
- question-answering
- automatic-speech-recognition
- audio-classification
- audio-text-to-text
dataset_info:
  features:
  - name: transcription
    dtype: string
  - name: intonation
    dtype: string
  - name: description
    dtype: string
  - name: possible_answers
    sequence: string
  - name: label
    dtype: int64
  - name: audio_lm_prompt
    dtype: string
  - name: audio
    struct:
    - name: array
      sequence: float64
    - name: path
      dtype: string
    - name: sampling_rate
      dtype: int64
  - name: stress_pattern
    struct:
    - name: binary
      sequence: int64
    - name: indices
      sequence: int64
    - name: words
      sequence: string
  - name: metadata
    struct:
    - name: audio_path
      dtype: string
    - name: gender
      dtype: string
    - name: speaker_id
      dtype: string
    - name: interpretation_id
      dtype: string
    - name: transcription_id
      dtype: string
  splits:
  - name: test
    num_bytes: 216570205
    num_examples: 202
  download_size: 135868258
  dataset_size: 216570205
tags:
- speech
- stress
- intonation
- audio-reasoning
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
pretty_name: StressPresso
---

# StressPresso Evaluation Dataset

This dataset is derived from the *Expresso* dataset as introduced in the paper **[EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis](https://arxiv.org/pdf/2308.05725)**. For additional information on *Expresso*, see its [project page](https://speechbot.github.io/expresso/).

The *StressPresso* dataset supports the evaluation of models on **Sentence Stress Reasoning (SSR)** and **Sentence Stress Detection (SSD)** tasks, as introduced in our paper: **[StressTest: Can YOUR Speech LM Handle the Stress?](https://huggingface.co/papers/2505.22765)**

💻 [Code Repository](https://github.com/slp-rl/StressTest) | 🤗 [Model: StresSLM](https://huggingface.co/slprl/StresSLM) | 🤗 [Stress-17k Dataset](https://huggingface.co/datasets/slprl/Stress-17K-raw)

📃 [Paper](https://huggingface.co/papers/2505.22765) | 🌐 [Project Page](https://pages.cs.huji.ac.il/adiyoss-lab/stresstest/)

---

## 🗂️ Dataset Overview

The *StressPresso* dataset includes **202** evaluation samples (split: `test`) with the following features:

* `transcription_id`: Identifier for each transcription sample.
* `transcription`: The spoken text.
* `description`: Description of the interpretation of the stress pattern.
* `intonation`: The stressed version of the transcription.
* `interpretation_id`: Unique reference to the interpretation imposed by the stress pattern of the sentence.
* `audio`: Audio data at 48 kHz sampling rate.
* `metadata`: Structured metadata including:
  * `gender`: Speaker gender.
  * `audio_path`: Expresso sample name.
  * `speaker_id`: Expresso speaker id.
* `possible_answers`: List of possible interpretations for SSR.
* `label`: Ground truth label for SSR.
* `stress_pattern`: Structured stress annotation including:
  * `binary`: Sequence of 0/1 labels marking stressed words.
  * `indices`: Stressed word positions in the transcription.
  * `words`: The actual stressed words.
* `audio_lm_prompt`: The prompt used for SSR.

---

## Evaluate YOUR model

This dataset is designed for evaluating models following the protocol and scripts in our [StressTest repository](https://github.com/slp-rl/StressTest).

To evaluate a model, refer to the instructions in the repository. For example:

```bash
python -m stresstest.evaluation.main \
    --task ssr \
    --model_to_evaluate stresslm
```

Replace `ssr` with `ssd` for stress detection, and use your model’s name with `--model_to_evaluate`.

---

## How to use

This dataset is formatted for usage with the HuggingFace Datasets library:

```python
from datasets import load_dataset

dataset = load_dataset("slprl/StressPresso")
```

---

## 📖 Citation

If you use this dataset in your work, please cite:

```bibtex
@misc{yosha2025stresstest,
      title={StressTest: Can YOUR Speech LM Handle the Stress?},
      author={Iddo Yosha and Gallil Maimon and Yossi Adi},
      year={2025},
      eprint={2505.22765},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.22765},
}
```
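
---

## Reading the `stress_pattern` fields

The Dataset Overview describes `stress_pattern` as three parallel views of the same annotation: `binary`, `indices`, and `words`. The sketch below illustrates how they could relate to `transcription`. It is not part of the StressTest codebase: the helper name `check_stress_pattern`, the whitespace tokenization, the 0-based indexing, and the punctuation/case normalization are all assumptions, and the example sentence is invented for illustration rather than taken from the dataset.

```python
import string
from typing import Dict, List


def _norm(token: str) -> str:
    # Assumed normalization: ignore case and surrounding punctuation.
    return token.strip(string.punctuation).lower()


def check_stress_pattern(transcription: str, stress_pattern: Dict[str, List]) -> bool:
    """Return True if `binary`, `indices`, and `words` agree with each other.

    Assumptions (not confirmed by the dataset card): words come from
    whitespace splitting and `indices` are 0-based word positions.
    """
    tokens = transcription.split()
    binary = stress_pattern["binary"]
    indices = sorted(stress_pattern["indices"])
    words = stress_pattern["words"]

    # Expect one 0/1 label per token.
    if len(binary) != len(tokens):
        return False
    # Positions flagged 1 in `binary` should equal `indices`.
    if [i for i, flag in enumerate(binary) if flag == 1] != indices:
        return False
    # Tokens at the stressed positions should match `words`.
    return [_norm(tokens[i]) for i in indices] == [_norm(w) for w in words]


# Invented sample that mimics the schema; NOT a row from StressPresso.
example = {
    "transcription": "I never said she stole my money",
    "stress_pattern": {
        "binary": [0, 0, 0, 1, 0, 0, 0],
        "indices": [3],
        "words": ["she"],
    },
}

is_consistent = check_stress_pattern(
    example["transcription"], example["stress_pattern"]
)
```

If these assumptions hold for the real data, the same helper could be applied to each sample of the `test` split after `load_dataset`. A `False` result would more likely point to a different tokenization or indexing convention than to an annotation error.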
Provided by:
slprl