
meetween/dowis

Hugging Face | Updated 2026-03-11 | Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/meetween/dowis
Resource description:
---
license: cc-by-4.0
language:
- de
- en
- es
- cs
- fr
- hu
- it
- nl
- pt
- ru
- sq
- sv
tags:
- speech prompts
- text prompts
- instruction following
- benchmark
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: text_prompt
    dtype: string
  - name: audio_prompt_female_1
    dtype: audio
  - name: audio_prompt_female_2
    dtype: audio
  - name: audio_prompt_male_1
    dtype: audio
  - name: audio_prompt_male_2
    dtype: audio
  - name: language
    dtype: string
  - name: task
    dtype: string
  - name: prompt_type
    dtype: string
  splits:
  - name: test
    num_bytes: 2704378267.6
    num_examples: 1320
  download_size: 1772318018
  dataset_size: 2704378267.6
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
---

# Do What I Say (DOWIS): A Spoken Prompt Dataset for Instruction-Following

<span style="background-color:#fee2e2; color:#b91c1c; padding:2px 6px; border-radius:4px; font-size:0.85em; font-weight:600;">NEW</span> DOWIS now also contains spoken and written prompts in Albanian (sq), and for the tasks LIPREAD and SLU!

> **TL;DR:** DOWIS is a multilingual dataset of human-recorded spoken and written instruction prompts, designed to enable realistic evaluation of Speech Large Language Models across 11 tasks and 12 languages.

---

## Dataset Summary

Most Speech LLM benchmarks use text-based prompts, which does not reflect how users actually interact with these models in the real world. DOWIS fills this gap by providing human-recorded spoken prompts, paired with their written equivalents, across a wide range of tasks, languages, and prompt styles. Each prompt can be directly paired with any existing speech benchmark to evaluate how well Speech LLMs follow spoken instructions.

The dataset contains **1,320 rows**, with up to 4 audio recordings per row (2 female, 2 male speakers where available), covering:

- **12 languages**: cs, de, en, es, fr, hu, it, nl, pt, ru, sq, sv
- **11 tasks**: ACHAP, ASR, MT, S2ST, SQA, SSUM, ST, TSUM, TTS, LIPREAD, SLU
- **5 prompt styles**: basic, formal, informal, detailed, short
- **10 prompt variants** per task-language pair

Details can be found in the corresponding paper on [arXiv](https://arxiv.org/abs/2603.09881). Code for benchmarking Speech LLMs with different task benchmarks coupled with DOWIS can be found on [GitHub](https://github.com/MaikeZuefle/DOWIS/tree/main).

---

## Tasks

| Task Code | Description |
|-----------|-------------|
| ACHAP | Audio Chaptering |
| ASR | Automatic Speech Recognition |
| MT | Machine Translation |
| S2ST | Speech-to-Speech Translation |
| SQA | Spoken Question Answering |
| SSUM | Speech Summarization |
| ST | Speech Translation |
| TSUM | Text Summarization |
| TTS | Text-to-Speech |
| LIPREAD | Lip-Reading |
| SLU | Spoken Language Understanding |

## Prompt Styles

| Style | Description |
|-------|-------------|
| `basic` | Natural, everyday phrasing a researcher would use |
| `formal` | Professional, polished language |
| `informal` | Conversational and casual |
| `detailed` | Explicit and precise instructions on how to perform the task |
| `short` | As concise as possible while remaining unambiguous |

---

## Dataset Fields

| Field | Type | Description |
|-------|------|-------------|
| `text_prompt` | `string` | Written version of the instruction prompt |
| `audio_prompt_female_1` | `Audio` | Human-recorded female speaker (speaker 1), `null` if unavailable |
| `audio_prompt_female_2` | `Audio` | Human-recorded female speaker (speaker 2), `null` if unavailable |
| `audio_prompt_male_1` | `Audio` | Human-recorded male speaker (speaker 1), `null` if unavailable |
| `audio_prompt_male_2` | `Audio` | Human-recorded male speaker (speaker 2), `null` if unavailable |
| `language` | `string` | ISO 639-1 language code (e.g. `en`, `de`) |
| `task` | `string` | Task code the prompt is designed for (e.g. `asr`, `mt`) |
| `prompt_type` | `string` | Prompt style: `basic`, `formal`, `informal`, `detailed`, or `short` |

---

## Citation

If you use this work, please cite:

```bibtex
@misc{züfle2026isayspokenprompt,
      title={Do What I Say: A Spoken Prompt Dataset for Instruction-Following},
      author={Maike Züfle and Sara Papi and Fabian Retkowski and Szymon Mazurek and Marek Kasztelnik and Alexander Waibel and Luisa Bentivogli and Jan Niehues},
      year={2026},
      eprint={2603.09881},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.09881}
}
```

---

Dataset Contact: maike.zuefle@kit.edu
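
As a rough illustration of how the fields described above might be used, the sketch below filters prompts by language, task, and prompt style, and keeps only the audio columns that are not `null`. It assumes each row is a plain Python dict keyed by the field names from the Dataset Fields table, which is how rows typically appear once the `test` split is loaded with the Hugging Face `datasets` library. The helper names `available_audio` and `select_prompts`, as well as the sample rows, are hypothetical and not part of the dataset or its official code.

```python
from typing import Any, Iterable, Optional

# Audio column names as listed in the Dataset Fields table.
AUDIO_COLUMNS = (
    "audio_prompt_female_1",
    "audio_prompt_female_2",
    "audio_prompt_male_1",
    "audio_prompt_male_2",
)


def available_audio(row: dict[str, Any]) -> dict[str, Any]:
    """Return only the audio columns of a row that are not null (None)."""
    return {name: row[name] for name in AUDIO_COLUMNS if row.get(name) is not None}


def select_prompts(
    rows: Iterable[dict[str, Any]],
    language: str,
    task: str,
    prompt_type: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Filter rows by language, task and (optionally) prompt style.

    Task codes are compared case-insensitively, because the card writes them
    in upper case in the task table and in lower case in the field examples.
    """
    selected = []
    for row in rows:
        if row["language"] != language:
            continue
        if row["task"].lower() != task.lower():
            continue
        if prompt_type is not None and row["prompt_type"] != prompt_type:
            continue
        selected.append(row)
    return selected


# Hypothetical rows for illustration only; texts and audio values are made up.
sample_rows = [
    {
        "text_prompt": "Transcribe the audio.",
        "audio_prompt_female_1": {"path": "f1.wav"},
        "audio_prompt_female_2": None,
        "audio_prompt_male_1": {"path": "m1.wav"},
        "audio_prompt_male_2": None,
        "language": "en",
        "task": "asr",
        "prompt_type": "short",
    },
    {
        "text_prompt": "Bitte übersetzen Sie den folgenden Text.",
        "audio_prompt_female_1": None,
        "audio_prompt_female_2": None,
        "audio_prompt_male_1": {"path": "m1_de.wav"},
        "audio_prompt_male_2": None,
        "language": "de",
        "task": "mt",
        "prompt_type": "formal",
    },
]

matches = select_prompts(sample_rows, language="en", task="ASR")
for row in matches:
    # Print the written prompt and the names of the recordings that exist.
    print(row["text_prompt"], sorted(available_audio(row)))
```

With the real data, the list of sample rows would be replaced by the rows of the `test` split. Checking for `None` before using an audio column matters because not every row has all four recordings.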
Provided by:
meetween