TaigiSpeech/TaigiSpeech
Source: Hugging Face | Updated: 2026-03-24 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/TaigiSpeech/TaigiSpeech
---
language:
- nan
license: cc-by-4.0
task_categories:
- audio-classification
tags:
- spoken-language-understanding
- intent-classification
- taiwanese
- taigi
pretty_name: TaigiSpeech
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: audio
    dtype: audio
  - name: speaker_id
    dtype: string
  - name: intent
    dtype:
      class_label:
        names:
          '0': BREATHING_CHEST_EMERG
          '1': CALL_CONTACT
          '2': CANCEL_ALERT
          '3': FALL_HELP
          '4': LIGHT_OFF
          '5': LIGHT_ON
          '6': PAIN_GENERAL
          '7': SOS_CALL
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train/**
  - split: val
    path: data/val/**
  - split: test
    path: data/test/**
---
# TaigiSpeech
A spoken language understanding (SLU) dataset for Taiwanese (台語/Taigi) intent classification, designed for elder-care and smart-home voice command scenarios.
**Paper**: [TaigiSpeech: A Low-Resource Real-World Speech Intent Dataset and Preliminary Results with Scalable Data Mining In-the-Wild](https://arxiv.org/abs/2603.21478)
## Dataset Description
TaigiSpeech contains 3,079 Taiwanese speech utterances (1,600 train, 519 validation, 960 test) from 21 speakers, each labeled with one of 8 intent classes. It is designed to support research on spoken language understanding for Taiwanese, a low-resource language.
### Supported Tasks
- **Intent Classification**: Classify spoken Taiwanese commands into 8 intent categories.
### Languages
- Taiwanese Taigi (Taiwanese Hokkien / Southern Min)
## Dataset Structure
### Splits
| Split | Samples | Speakers | Notes |
|-------|---------|----------|-------|
| Train | 1,600 | 10 | 200 per intent (balanced) |
| Val | 519 | 5 | ~64–67 per intent |
| Test | 960 | 6 | 120 per intent (balanced) |
Speakers are **disjoint** across splits (no speaker overlap).
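The speaker-disjoint property can be checked directly from the per-split `metadata.jsonl` records. The following is a minimal sketch, assuming each record has already been parsed into a dict with a `speaker_id` key. The helper name `find_speaker_overlap` and the toy records are hypothetical and not part of the dataset tooling.

```python
from itertools import combinations


def find_speaker_overlap(splits):
    """Return {(split_a, split_b): shared speaker ids} for every overlapping pair.

    `splits` maps a split name to a list of metadata records (dicts with a
    `speaker_id` key). An empty result means the splits are speaker-disjoint.
    """
    speakers = {
        name: {record["speaker_id"] for record in records}
        for name, records in splits.items()
    }
    overlaps = {}
    for a, b in combinations(sorted(speakers), 2):
        shared = speakers[a] & speakers[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


# Toy records for illustration only; this speaker-to-split assignment is made up.
toy_splits = {
    "train": [{"speaker_id": "p001"}, {"speaker_id": "p002"}],
    "val": [{"speaker_id": "p003"}],
    "test": [{"speaker_id": "p004"}],
}
overlap = find_speaker_overlap(toy_splits)
print(overlap if overlap else "no speaker overlap")
```

Running the same check on the real splits should return an empty dict if the no-overlap statement holds.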
### Intent Classes
| Intent | Description |
|--------|-------------|
| `SOS_CALL` | Emergency help request |
| `FALL_HELP` | Fall-related assistance |
| `BREATHING_CHEST_EMERG` | Breathing or chest emergency |
| `PAIN_GENERAL` | General pain report |
| `CALL_CONTACT` | Call a contact person |
| `LIGHT_ON` | Turn on lights |
| `LIGHT_OFF` | Turn off lights |
| `CANCEL_ALERT` | Cancel an alert |
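The card's front matter declares `intent` as a class label with ids `'0'` through `'7'`, so code that consumes integer labels needs an id-to-name mapping. The sketch below copies the id order from that front matter. The names `INTENT_NAMES`, `ID2LABEL`, and `LABEL2ID` are illustrative and not defined by the dataset.

```python
# Intent names in the id order declared in the dataset card's front matter
# ('0' through '7').
INTENT_NAMES = [
    "BREATHING_CHEST_EMERG",
    "CALL_CONTACT",
    "CANCEL_ALERT",
    "FALL_HELP",
    "LIGHT_OFF",
    "LIGHT_ON",
    "PAIN_GENERAL",
    "SOS_CALL",
]

# Lookup tables in both directions (hypothetical names, for illustration).
ID2LABEL = dict(enumerate(INTENT_NAMES))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

print(LABEL2ID["SOS_CALL"], ID2LABEL[4])
```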
### Data Fields
Each sample in `metadata.jsonl` contains:
- `file_name` (str): Relative path to the audio file (exposed as the `audio` column when loaded with Hugging Face `datasets`).
- `speaker_id` (str): Anonymized speaker identifier (e.g., `p001`).
- `intent` (str): One of 8 intent labels.
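Because each line of `metadata.jsonl` is a standalone JSON object with the three fields above, the per-intent balance of a split can be tallied with the standard library alone. This is a minimal sketch under that assumption. The function name `count_intents` and the sample lines are made up for illustration, and the real file names may look different.

```python
import json
from collections import Counter


def count_intents(jsonl_text):
    """Count utterances per intent in the text of a metadata.jsonl file."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[record["intent"]] += 1
    return counts


# Made-up sample lines; real file names and speaker assignments may differ.
sample = "\n".join([
    json.dumps({"file_name": "audio/a.wav", "speaker_id": "p001", "intent": "LIGHT_ON"}),
    json.dumps({"file_name": "audio/b.wav", "speaker_id": "p001", "intent": "LIGHT_ON"}),
    json.dumps({"file_name": "audio/c.wav", "speaker_id": "p002", "intent": "SOS_CALL"}),
])
print(count_intents(sample))
```

Applied to the real train split, a balanced design would show the same count for every intent.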
### Audio Specifications
- **Format**: WAV
- **Sample Rate**: 48 kHz
- **Channels**: Mono
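A downloaded file can be checked against these specifications with Python's standard `wave` module. The sketch below assumes plain PCM WAV files. The helper `check_wav_spec` is hypothetical, and the in-memory test signal uses 16-bit samples only as a placeholder, since the card does not state a bit depth.

```python
import io
import wave


def check_wav_spec(wav_file, expected_rate=48000, expected_channels=1):
    """Return True if a WAV file (path or file object) matches the expected spec."""
    with wave.open(wav_file, "rb") as wav:
        return (
            wav.getframerate() == expected_rate
            and wav.getnchannels() == expected_channels
        )


# Build one second of silence in memory so the sketch runs without the dataset.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)      # mono, as listed above
    wav.setsampwidth(2)      # 16-bit samples: an assumption, not stated in the card
    wav.setframerate(48000)  # 48 kHz, as listed above
    wav.writeframes(b"\x00\x00" * 48000)
buffer.seek(0)

print(check_wav_spec(buffer))
```

For real data, pass the path of a file under `data/<split>/audio/` in place of the buffer.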
### Directory Layout
```
TaigiSpeech/
├── README.md
├── metadata/
│   ├── p001_profile.json
│   └── ...
└── data/
    ├── train/
    │   ├── metadata.jsonl
    │   └── audio/
    ├── val/
    │   ├── metadata.jsonl
    │   └── audio/
    └── test/
        ├── metadata.jsonl
        └── audio/
```
### Speaker Profiles
The `metadata/` directory contains anonymized speaker profiles with demographic information: age, gender, education level, hometown, native language(s), and language fluency ratings.
## Speaker Demographics
- **Number of Speakers**: 21 (p001–p022, excluding p018)
- **Age Range**: 20–78 years (majority 54+)
- **Gender**: Mixed male and female
- **Regions**: Keelung, Taipei, New Taipei, Yilan, Yunlin, Taichung, Tainan, Chiayi
- **Recording Devices**: iPad, MacBook, external USB microphones
## Usage
```python
from datasets import load_dataset

# Load the default config, which defines the train, val, and test splits.
dataset = load_dataset("TaigiSpeech/TaigiSpeech")
```
## Citation
If you use this dataset, please cite:
```bibtex
@article{chang2026taigispeech,
  title   = {TaigiSpeech: A Low-Resource Real-World Speech Intent Dataset
             and Preliminary Results with Scalable Data Mining In-the-Wild},
  author  = {Chang, Kai-Wei and Lin, Yi-Cheng and Chou, Huang-Cheng and
             Ren, Wenze and Huang, Yu-Han and Tsai, Yun-Shao and
             Chen, Chien-Cheng and Tsao, Yu and Liao, Yuan-Fu and
             Narayanan, Shrikanth and Glass, James and Lee, Hung-yi},
  journal = {arXiv preprint arXiv:2603.21478},
  year    = {2026}
}
```
## License
This dataset is released under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
Provider: TaigiSpeech



