sapinsapin/filipinospeechcorpus
Hugging Face · updated 2026-03-23 · indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/sapinsapin/filipinospeechcorpus
Resource description:
---
language:
- fil
task_categories:
- automatic-speech-recognition
tags:
- filipino
- tagalog
- speech
- tts
- whisper
citation: >-
@article{sagumdevelopment, title={DEVELOPMENT OF A FILIPINO SPEECH CORPUS},
author={Sagum, Ramil} }
license: mit
pretty_name: FSC
size_categories:
- 100K<n<1M
---
# Filipino Speech Corpus
Word- and sentence-level segments from the Filipino Speech Corpus (FSC), stored as a Hugging Face Parquet dataset with raw 16 kHz mono audio. For more details about the data, see the full paper [Development of a Filipino Speech Corpus](http://www.wins.or.kr/DataPool/Board/xxxx/18xx/1812/DEVELOPMENT%20OF%20A%20FILIPINO%20SPEECH%20CORPUS.pdf).
The corpus is composed of single-word utterances, read speech, and spontaneous speech from 100 speakers aged 16 and above.
## Citation
If you use this dataset, please cite the original corpus:
```
@article{sagumdevelopment,
title={DEVELOPMENT OF A FILIPINO SPEECH CORPUS},
author={Sagum, Ramil}
}
```
## Usage
**Whisper / ASR:**
```python
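from datasets import load_dataset, Audio

ds = load_dataset("sapinsapin/filipinospeechcorpus")  # DatasetDict with "train" and "test"
# Keep segments with at least 3 words and at least 1.5 s of audio
ds = ds.filter(lambda x: x["num_words"] >= 3 and x["duration"] >= 1.5)
ds = ds.cast_column("audio", Audio(sampling_rate=16000))  # Whisper expects 16 kHz input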
```
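Whisper's feature extractor works on 30-second windows and pads or truncates its input to that length, so segments longer than 30 s would be cut off during training. The sketch below mirrors the filter above and additionally drops such segments. The helper name `keep_for_whisper` and the 30 s upper bound are our own assumptions, not part of this dataset card.

```python
# Hypothetical helper: mirrors the card's ASR filter (>= 3 words, >= 1.5 s)
# and adds an assumed upper bound at Whisper's 30-second input window.
WHISPER_WINDOW_SECONDS = 30.0


def keep_for_whisper(
    num_words: int,
    duration: float,
    min_words: int = 3,
    min_seconds: float = 1.5,
    max_seconds: float = WHISPER_WINDOW_SECONDS,
) -> bool:
    """True if a segment has enough words and fits inside one Whisper window."""
    return num_words >= min_words and min_seconds <= duration <= max_seconds
```

With the `ds` object from the previous block, it could be applied as `ds.filter(keep_for_whisper, input_columns=["num_words", "duration"])`; `input_columns` passes the named columns to the function as positional arguments, in the listed order.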
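**TTS (LJSpeech-compatible, 22050 Hz):**
```python
from datasets import load_dataset, Audio
ds = load_dataset("sapinsapin/filipinospeechcorpus")
ds = ds.filter(lambda x: x["speech_type"] == "read" and 1.0 <= x["duration"] <= 10.0)  # read speech, 1-10 s
ds = ds.cast_column("audio", Audio(sampling_rate=22050))  # resampled to 22.05 kHz on access
```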
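LJSpeech-style recipes typically expect a `wavs/` folder plus a `metadata.csv` whose lines hold three pipe-separated fields: clip ID, raw transcription, and normalized transcription. Below is a minimal sketch of building that file from filtered rows. The `fsc_000000`-style clip IDs are hypothetical, and the normalized column simply repeats the whitespace-collapsed sentence rather than doing real text normalization (LJSpeech expands numbers and abbreviations in that field).

```python
from typing import Iterable, Mapping


def to_ljspeech_line(clip_id: str, sentence: str) -> str:
    """Format one LJSpeech metadata line: ID|transcription|normalized transcription."""
    text = " ".join(sentence.split())  # collapse newlines and repeated spaces
    if "|" in text:
        raise ValueError(f"transcript for {clip_id!r} contains the '|' delimiter")
    # Assumption: no real normalization, so the cleaned text is reused in field 3.
    return f"{clip_id}|{text}|{text}"


def build_metadata(rows: Iterable[Mapping[str, object]], prefix: str = "fsc") -> str:
    """Build metadata.csv contents; clip IDs are hypothetical and follow row order."""
    lines = [
        to_ljspeech_line(f"{prefix}_{i:06d}", str(row["sentence"]))
        for i, row in enumerate(rows)
    ]
    return "".join(line + "\n" for line in lines)
```

The decoded audio for each row would then be written to `wavs/<clip_id>.wav` (for example with `soundfile`); that step is omitted here. Iterating over `ds["train"].remove_columns("audio")` is enough to build the metadata without decoding audio.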
## Schema
| Field | Type | Description |
|---|---|---|
| `audio` | `Audio(16000)` | 16kHz mono WAV segment |
| `sentence` | `str` | Transcription |
| `duration` | `float` | Segment duration in seconds |
| `num_words` | `int` | Word count |
| `speaker_id` | `str` | Speaker identifier |
| `gender` | `str` | `male` / `female` |
| `age_group` | `str` | Age range, e.g. `20-27` |
| `speech_type` | `str` | `read` / `spontaneous` / `machine` |
| `source_file` | `str` | Stem of the original `.trs` transcript file |
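Because every row carries `duration`, `speech_type`, and `speaker_id`, simple corpus statistics can be computed without touching the audio. A minimal sketch, assuming rows are plain dicts with the schema fields above; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def summarize_by_speech_type(rows: Iterable[Mapping[str, object]]) -> dict:
    """Total hours and number of distinct speakers for each speech_type value."""
    seconds = defaultdict(float)
    speakers = defaultdict(set)
    for row in rows:
        kind = row["speech_type"]
        seconds[kind] += float(row["duration"])
        speakers[kind].add(row["speaker_id"])
    return {
        kind: {"hours": seconds[kind] / 3600.0, "speakers": len(speakers[kind])}
        for kind in seconds
    }
```

For example, `summarize_by_speech_type(ds["train"].remove_columns("audio"))` iterates over the training split as dicts; dropping the `audio` column first avoids decoding every clip.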
## File naming convention
The names of the sound files encode the speaker identification number, speaker gender, speaker age group, and text material set. For example, the file name `12-0-4-2.wav` breaks down as:

- `12`: speaker ID number
- `0`: gender (`0` = male, `1` = female)
- `4`: age group (`0` = 20-27, `1` = 28-35, `2` = 36-43, `3` = 44-51, `4` = 52-60)
- `2`: text material number
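A minimal sketch that decodes this naming convention. The function and type names are hypothetical, and the code assumes every name has exactly four dash-separated integer fields as in the example; whether the dataset's `source_file` stems follow the same pattern is not stated on this card.

```python
from pathlib import PurePath
from typing import NamedTuple

# Code tables transcribed from the naming convention above.
GENDERS = {0: "male", 1: "female"}
AGE_GROUPS = {0: "20-27", 1: "28-35", 2: "36-43", 3: "44-51", 4: "52-60"}


class FSCFileInfo(NamedTuple):
    speaker_id: int
    gender: str
    age_group: str
    text_material: int


def parse_fsc_filename(name: str) -> FSCFileInfo:
    """Decode a name like '12-0-4-2.wav' into speaker, gender, age group and text set."""
    fields = PurePath(name).stem.split("-")  # drop directories and extension
    if len(fields) != 4 or not all(f.isdigit() for f in fields):
        raise ValueError(f"not an FSC file name: {name!r}")
    speaker, gender, age, material = map(int, fields)
    if gender not in GENDERS or age not in AGE_GROUPS:
        raise ValueError(f"unknown gender or age-group code in {name!r}")
    return FSCFileInfo(speaker, GENDERS[gender], AGE_GROUPS[age], material)
```

`parse_fsc_filename("12-0-4-2.wav")` yields speaker 12, `male`, age group `52-60`, text material 2, matching the worked example.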
## Code
Processing code is available at [https://github.com/sapinsapin/halohalo](https://github.com/sapinsapin/halohalo)
## Splits
| Split | Share of rows |
|---|---|
| `train` | 90% |
| `test` | 10% |
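The card does not say whether the 90/10 split keeps speakers disjoint between `train` and `test`. A small, hypothetical helper for checking the actual test fraction and any speakers shared between splits:

```python
from typing import Iterable


def split_report(
    train_speakers: Iterable[str],
    test_speakers: Iterable[str],
    n_train: int,
    n_test: int,
) -> dict:
    """Report the test-set fraction and the speaker IDs present in both splits."""
    shared = sorted(set(train_speakers) & set(test_speakers))
    return {"test_fraction": n_test / (n_train + n_test), "shared_speakers": shared}
```

It could be called as `split_report(ds["train"].unique("speaker_id"), ds["test"].unique("speaker_id"), ds["train"].num_rows, ds["test"].num_rows)`. A non-empty `shared_speakers` list would mean test scores are not speaker-independent.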
Provided by:
sapinsapin



