
sapinsapin/filipinospeechcorpus

Source: Hugging Face. Updated 2026-03-23. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/sapinsapin/filipinospeechcorpus
Description:
---
language:
- fil
task_categories:
- automatic-speech-recognition
tags:
- filipino
- tagalog
- speech
- tts
- whisper
citation: >-
  @article{sagumdevelopment, title={DEVELOPMENT OF A FILIPINO SPEECH CORPUS},
  author={Sagum, Ramil} }
license: mit
pretty_name: FSC
size_categories:
- 100K<n<1M
---

# Filipino Speech Corpus

Word- and sentence-level segments from the Filipino Speech Corpus (FSC), stored as a Hugging Face Parquet dataset with raw 16kHz mono audio.

For more detailed information about the data, check out the full paper [Development of a Filipino Speech Corpus](http://www.wins.or.kr/DataPool/Board/xxxx/18xx/1812/DEVELOPMENT%20OF%20A%20FILIPINO%20SPEECH%20CORPUS.pdf).

The corpus is composed of single-word utterances, read speech, and spontaneous speech from 100 speakers aged 16 and above.

## Citation

If you use this dataset, please cite the original corpus:

```
@article{sagumdevelopment,
  title={DEVELOPMENT OF A FILIPINO SPEECH CORPUS},
  author={Sagum, Ramil}
}
```

## Usage

**Whisper / ASR:**

```python
from datasets import load_dataset, Audio

ds = load_dataset("sapinsapin/filipinospeechcorpus")
# Keep segments with at least 3 words and at least 1.5 seconds of audio.
ds = ds.filter(lambda x: x["num_words"] >= 3 and x["duration"] >= 1.5)
# Decode audio at the native 16kHz rate.
ds = ds.cast_column("audio", Audio(sampling_rate=16000))
```

**TTS (LJSpeech-compatible, 22050Hz):**

```python
from datasets import load_dataset, Audio

ds = load_dataset("sapinsapin/filipinospeechcorpus")
ds = ds.filter(lambda x: x["speech_type"] == "read" and 1.0 <= x["duration"] <= 10.0)
ds = ds.cast_column("audio", Audio(sampling_rate=22050))
```

## Schema

| Field | Type | Description |
|---|---|---|
| `audio` | `Audio(16000)` | 16kHz mono WAV segment |
| `sentence` | `str` | Transcription |
| `duration` | `float` | Segment duration in seconds |
| `num_words` | `int` | Word count |
| `speaker_id` | `str` | Speaker identifier |
| `gender` | `str` | `male` / `female` |
| `age_group` | `str` | Age range, e.g. `20-27` |
| `speech_type` | `str` | `read` / `spontaneous` / `machine` |
| `source_file` | `str` | Original TRS stem |

## File naming convention

The names of the sound files encode the speaker identification number, speaker gender, speaker age group, and text material set. For example, the file name `12-0-4-2.wav` breaks down as follows:

- `12`: speaker ID number
- `0`: gender (`0` = male, `1` = female)
- `4`: age group (`0` = 20-27, `1` = 28-35, `2` = 36-43, `3` = 44-51, `4` = 52-60)
- `2`: text material number

## Code

Processing code is available at [https://github.com/sapinsapin/halohalo](https://github.com/sapinsapin/halohalo)

## Splits

| Split | Share of rows |
|---|---|
| `train` | 90% |
| `test` | 10% |
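
The file naming convention described above can be decoded with a few lines of Python. This is a minimal sketch rather than part of the dataset's own tooling: the helper `parse_fsc_filename` and the `FscFileInfo` tuple are hypothetical names introduced here for illustration. It assumes every name follows the four-field `speaker-gender-age-material` pattern, and it is not confirmed that the `source_file` column follows the same pattern.

```python
from pathlib import PurePath
from typing import NamedTuple

# Code tables taken from the "File naming convention" section.
GENDERS = {0: "male", 1: "female"}
AGE_GROUPS = {0: "20-27", 1: "28-35", 2: "36-43", 3: "44-51", 4: "52-60"}


class FscFileInfo(NamedTuple):
    """Hypothetical container for the fields encoded in a file name."""
    speaker_id: int
    gender: str
    age_group: str
    text_material: int


def parse_fsc_filename(name: str) -> FscFileInfo:
    """Decode a name such as '12-0-4-2.wav' (hypothetical helper)."""
    stem = PurePath(name).stem  # drop directory and extension
    parts = stem.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dash-separated fields, got {stem!r}")
    speaker, gender, age, material = (int(p) for p in parts)
    if gender not in GENDERS or age not in AGE_GROUPS:
        raise ValueError(f"unknown gender or age-group code in {stem!r}")
    return FscFileInfo(speaker, GENDERS[gender], AGE_GROUPS[age], material)


info = parse_fsc_filename("12-0-4-2.wav")
print(info.speaker_id, info.gender, info.age_group, info.text_material)
```

Raising `ValueError` on malformed names keeps unexpected files from being silently mislabelled. Callers that prefer to skip such files can catch the exception instead.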
Provider:
sapinsapin