twinkle-ai/finevoices-zhtw
Source: Hugging Face | Updated: 2026-03-30 | Indexed: 2026-02-07
Download link:
https://hf-mirror.com/datasets/twinkle-ai/finevoices-zhtw
Description:
The finevoices-zhtw project aims to build a speech dataset centered on Traditional Chinese (Taiwan) that is legally usable and maintainable over the long term. It is intended as a public foundational data source for Traditional Chinese language and speech models in tasks such as automatic speech recognition (ASR), text-to-speech synthesis (TTS), and multimodal model training and fine-tuning. The project's ethos follows the open dataset model promoted by Mozilla Common Voice and the Hugging Face community: contributed collaboratively by the community, clearly licensed, and accumulated and evolved over time. However, finevoices-zhtw does not directly copy existing data or sources. Instead, it focuses on Traditional Chinese and the Taiwanese context, with speech data contributed by Taiwanese users, researchers, and organizations, to build a local, sustainable public speech resource that can be trusted for model training. Our goal is for finevoices-zhtw to become an important piece of public infrastructure in the Traditional Chinese world, much as Common Voice is for the English speech community.
Provider:
twinkle-ai



