LaFresCat: a Catalan multi-accent speech dataset for text-to-speech

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/records/14588400

Resource description:

LaFresCat Multiaccent

We present LaFresCat, the first Catalan multiaccented and multispeaker dataset.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Commercial use is only possible through licensing by the voice artists. For further information, contact langtech@bsc.es and lafrescaproduccions@gmail.com.

Dataset Details

Dataset Description

The audios in this dataset were created from professional studio recordings by professional voice actors at Lafresca Creative Studio. This is the raw version of the dataset: no resampling or trimming has been applied to the audios. Audios are stored in WAV format at a 48 kHz sampling rate.

In total, there are 4 different accents, with 2 speakers per accent (female and male). After trimming, the dataset totals 3.75 h, divided by speaker ID as follows:

Balear
olga -> 23.5 min
quim -> 30.93 min

Central
elia -> 33.14 min
grau -> 37.86 min

Occidental (North-Western)
emma -> 28.67 min
pere -> 25.12 min

Valencia
gina -> 22.25 min
lluc -> 23.58 min

Uses

The main purpose of this dataset is training text-to-speech and automatic speech recognition models in Catalan accents.

Languages

The dataset is in Catalan (ca-ES).
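As a quick arithmetic check on the per-speaker durations listed under Dataset Description, the sketch below sums them and converts the result to hours. The dictionary only transcribes the figures from the card, reading decimal commas as decimal points; the variable names are illustrative and not part of the dataset.

```python
# Per-speaker durations in minutes, transcribed from the dataset card.
durations_min = {
    "balear": {"olga": 23.5, "quim": 30.93},
    "central": {"elia": 33.14, "grau": 37.86},
    "occidental": {"emma": 28.67, "pere": 25.12},
    "valencia": {"gina": 22.25, "lluc": 23.58},
}

# Sum per accent, then across the whole dataset.
per_accent = {accent: sum(spk.values()) for accent, spk in durations_min.items()}
total_min = sum(per_accent.values())

for accent, minutes in per_accent.items():
    print(f"{accent:<10} {minutes:6.2f} min")
print(f"total      {total_min:6.2f} min = {total_min / 60:.2f} h")
```

The total comes to about 225.05 minutes, which agrees with the 3.75 h stated in the card.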
Dataset Structure

The dataset consists of 2858 audios and transcriptions in the following structure:

lafresca_multiaccent_raw
├── balear
│   ├── olga
│   ├── olga.txt
│   ├── quim
│   └── quim.txt
├── central
│   ├── elia
│   ├── elia.txt
│   ├── grau
│   └── grau.txt
├── full_filelist.txt
├── occidental
│   ├── emma
│   ├── emma.txt
│   ├── pere
│   └── pere.txt
└── valencia
    ├── gina
    ├── gina.txt
    ├── lluc
    └── lluc.txt

Metadata for the dataset can be found in the file `full_filelist.txt`. Each line represents one audio and follows the format:

audio_path | speaker_id | transcription

The speaker IDs have the following mapping:

"quim": 0, "olga": 1, "grau": 2, "elia": 3, "pere": 4, "emma": 5, "lluc": 6, "gina": 7

Dataset Creation

This dataset was created by members of the Language Technologies unit of the Life Sciences department at the Barcelona Supercomputing Center, except for the Valencian sentences, which were created with the support of Cenid, the Digital Intelligence Center of the University of Alicante. The voices belong to professional voice actors and were recorded at Lafresca Creative Studio.

Source Data

The data presented in this dataset is the source data.

Data Collection and Processing

These are the technical details of the data collection and processing:

Microphone: Austrian Audio OC818
Preamp: Focusrite ISA Two
Audio Interface: Antelope Orion 32+
DAW: ProTools 2023.6.0

Processing:

Noise Gate: C1 Gate
Compression: BF-76
De-Esser: Renaissance
EQ: Maag EQ2
EQ: FabFilter Pro-Q3
Limiter: L1 Ultramaximizer

Information about the speakers:

Dialect     Gender  County
Central     male    Barcelonès
Central     female  Barcelonès
Balear      female  Pla de Mallorca
Balear      male    Llevant
Occidental  male    Baix Ebre
Occidental  female  Baix Ebre
Valencian   female  Ribera Alta
Valencian   male    La Plana Baixa

Who are the source data producers?

The Language Technologies team from the Life Sciences department at the Barcelona Supercomputing Center developed this dataset.
It features recordings by professional voice actors made at Lafresca Creative Studio.

Annotations

To check for errors in the transcriptions of the audios, we created a Label Studio space. In that space, we manually listened to a subset of the dataset and compared what we heard with the transcription. If the transcription was wrong, we corrected it.

Personal and Sensitive Information

The dataset consists of voice recordings made by professional voice actors. You agree to not attempt to determine the identity of speakers in this dataset.

Bias, Risks, and Limitations

Fine-tuning a text-to-speech (TTS) model on a single Catalan speaker of a particular dialect has significant limitations. The main challenge is capturing the full range of variability inherent in that accent. Each dialect has its own phonetic, intonational, and prosodic characteristics, which can vary greatly even within a single linguistic region. Consequently, a TTS model trained on a narrow dialect sample will struggle to generalize across different accents and sub-dialects, leading to reduced accuracy and naturalness.

Additionally, achieving a standard representation is exceedingly difficult because linguistic features can differ markedly not only between dialects but also among individual speakers within the same dialect group. These variations include subtle nuances in pronunciation, rhythm, and speech patterns that are hard to standardize in a model trained on a limited dataset.

Funding

This work has been promoted and financed by the Generalitat de Catalunya through the Aina project. In addition, the Valencian sentences were created within the framework of the NEL-VIVES project 2022/TL22/00215334.

Dataset Card Contact

langtech@bsc.es
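The `audio_path | speaker_id | transcription` line format and the speaker-ID mapping described under Dataset Structure can be read with a few lines of Python. This is a minimal sketch, not an official loader. The names `Entry` and `parse_filelist_line` and the sample line (its path and transcription) are hypothetical. The sketch assumes fields are separated by `|` with optional surrounding whitespace, and it accepts either a numeric ID or a speaker name in the second field, since the card does not say which form the file uses.

```python
from typing import NamedTuple

# Speaker-ID mapping as given in the dataset card.
SPEAKER_IDS = {"quim": 0, "olga": 1, "grau": 2, "elia": 3,
               "pere": 4, "emma": 5, "lluc": 6, "gina": 7}
ID_TO_SPEAKER = {v: k for k, v in SPEAKER_IDS.items()}


class Entry(NamedTuple):
    audio_path: str
    speaker_id: int
    speaker: str
    transcription: str


def parse_filelist_line(line: str) -> Entry:
    """Parse one 'audio_path | speaker_id | transcription' line."""
    # Split on the first two separators only, so a '|' inside the
    # transcription is kept as part of the text.
    path, speaker_field, text = (part.strip() for part in line.split("|", 2))
    # Assumption: the field is a numeric ID; otherwise treat it as a name.
    if speaker_field.isdigit():
        speaker_id = int(speaker_field)
    else:
        speaker_id = SPEAKER_IDS[speaker_field]
    return Entry(path, speaker_id, ID_TO_SPEAKER[speaker_id], text)


# Hypothetical line for illustration; real paths and texts will differ.
sample = "central/elia/example_0001.wav | 3 | Bon dia a tothom."
entry = parse_filelist_line(sample)
print(entry.speaker, entry.speaker_id, entry.transcription)
```

To process the whole file, one could open `full_filelist.txt` with `encoding="utf-8"`, skip blank lines, and call `parse_filelist_line` on each remaining line.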

Created:

2025-02-18
