Speech/text alignments for Italian end-to-end TTS
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/13899342
Description:
This dataset contains 146 chapters from several Librivox audiobooks (total duration 33:22:19.176) read by 33 Italian speakers:
19 females:
LC (Lisa Caputo): 18 chapters, 8643 utterances, 07:51:25.991
EG (Enrica Giampieretti): 16 chapters, 8473 utterances, 07:04:47.044
MT (Mariateresa): 7 chapters, 2735 utterances, 02:57:0.487
MR (Mariarosa): 2 chapters, 1150 utterances, 00:55:59.825
FA (Fabiola): 2 chapters, 324 utterances, 00:13:36.635
NI (Nicole Grassi): 2 chapters, 396 utterances, 00:19:26.222
SP (Simona Pagliari): 7 chapters, 557 utterances, 00:26:58.920
MM (Marzia Marianera): 2 chapters, 523 utterances, 00:28:18.365
ANGE (Angelina): 2 chapters, 246 utterances, 00:15:45.223
LAURA (Laura): 2 chapters, 899 utterances, 00:38:33.592
FG (Filippo Gioachin): 4 chapters, 1358 utterances, 00:58:59.194
GIOEMILY (?): 2 chapters, 743 utterances, 00:33:26.872
CAPI (Silvia di Simone): 1 chapter, 482 utterances, 00:29:16.553
ALLIE (Allie Cingi): 2 chapters, 838 utterances, 00:50:31.221
CAIMMA (): 1 chapter, 328 utterances, 00:19:26.483
DOLCINEA (): 1 chapter, 393 utterances, 00:19:19.275
MGT (Maria Grazia Tundo): 1 chapter, 290 utterances, 00:10:41.887
FR (Francesca Roma): 2 chapters, 246 utterances, 00:16:16.611
PETULA (): 3 chapters, 251 utterances, 00:16:27.894
14 males:
RF (Riccardo Fasol): 3 chapters, 961 utterances, 01:03:59.329
RC (Roberto Confini): 3 chapters, 1238 utterances, 00:42:2.761
SB (Sergio Baldelli): 2 chapters, 910 utterances, 00:54:57.145
DA (Daniele): 2 chapters, 420 utterances, 00:24:36.827
RECL (Renzo Clerico): 3 chapters, 778 utterances, 00:37:31.304
STRALF (?): 3 chapters, 1392 utterances, 00:51:23.847
PAOLO (?): 2 chapters, 484 utterances, 00:34:3.345
PIER (?): 1 chapter, 320 utterances, 00:14:10.860
AB (Andrea Briglia): 31 chapters, 863 utterances, 00:40:26.526
SIRJOE (Sergio Bersanetti): 2 chapters, 626 utterances, 00:31:39.754
KIUKKO (Luigi Chiaro): 1 chapter, 368 utterances, 00:17:48.204
AZ (Francesco Montana): 1 chapter, 287 utterances, 00:14:39.814
BM (Beniamino Massimo): 1 chapter, 415 utterances, 00:16:6.842
ML (Mirko Lamberti): 1 chapter, 252 utterances, 00:12:27.594
The dataset also includes a dictionary of 14969 words with aligned phones.
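The per-speaker durations above use an hours:minutes:seconds.milliseconds layout, and some seconds fields are not zero-padded (e.g. 02:57:0.487). The following is a minimal sketch of how such strings might be parsed for summing or comparison; the function name `parse_duration` is hypothetical and not part of the dataset.

```python
from datetime import timedelta


def parse_duration(value: str) -> timedelta:
    """Parse 'HH:MM:SS.mmm' (seconds may lack zero-padding) into a timedelta."""
    hours, minutes, seconds = value.strip().split(":")
    whole, _, frac = seconds.partition(".")
    # Normalise the fractional part to exactly three digits (milliseconds).
    millis = int(frac.ljust(3, "0")[:3]) if frac else 0
    return timedelta(
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(whole),
        milliseconds=millis,
    )


# Two durations copied from the speaker lists (LC and RC).
lc = parse_duration("07:51:25.991")
rc = parse_duration("00:42:2.761")
print(lc + rc)
```

Working with integers rather than floats keeps the millisecond values exact when many durations are added together.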
Sources:
Audio recordings are from Librivox
Aligned original texts are from diverse sources including Intratext, wikisource, pirandelloweb, etc.
Each .wav file (sampled at 22050 Hz) corresponds to one entire chapter. The format of the filenames is: {author's acronym}_{book's acronym}_{reader's acronym}_{volume's number}_{chapter's number}
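As an illustration of the filename convention, the sketch below splits a chapter filename into its five parts. It assumes that none of the acronyms contains an underscore, which the description does not state. The example filename `XX_YY_LC_1_3.wav` is made up for illustration, and `ChapterName` and `parse_chapter_filename` are hypothetical names.

```python
from pathlib import Path
from typing import NamedTuple


class ChapterName(NamedTuple):
    author: str
    book: str
    reader: str
    volume: str
    chapter: str


def parse_chapter_filename(filename: str) -> ChapterName:
    """Split '{author}_{book}_{reader}_{volume}_{chapter}[.wav]' into its parts."""
    stem = Path(filename).stem  # drops a trailing extension such as '.wav'
    parts = stem.split("_")
    if len(parts) != 5:
        # Assumption: acronyms never contain '_', so exactly five parts are expected.
        raise ValueError(f"unexpected filename layout: {filename!r}")
    return ChapterName(*parts)


# Made-up filename, used only to show the layout.
info = parse_chapter_filename("XX_YY_LC_1_3.wav")
print(info.reader, info.chapter)
```

Volume and chapter are kept as strings because the description does not say whether they are always plain integers.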
The IT.csv file gives text (or sometimes, phones) and signal alignments for utterances in 4 fields separated by '|': {filename}|{start_ms}|{end_ms}|{text or phonetic content}. Most utterances are separated by at least a pause of 400ms (exceptionally less when phonation exceeds 11s). The intervals [start_ms:end_ms] comprise leading and trailing silences of 130ms (since wavs are entire chapters, these silences are "true" ambient silences).
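A minimal sketch of reading one line of IT.csv and converting its millisecond interval to sample indices at 22050 Hz follows. The names `Utterance`, `parse_line` and `sample_range` are hypothetical, and the sample line is invented. The sketch assumes the text field never contains '|' and keeps any fields beyond the fourth in `extra`. Whether `{filename}` includes the `.wav` extension is not stated, so it is left untouched.

```python
from typing import List, NamedTuple, Tuple

SAMPLE_RATE = 22050  # Hz, as stated for the .wav files


class Utterance(NamedTuple):
    filename: str
    start_ms: float
    end_ms: float
    content: str
    extra: List[str]  # any fields after the fourth, kept verbatim


def parse_line(line: str) -> Utterance:
    """Parse '{filename}|{start_ms}|{end_ms}|{content}[|...]'."""
    fields = line.rstrip("\r\n").split("|")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}")
    filename, start, end, content = fields[:4]
    return Utterance(filename, float(start), float(end), content, fields[4:])


def sample_range(utt: Utterance, sample_rate: int = SAMPLE_RATE) -> Tuple[int, int]:
    """Convert the [start_ms:end_ms] interval to sample indices."""
    to_sample = lambda ms: round(ms * sample_rate / 1000)
    return to_sample(utt.start_ms), to_sample(utt.end_ms)


# Invented line, used only to show the layout.
utt = parse_line("XX_YY_LC_1_3|1000|2500|Buongiorno.")
print(sample_range(utt))
```

With an audio array loaded from the chapter file, `audio[start:end]` would then cover the utterance together with its leading and trailing silences.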
When phonetic alignment has been performed, 1 additional field has been added: {aligned phones}. Each input character or phone has a corresponding aligned phone. Note that all aligned utterances start and end with an aligned silence of 130ms. The set of aligned phones comprises:
The set of input phones
The silence: '__'
The symbol '_' for silent characters, e.g. "occhi" is aligned with 'o^1 k: _ _ i'
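The one-to-one correspondence between input characters and aligned phones can be sketched with the "occhi" example above. This assumes that aligned phones are separated by single spaces, as in the quoted example, and it handles a single word only; how the leading and trailing '__' silences of a full utterance map onto the text is not specified here. Both function names are hypothetical.

```python
from typing import List, Tuple

SILENT_CHAR = "_"   # aligned phone for a silent character
SILENCE = "__"      # aligned silence


def pair_chars_with_phones(text: str, aligned_phones: str) -> List[Tuple[str, str]]:
    """Pair each input character with its aligned phone (assumed space-separated)."""
    phones = aligned_phones.split()
    if len(phones) != len(text):
        raise ValueError(
            f"{len(text)} characters but {len(phones)} aligned phones"
        )
    return list(zip(text, phones))


def pronounced_phones(pairs: List[Tuple[str, str]]) -> List[str]:
    """Drop silent-character markers and silences, keeping audible phones."""
    return [phone for _, phone in pairs if phone not in (SILENT_CHAR, SILENCE)]


pairs = pair_chars_with_phones("occhi", "o^1 k: _ _ i")
print(pairs)
print(pronounced_phones(pairs))
```

In this example the second 'c' and the 'h' map to '_', so only 'o^1', 'k:' and 'i' remain as pronounced phones.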
Text is in UTF-8. '«»', '—', '~', '""', '()', '[]' are used respectively for spoken quotes, turn switches, three dots, quoted expressions, asides, and notes. Because it occurs rarely, 'ö' has been transcribed as 'oe'. Paragraph breaks (two consecutive carriage returns in the original text) are cued by the special character '§'. It usually ends an utterance but may appear within an utterance if its associated pause is too short.
Text under clear emphasis is surrounded by "#".
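The paragraph and emphasis markers can be picked out with simple string operations. The sketch below assumes that '#' always occurs in matched pairs and that emphasized spans do not nest, neither of which the description confirms. The sample sentence is invented and the function names are hypothetical.

```python
import re
from typing import List

# Non-nested '#...#' spans (assumption: '#' always comes in matched pairs).
EMPHASIS_RE = re.compile(r"#([^#]+)#")
PARAGRAPH_MARK = "§"


def emphasized_spans(text: str) -> List[str]:
    """Return the text of every '#...#' emphasized span."""
    return EMPHASIS_RE.findall(text)


def split_paragraphs(text: str) -> List[str]:
    """Split text at paragraph marks, dropping empty pieces."""
    return [part.strip() for part in text.split(PARAGRAPH_MARK) if part.strip()]


def strip_markup(text: str) -> str:
    """Remove emphasis and paragraph markers, keeping the words."""
    plain = EMPHASIS_RE.sub(r"\1", text).replace(PARAGRAPH_MARK, " ")
    return " ".join(plain.split())


# Invented sample, used only to show the markers.
sample = "Era #davvero# tardi. § Poi uscì."
print(emphasized_spans(sample))
print(split_paragraphs(sample))
print(strip_markup(sample))
```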
When available, phonetic content is given per word in curly brackets '{}'. We use 39 phonetic symbols:
oral vowels: a (fa), e (ve), e^ (ed), i (riz), u (tu), o (uno), o^ (con)
loan vowels & diphthongs: a&i and x^ (timer)
semi-vowels: h (ghetto.), w (quel), j (vaj)
consonants: p (vespa), t (tu), k (calde), b (buon), d (disse), g (grazie), f (fame), s (sauna), s^ (scia), v (verde), z (rosa), z^ (judo), r (rizo), l (letto), l^ (egli), m (mapo), n (nuda), n~ (pugni)
long/double consonants are suffixed by ":", e.g. p: (zuppa)
primary stress, if any, is noted "1" and appended to the vowel, e.g. a1 g a p e (agape)
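The notation above (stress "1" appended to a vowel, ":" suffixed to long consonants, phonetic words enclosed in '{}') can be decoded as sketched below. This assumes that phones inside the curly brackets are separated by spaces, as in the "a1 g a p e" example; the exact layout in IT.csv may differ. `Phone`, `parse_phone` and `phonetic_words` are hypothetical names.

```python
import re
from typing import List, NamedTuple

# Content of each '{...}' group (assumption: brackets are never nested).
PHONETIC_WORD_RE = re.compile(r"\{([^{}]*)\}")


class Phone(NamedTuple):
    base: str       # symbol without stress or length marks, e.g. 'o^'
    stressed: bool  # True if the symbol carried the '1' stress mark
    long: bool      # True if the symbol carried the ':' length mark


def parse_phone(symbol: str) -> Phone:
    """Split a symbol such as 'o^1' or 'p:' into base, stress and length."""
    is_long = symbol.endswith(":")
    if is_long:
        symbol = symbol[:-1]
    is_stressed = symbol.endswith("1")
    if is_stressed:
        symbol = symbol[:-1]
    return Phone(symbol, is_stressed, is_long)


def phonetic_words(content: str) -> List[List[Phone]]:
    """Parse every '{...}' word in a content field into a list of phones."""
    return [
        [parse_phone(symbol) for symbol in word.split()]
        for word in PHONETIC_WORD_RE.findall(content)
    ]


print(phonetic_words("{a1 g a p e}"))
print(parse_phone("p:"), parse_phone("o^1"))
```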
Created:
2025-02-20



