Ressources for End-to-End French Text-to-Speech Blizzard challenge
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/4580405
Resource description:
Here are 289 chapters of 5 audiobooks from Librivox (51 h 12 min in total), read by Nadine Eckert-Boulet (NEB):
Madame Bovary (MB) by Gustave Flaubert (FL) - 3 volumes, 35 chapters (original wavs; text)
Les mystères de Paris (LMP) by Eugène Sue (ES) - 4 volumes, 83 chapters (original wavs1, wavs2, wavs3; text1, text2, text3)
Les tribulations d'un chinois en Chine (TCC) by Jules Verne (JV) - 1 volume, 22 chapters (original wavs; text)
La fille du pirate (LFDP) by Henri Émile Chevalier (EC) - 7 volumes, 121 chapters (original wavs; text)
La vampire (VAMP) by Paul Féval (PF) - 1 volume, 28 chapters (original wavs; text)
and
2515 utterances (2 h 03 min) read by Aurélie Derbier (AD), another female French speaker:
1608 utterances extracted from various books (DIVERS_BOOK_AD*)
907 utterances from transcripts of French parliamentary sessions (DIVERS_PARL_01*)
We recently added three speakers from Librivox/Litteratureaudio:
Ezwa (EZWA): L'épouvante by Maurice Level (original wavs; text) - 11 chapters, 4869 utterances (3 h 16 min)
Pauline Latournerie (PL): Le pédagogue n'aime pas les enfants by Henri Roorda (original wavs; text) - 6 chapters, 1320 utterances (1 h 17 min)
Jean-Luc Fischer (JLF): L’Affaire Charles Dexter Ward by Howard Phillips Lovecraft (original wavs; text) - 16 chapters, 1823 utterances (2 h 37 min)
Each .wav file (sampled at 22050 Hz) contains one entire chapter. Filenames follow the format {author's acronym}_{book's acronym}_{reader's acronym}_{volume number}_{chapter number}.
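The filename convention can be decoded with a small helper. This is a minimal sketch, assuming every chapter stem has exactly five '_'-separated fields; the helper name `parse_chapter_filename` and the example filename (including its zero padding) are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class ChapterId(NamedTuple):
    author: str
    book: str
    reader: str
    volume: str
    chapter: str


def parse_chapter_filename(path: str) -> ChapterId:
    """Split a chapter filename into its five '_'-separated fields.

    Volume and chapter are kept as strings because their zero padding
    is not documented. Any name with a different number of fields
    raises ValueError.
    """
    stem = Path(path).stem  # drop directories and the .wav extension
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"unexpected filename layout: {stem!r}")
    return ChapterId(*parts)


# Hypothetical filename assembled from the acronyms listed above.
print(parse_chapter_filename("wavs/FL_MB_NEB_01_05.wav"))
# ChapterId(author='FL', book='MB', reader='NEB', volume='01', chapter='05')
```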
The NEB_train.csv file gives text and phonetic alignments (mainly for MB and LMP) for utterances, in 4 fields separated by '|': {filename}|{start_ms}|{end_ms}|{text or phonetic content}. Most utterances are separated by a pause of at least 400 ms. Each interval [start_ms:end_ms] includes leading and trailing silences of 130 ms (since the wavs are entire chapters, these are "true" ambient silences). AD_train.csv uses the same format.
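Because the wavs hold whole chapters, each line's millisecond interval has to be converted to sample indices at 22050 Hz before slicing. The sketch below assumes that start_ms and end_ms are integers and that the text field contains no '|'; the function names and the sample line are hypothetical.

```python
from typing import NamedTuple

SAMPLE_RATE = 22050  # Hz, as stated for the chapter .wav files
EDGE_SILENCE_MS = 130  # leading/trailing silence included in each interval


class Utterance(NamedTuple):
    filename: str
    start_ms: int
    end_ms: int
    content: str


def parse_train_line(line: str) -> Utterance:
    """Parse the first four '|'-separated fields of a *_train.csv line.

    Assumes start/end are integer milliseconds and that the text field
    contains no '|'. Extra alignment fields, if present, are ignored here.
    """
    fields = line.rstrip("\r\n").split("|")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}")
    filename, start, end, content = fields[:4]
    return Utterance(filename, int(start), int(end), content)


def ms_to_sample(ms: int, rate: int = SAMPLE_RATE) -> int:
    """Convert milliseconds to a sample index, rounding down."""
    return ms * rate // 1000


def utterance_span(utt: Utterance, rate: int = SAMPLE_RATE,
                   trim_ms: int = 0) -> tuple[int, int]:
    """Return the [start, end) sample range of an utterance in its chapter.

    Pass trim_ms=EDGE_SILENCE_MS to drop the leading and trailing silences.
    """
    return (ms_to_sample(utt.start_ms + trim_ms, rate),
            ms_to_sample(utt.end_ms - trim_ms, rate))


utt = parse_train_line("FL_MB_NEB_01_05|1000|2500|Bonjour.")  # hypothetical line
print(utterance_span(utt))                           # (22050, 55125)
print(utterance_span(utt, trim_ms=EDGE_SILENCE_MS))  # (24916, 52258)
```

With the chapter audio loaded as an array by any WAV reader, `samples[start:end]` then yields the utterance. Keeping the 130 ms margins (the default `trim_ms=0`) preserves the ambient silences that the corpus deliberately includes.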
When phonetic alignment has been performed, 2 additional fields are appended: {aligned phones}|{durations in ms}. Each input character or phone has a corresponding aligned phone and duration. Note that all aligned utterances start and end with an aligned phone of 130 ms. The set of aligned phones comprises:
The set of input phones
The silence: '__'
The symbol '_' for silent characters, e.g. "chat" is aligned with 's^ _ a _'
29 combined aligned phones ('a&i', 'a&j', 'b&q', 'd&q', 'd&z', 'd&z^', 'f&q', 'g&q', 'g&z', 'j&i', 'j&u', 'j&q', 'i&j', 'k&q', 'k&s', 'k&s&q', 'l&q', 'm&q', 'n&q', 'r&w', 'r&q', 's&q', 't&q', 't&s', 't&s^', 'w&a', 'z&q', 'p&q'), each aligned to a single character, e.g. "expatrier" is aligned with 'e^ k&s p a t r i&j e _'
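The two alignment fields can be paired up and flattened into a plain phone sequence. This is a minimal sketch, assuming both fields are space-separated (as in the 'chat' and 'expatrier' examples) and that durations are integers; the function names are hypothetical, and splitting combined phones on '&' is one possible treatment, not necessarily the one used to build the corpus.

```python
SILENCE = "__"      # aligned silence
SILENT_CHAR = "_"   # aligned symbol for an unpronounced character
EDGE_MS = 130       # documented duration of the first and last aligned phone


def parse_alignment(phones_field: str, durations_field: str) -> list[tuple[str, int]]:
    """Pair each aligned phone with its duration in milliseconds.

    Assumes both fields are space-separated lists and that durations
    are integers.
    """
    phones = phones_field.split()
    durations = [int(d) for d in durations_field.split()]
    if len(phones) != len(durations):
        raise ValueError(f"{len(phones)} phones but {len(durations)} durations")
    return list(zip(phones, durations))


def has_edge_phones(alignment: list[tuple[str, int]]) -> bool:
    """Check that the alignment starts and ends with a 130 ms phone."""
    return bool(alignment) and alignment[0][1] == EDGE_MS and alignment[-1][1] == EDGE_MS


def expand_phones(aligned: list[str]) -> list[str]:
    """Flatten aligned tokens into a plain phone sequence.

    '_' (silent character) is dropped, combined phones such as 'k&s' are
    split on '&', and every other token, including '__', is kept as is.
    """
    phones: list[str] = []
    for token in aligned:
        if token == SILENT_CHAR:
            continue
        phones.extend(token.split("&"))
    return phones


print(expand_phones("e^ k&s p a t r i&j e _".split()))
# ['e^', 'k', 's', 'p', 'a', 't', 'r', 'i', 'j', 'e']
```

Note that a combined phone carries a single duration, so this flattening loses the per-phone timing inside 'k&s'-style tokens; keep the paired form from `parse_alignment` when durations matter.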
Text is encoded in UTF-8. The symbols '«»', '¬', '~', '""', '()' and '[]' mark, respectively, direct-speech quotes, speaker turn changes, ellipses (three dots), quoted expressions, asides, and notes. Because it occurs rarely, 'ö' has been transcribed as 'oe'. Paragraph breaks (two consecutive carriage returns in the original text) are marked with the special character '§'. It usually ends an utterance but may also appear within one when the associated pause is too short.
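A text front end may need to recognize these markup characters and apply the 'ö' convention to new input. The sketch below assumes the '""' pair uses the ASCII double quote; the table labels and the helper names are hypothetical.

```python
from collections import Counter

# Markup characters described above and what they encode.
MARKUP = {
    "«": "direct speech (open)",
    "»": "direct speech (close)",
    "¬": "speaker turn change",
    "~": "ellipsis",
    '"': "quoted expression",  # assumed to be the ASCII double quote
    "(": "aside (open)",
    ")": "aside (close)",
    "[": "note (open)",
    "]": "note (close)",
    "§": "paragraph break",
}


def markup_counts(text: str) -> Counter:
    """Count occurrences of each markup character in an utterance."""
    return Counter(ch for ch in text if ch in MARKUP)


def has_internal_paragraph(text: str) -> bool:
    """True if '§' appears anywhere other than at the end of the utterance."""
    return "§" in text.rstrip().rstrip("§")


def normalize_input(text: str) -> str:
    """Apply the corpus convention of writing 'ö' as 'oe' to new input text."""
    return text.replace("ö", "oe")


print(has_internal_paragraph("Il partit.§"))               # False
print(has_internal_paragraph("Il partit.§ Elle resta.§"))  # True
```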
When available, phonetic content is given per word in curly brackets '{}'. We use 39 phonetic symbols:
oral vowels: a (fa), e (fée), e^ (fait), x (feu), x^ (coeur), i (riz), y (fut), u (fou), o (faux), o^ (porc)
schwa: q (gage)
nasal vowels: a~ (rang), e~ (fin), x~ (un), o~ (rond)
semi-vowels: h (huit), w (ouate), j (hier)
consonants: p (pas), t (tas), k (cas), b (bas), d (dos), g (gars), f (faux), s (sot), s^ (chat), v (vu), z (zut), z^ (jus), r (riz), l (la), m (ma), n (non), n~ (oignon), ng (camping)
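The inventory above can be used to pull the per-word phonetic groups out of a text field and flag unexpected symbols. This is a minimal sketch, assuming phones are space-separated inside each '{}' group and that braces are not nested; the function names and sample strings are hypothetical.

```python
import re

# Phone inventory as listed above, grouped by class.
ORAL_VOWELS = {"a", "e", "e^", "x", "x^", "i", "y", "u", "o", "o^"}
SCHWA = {"q"}
NASAL_VOWELS = {"a~", "e~", "x~", "o~"}
SEMI_VOWELS = {"h", "w", "j"}
CONSONANTS = {"p", "t", "k", "b", "d", "g", "f", "s", "s^", "v",
              "z", "z^", "r", "l", "m", "n", "n~", "ng"}
PHONES = ORAL_VOWELS | SCHWA | NASAL_VOWELS | SEMI_VOWELS | CONSONANTS

# One '{...}' group per word; nested braces are not expected.
BRACES = re.compile(r"\{([^{}]*)\}")


def phonetic_words(text: str) -> list[list[str]]:
    """Return the phones of each '{...}' group, assuming space-separated phones."""
    return [group.split() for group in BRACES.findall(text)]


def unknown_phones(text: str) -> list[str]:
    """List symbols inside braces that are not in the phone inventory."""
    return sorted({p for word in phonetic_words(text) for p in word if p not in PHONES})


print(phonetic_words("{l q} {s^ a}"))  # [['l', 'q'], ['s^', 'a']]
print(unknown_phones("{l q} {s^ a}"))  # []
```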
Created:
2024-10-11



