Robot reads ads: Data of the listening test and acoustic analysis

Source: Figshare. Updated 2022-11-03. Indexed 2026-04-28.

Download link:

https://figshare.com/articles/dataset/data/21498255

Description:

Likability scores
7-point Likert scale: 1 = not likable at all … 7 = very likable

Format of the audio file names
F1/F2/M1/M2_00/01/02/03_original/calm/energetic
F1 and F2 = female voices
M1 and M2 = male voices

Texts in Estonian for synthesis
00 - Olen kõnerobot (nimi) ja õpin reklaame lugema. [I am speech robot (name) and I am learning to read advertisements.]
01 - Hullud päevad kolmapäevast pühapäevani Vesiku kaubakeskuses. [Crazy days Wednesday to Sunday at Vesiku shopping center.]
02 - Sinu tegemiste õnnestumised saavad alguse heast ideest. Laenumarket – kõik tarbimislaenud ühest kohast. [The success of your endeavors starts from a good idea. Loan market – all consumer loans from one source.]
03 - Tule Diili ja vaheta vana uue vastu! [Come to Deal and swap the old one for a new one!]

Transferred styles
calm
energetic

List of eGeMAPS parameter abbreviations
A1,2,3 = difference of log amplitude of first, second and third harmonic to f0 amplitude
AF1,F2,F3 = difference of log amplitude of first, second and third formant to f0 amplitude
alpha ratio = ratio of the summed energy from 50–1000 Hz and 1–5 kHz
BF1,F2,F3 = bandwidth of first, second and third formant
f0 = logarithmic fundamental frequency on a semitone frequency scale, starting at 27.5 Hz (semitone 0)
F1,2,3 = centre frequency of first, second and third formant
Hammarberg index = ratio of the strongest energy peak in the 0–2 kHz region to the strongest peak in the 2–5 kHz region
harmonic difference H1–H2 = ratio of energy of the first f0 harmonic (H1) to the energy of the second f0 harmonic (H2)
harmonic difference H1–A3 = ratio of energy of the first f0 harmonic (H1) to the energy of the highest harmonic in the third formant range (A3)
HNR = harmonics-to-noise ratio
jitter = deviations in individual consecutive f0 period lengths
LEq = equivalent sound level, computed by converting the average of the per-frame RMS energies to a logarithmic (dB) scale
MFCC1,2,3,4 = first, second, third and fourth Mel-frequency cepstral coefficient
loudness = estimate of perceived signal intensity from an auditory spectrum
pctl = percentile
pctlrg = range of the 20th to 80th percentile
SDnorm = standard deviation normalised by the arithmetic mean (coefficient of variation)
shimmer = difference of the peak amplitudes of consecutive f0 periods
spectral flux = difference of the spectra of two consecutive frames
spectral slope 0–500 Hz or 500–1500 Hz = linear regression slope of the logarithmic power spectrum for 0–500 Hz or 500–1500 Hz region
VR = voiced regions
UVR = unvoiced regions

Reference
Eyben, F., Scherer, K., Schuller, B., Sundberg, J., Andre, E., Busso, C., Devillers, L., Epps, J., Laukka, P., Narayanan, S., and Truong, K. (2016). The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE T. Affect. Comput. 7 (2), 190-202. doi: 10.1109/TAFFC.2015.2457417
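
The file naming scheme and the f0 semitone scale described above could be handled along the following lines. This is only a minimal sketch and not part of the dataset: the function names `parse_stimulus_name` and `hz_to_semitones` are hypothetical, the `.wav` extension in the usage example is an assumption, and the underscore separator is taken from the naming pattern given in the description. The semitone conversion uses the standard formula 12 * log2(f / 27.5), which matches the stated reference of 27.5 Hz at semitone 0.

```python
import math
import re
from pathlib import PurePath
from typing import NamedTuple


class Stimulus(NamedTuple):
    voice: str    # F1, F2, M1 or M2
    sex: str      # "female" for F1/F2, "male" for M1/M2
    text_id: str  # 00, 01, 02 or 03 (index of the Estonian text)
    style: str    # original, calm or energetic


# Pattern derived from the description: voice_text_style
_NAME_RE = re.compile(r"^(F1|F2|M1|M2)_(00|01|02|03)_(original|calm|energetic)$")


def parse_stimulus_name(filename: str) -> Stimulus:
    """Split a stimulus file name into voice, text id and style.

    Any directory part and file extension are ignored (the extension
    used in the dataset is an assumption here).
    """
    stem = PurePath(filename).stem
    match = _NAME_RE.match(stem)
    if match is None:
        raise ValueError(f"unexpected stimulus name: {filename!r}")
    voice, text_id, style = match.groups()
    sex = "female" if voice.startswith("F") else "male"
    return Stimulus(voice, sex, text_id, style)


def hz_to_semitones(f_hz: float, ref_hz: float = 27.5) -> float:
    """Convert a frequency in Hz to semitones above ref_hz (semitone 0)."""
    if f_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_hz / ref_hz)


if __name__ == "__main__":
    print(parse_stimulus_name("F1_02_calm.wav"))
    print(hz_to_semitones(55.0))  # one octave above 27.5 Hz
```

Parsing the names into fields makes it straightforward to group likability scores or acoustic parameters by voice, text or style, and rejecting unexpected names early helps catch files that do not follow the scheme.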

Created:

2022-11-03