TORGO Database of Dysarthric Articulation

Name: TORGO Database of Dysarthric Articulation
Creator: Linguistic Data Consortium
Published: 2021-07-01 16:24:04
License: No description available

DataCite Commons | Updated 2021-07-01 | Indexed 2025-04-16

Download link:

https://catalog.ldc.upenn.edu/LDC2012S02


Resource description:

<h3>Introduction</h3> TORGO Database of Dysarthric Articulation was developed by the University of Toronto's departments of <a href="http://web.cs.toronto.edu/Page4.aspx" rel="nofollow">Computer Science</a> and <a href="http://www.slp.utoronto.ca/Page13.aspx" rel="nofollow">Speech Language Pathology</a> in collaboration with the <a href="http://www.hollandbloorview.ca/" rel="nofollow">Holland-Bloorview Kids Rehabilitation Hospital</a> in Toronto, Canada. It contains approximately 23 hours of English speech data, accompanying transcripts and documentation from 8 speakers (5 males, 3 females) with cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS) and from 7 speakers (4 males, 3 females) in a non-dysarthric control group. CP and ALS are common causes of dysarthria, which results from disruptions in the neuro-motor interface that distort motor commands to the vocal articulators, producing atypical and relatively unintelligible speech in most cases. The TORGO database is primarily a resource for developing advanced automatic speech recognition (ASR) models suited to the needs of people with dysarthria, but it is also applicable to non-dysarthric speech. The inability of modern ASR to effectively understand dysarthric speech is a problem because the more general physical disabilities often associated with the condition can make other forms of computer input, such as keyboards or touch screens, difficult to use. <h3>Data</h3> The data consists of aligned acoustics and measured 3D articulatory features from the speakers, recorded using the <a href="http://www.articulograph.de/" rel="nofollow">3D AG500 electro-magnetic articulograph</a> (EMA) system (Carstens Medizinelektronik GmbH, Lenglern, Germany) with fully-automated calibration. This system allows for 3D recordings of articulatory movements inside and outside the vocal tract, thus providing a detailed window on the nature and direction of speech-related activity. 
The data was collected between 2008 and 2010 in Toronto, Canada. All subjects read text consisting of non-words, short words and restricted sentences from a 19-inch LCD screen. The restricted sentences included 162 sentences from the sentence intelligibility section of Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1981) and 460 sentences derived from the <a href="http://catalog.ldc.upenn.edu/LDC93S1" rel="nofollow">TIMIT</a> database. The unrestricted sentences were elicited by asking participants to spontaneously describe 30 images of interesting situations taken randomly from Webber Photo Cards - Story Starters (Webber, 2005), which are designed to prompt students to tell or write a story. Data is organized by speaker and by the session in which each speaker recorded data. Each speaker was assigned a code and given their own file directory. The code for female speakers begins with F, and the code for male speakers begins with M. If the speaker was a member of the control group, the letter C follows the gender code. The last two digits of the code indicate the order in which that subject was recruited. For example, speaker FC02 was the second female speaker without dysarthria recruited. Note that some speakers were intentionally left out of the data, so there are gaps in the numbering. Each speaker's directory contains Session directories, which hold the data recorded during the respective visit, and occasionally a Notes directory, which can include Frenchay assessments (a test for the measurement, description and diagnosis of dysarthria), notes about sessions (e.g., sensor errors), and other relevant notes. 
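The speaker-code convention described above can be captured in a small parser. The following Python sketch is illustrative only and is not part of the corpus tooling: the names `SpeakerCode` and `parse_speaker_code` are hypothetical, and the sketch assumes every code consists of exactly one sex letter, an optional C, and two digits.

```python
import re
from typing import NamedTuple


class SpeakerCode(NamedTuple):
    """Hypothetical container for the parts of a speaker code."""
    sex: str          # "F" or "M", the first letter of the code
    is_control: bool  # True when a "C" follows the sex letter
    index: int        # recruitment order given by the last two digits


# Assumed shape: sex letter, optional control marker, two digits.
_CODE_RE = re.compile(r"^([FM])(C?)(\d{2})$")


def parse_speaker_code(code: str) -> SpeakerCode:
    """Split a speaker code such as "FC02" into its components."""
    match = _CODE_RE.match(code)
    if match is None:
        raise ValueError(f"not a recognised speaker code: {code!r}")
    sex, control, digits = match.groups()
    return SpeakerCode(sex=sex, is_control=(control == "C"), index=int(digits))
```

For the example given in the text, `parse_speaker_code("FC02")` yields a female control speaker with recruitment index 2. Because some speakers were left out of the data, the index should not be treated as a contiguous counter.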
Each Session directory can, but does not necessarily, contain the following content: <ul> <li>alignment.txt: This is a text file containing the sample offsets between audio files recorded simultaneously by the array microphone and the head-worn microphone.</li> <li>amps: These directories contain raw *.amp and *.ini files produced by the AG500 articulograph.</li> <li>phn_*: These directories contain phonemic transcriptions of audio data. Each file is plain text with a *.PHN file extension and a filename referring to the utterance number. These files were generated using the free <a href="http://www.speech.kth.se/wavesurfer/" rel="nofollow">Wavesurfer</a> tool.</li> <li>pos: These directories contain the head-corrected positions, velocities, and orientations of sensor coils for each utterance, as generated by the AG500 articulograph.</li> <li>prompts: These directories contain orthographic transcriptions.</li> <li>rawpos: These directories are equivalent to the pos directories except that their articulographic content is not head-normalized to a constant upright position.</li> <li>wav_*: These directories contain the acoustics. Each file is a RIFF (little-endian) WAVE audio file (Microsoft PCM, 16 bit, mono 16000 Hz).</li> <li>wavall: These directories contain a stereo recording in which one channel contains the recorded acoustics and the other channel contains the analog peaks associated with the sweep signal, which is used by the AG500 hardware for synchronization.</li> </ul> Additionally, sessions recorded with the AG500 articulograph are marked with the file EMA, and those recorded with the video-based system are marked with the file VIDEO. Files with a date-based filename and a txt extension (e.g. april232008cal2.txt, jan28cal3.txt) are the measured responses from calibration. The *.log and *.calset files contain descriptions of the calibration process, but not the final result of calibration. 
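As a rough illustration of the session layout listed above, the following Python sketch reports which of the documented subdirectories are present in a session and checks whether a WAVE file matches the stated audio format (16-bit, mono, 16000 Hz). The function names `session_inventory` and `has_documented_wav_format` are hypothetical, and the sketch assumes only the directory names and audio parameters given in the list; it makes no assumption about the files inside each directory.

```python
import wave
from pathlib import Path

# Subdirectory name patterns taken from the session description above.
# A given session may contain only some of them.
SESSION_PATTERNS = ["amps", "phn_*", "pos", "prompts", "rawpos", "wav_*", "wavall"]


def session_inventory(session_dir):
    """Map each documented pattern to the sorted names of matching subdirectories."""
    root = Path(session_dir)
    return {
        pattern: sorted(p.name for p in root.glob(pattern) if p.is_dir())
        for pattern in SESSION_PATTERNS
    }


def has_documented_wav_format(wav_path):
    """Return True if the file is 16-bit mono PCM sampled at 16000 Hz."""
    with wave.open(str(wav_path), "rb") as wav:
        return (
            wav.getnchannels() == 1
            and wav.getsampwidth() == 2  # 2 bytes per sample = 16 bits
            and wav.getframerate() == 16000
        )
```

An empty list in the inventory simply means that the session does not include that kind of data, which the description above explicitly allows.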
See the <a href="docs/LDC2012S02/README.txt" rel="nofollow">readme</a> file and the <a href="http://wiki.ag500.net" rel="nofollow">AG500 Wiki</a> for more complete descriptions of the possible subfolders and of the AG500-specific files. Also, see <a href="docs/LDC2012S02/session_contents.tsv" rel="nofollow">session_contents.tsv</a> for a tab-separated table of each session's subfolders and metadata files. <h3>Samples</h3> For an example of the data contained in this corpus, review these two audio samples: <a href="desc/addenda/LDC2012S02_1.wav" rel="nofollow">Dysarthric</a> and <a href="desc/addenda/LDC2012S02_2.wav" rel="nofollow">Control</a>. <h3>Updates</h3> None at this time. Portions © 2008-2011 Frank Rudzicz, © 2012 Trustees of the University of Pennsylvania

Provider:

Linguistic Data Consortium

Created:

2020-11-30
