TORGO Database of Dysarthric Articulation
Source: Mendeley Data | Updated 2024-01-31 | Indexed 2024-06-28
Download link:
https://catalog.ldc.upenn.edu/LDC2012S02
Overview:
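The TORGO Database of Dysarthric Articulation was developed by the University of Toronto's departments of Computer Science and Speech Language Pathology in collaboration with the Holland-Bloorview Kids Rehabilitation Hospital in Toronto, Canada. It contains approximately 23 hours of English speech data, accompanying transcripts and documentation from 8 speakers (5 males, 3 females) with cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS) and from 7 speakers (4 males, 3 females) from a non-dysarthric control group. CP and ALS are typical causes of dysarthria, which arises from disruptions in the neuro-motor interface that distort motor commands to the vocal articulators, resulting in atypical and relatively unintelligible speech in most cases. The TORGO database is primarily a resource for developing advanced automatic speech recognition (ASR) models suited to the needs of people with dysarthria, but it is also applicable to non-dysarthric speech. The inability of modern ASR to effectively understand dysarthric speech is a problem because the more general physical disabilities often associated with the condition can make other forms of computer input, such as keyboards or touch screens, difficult to use.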
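## Data
The data consists of aligned acoustics and measured 3D articulatory features from the speakers, recorded using the 3D AG500 electro-magnetic articulograph (EMA) system (Carstens Medizinelektronik GmbH, Lenglern, Germany) with fully-automated calibration. This system allows for 3D recordings of articulatory movements inside and outside the vocal tract, thus providing a detailed window on the nature and direction of speech-related activity. The data was collected between 2008 and 2010 in Toronto, Canada.
All subjects read text consisting of non-words, short words and restricted sentences from a 19-inch LCD screen. The restricted sentences included 162 sentences from the sentence intelligibility section of the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1981) and 460 sentences derived from the TIMIT database. The unrestricted sentences were elicited by asking participants to spontaneously describe 30 images of interesting situations taken randomly from Webber Photo Cards - Story Starters (Webber, 2005), which are designed to prompt students to tell or write a story.
Data is organized by speaker and by the session in which each speaker recorded data. Each speaker was assigned a code and given their own file directory. The code for female speakers begins with F, and the code for male speakers begins with M. If the speaker was a member of the control group, the letter C follows the gender code. The last two digits of the code indicate the order in which that subject was recruited. For example, speaker FC02 was the second female speaker without dysarthria recruited. Note that some speakers were intentionally left out of the data, so there are gaps in the numbering.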
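The coding scheme above can be decoded mechanically. The following is a minimal sketch, assuming every code is exactly an uppercase F or M, an optional C, and two digits; the `Speaker` class and `parse_speaker_code` function are hypothetical names introduced here for illustration and are not part of the corpus tooling.

```python
import re
from dataclasses import dataclass

# Assumed shape of a speaker code: F or M, optional C (control), two digits.
_SPEAKER_CODE = re.compile(r"(?P<sex>[FM])(?P<control>C?)(?P<order>\d{2})")


@dataclass(frozen=True)
class Speaker:
    code: str
    sex: str                # "female" or "male"
    is_control: bool        # True for the non-dysarthric control group
    recruitment_order: int  # order of recruitment within the speaker's group


def parse_speaker_code(code: str) -> Speaker:
    """Decode a speaker code such as 'FC02' into its documented parts."""
    match = _SPEAKER_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a recognised speaker code: {code!r}")
    return Speaker(
        code=code,
        sex="female" if match["sex"] == "F" else "male",
        is_control=match["control"] == "C",
        recruitment_order=int(match["order"]),
    )


if __name__ == "__main__":
    print(parse_speaker_code("FC02"))
```

Because some speakers were left out of the release, the recruitment order should be treated as a label rather than as a contiguous index.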
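Each speaker's directory contains Session directories, which encapsulate the data recorded in the respective visit, and occasionally a Notes directory, which can include Frenchay assessments (a test for the measurement, description and diagnosis of dysarthria), notes about sessions (e.g., sensor errors), and other relevant notes.
Each Session directory can, but does not necessarily, contain the following content:
- alignment.txt: A text file containing the sample offsets between audio files recorded simultaneously by the array microphone and the head-worn microphone.
- amps: These directories contain raw *.amp and *.ini files produced by the AG500 articulograph.
- phn_*: These directories contain phonemic transcriptions of the audio data. Each file is plain text with a *.PHN file extension and a filename referring to the utterance number. These files were generated using the free Wavesurfer tool.
- pos: These directories contain the head-corrected positions, velocities, and orientations of sensor coils for each utterance, as generated by the AG500 articulograph.
- prompts: These directories contain orthographic transcriptions.
- rawpos: These directories are equivalent to the pos directories except that their articulographic content is not head-normalized to a constant upright position.
- wav_*: These directories contain the acoustics. Each file is a RIFF (little-endian) WAVE audio file (Microsoft PCM, 16 bit, mono, 16000 Hz).
- wavall: These directories contain a stereo recording in which one channel contains the recorded acoustics and the other channel contains the analog peaks associated with the sweep signal, which is used by the AG500 hardware for synchronization.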
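The audio format stated for the wav_* directories (16-bit PCM, mono, 16000 Hz) can be checked before feeding files to a recognizer. This is a minimal sketch using Python's standard `wave` module; the function names and the `EXPECTED_FORMAT` constant are hypothetical, and the path you pass in is whatever file you want to inspect.

```python
import wave

# Format documented for files in the wav_* directories.
EXPECTED_FORMAT = {
    "channels": 1,            # mono
    "sample_width_bytes": 2,  # 16 bit
    "sample_rate_hz": 16000,
}


def read_wav_format(path):
    """Return the channel count, sample width and sample rate of a WAVE file."""
    # wave.open raises wave.Error for files it cannot parse as PCM WAVE.
    with wave.open(str(path), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }


def matches_documented_format(path):
    """True if the file has the format documented for wav_* directories."""
    return read_wav_format(path) == EXPECTED_FORMAT
```

The stereo recordings in wavall are expected to fail this check by design, since they carry the synchronization channel alongside the acoustics.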
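Additionally, sessions recorded with the AG500 articulograph are marked with the file EMA, and those recorded with the video-based system are marked with the file VIDEO. Files with a date as the filename and a txt extension (e.g. april232008cal2.txt, jan28cal3.txt) are the measured responses from calibration. The *.log and *.calset files contain descriptions of the calibration process, but not the final result of calibration. See the readme file and the AG500 Wiki for more complete descriptions of the possible subfolders and of the AG500-specific files. Also, see session_contents.tsv for a tab-separated table of each session's subfolders and metadata files.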
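Since a Session directory may contain only some of the entries described above, it can help to take an inventory before processing. The sketch below is an assumption-laden illustration: `describe_session` is a hypothetical helper, it only looks for the names mentioned in this description (the EMA and VIDEO marker files, alignment.txt, and the listed subdirectories), and it makes no claim about any other files a session may hold.

```python
from pathlib import Path

# Subdirectory names taken from the description above; phn_* and wav_*
# are matched by prefix because their suffixes are not specified here.
KNOWN_SUBDIRS = ("amps", "pos", "prompts", "rawpos", "wavall")
KNOWN_PREFIXES = ("phn_", "wav_")
MARKER_FILES = ("EMA", "VIDEO")


def describe_session(session_dir):
    """Summarise which documented entries exist in one Session directory."""
    session = Path(session_dir)
    subdirs = sorted(
        entry.name
        for entry in session.iterdir()
        if entry.is_dir()
        and (entry.name in KNOWN_SUBDIRS or entry.name.startswith(KNOWN_PREFIXES))
    )
    return {
        # Marker files indicate which recording system was used.
        "recording_systems": [m for m in MARKER_FILES if (session / m).is_file()],
        "has_alignment": (session / "alignment.txt").is_file(),
        "subdirs": subdirs,
    }
```

For a corpus-wide view, the session_contents.tsv table shipped with the data is the authoritative source; this helper is only a quick local check.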
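## Samples
For an example of the data contained in this corpus, review these two audio samples: Dysarthric and Control.
## Updates
None at this time.
Portions © 2008-2011 Frank Rudzicz, © 2012 Trustees of the University of Pennsylvania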
Created:
2024-01-31
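Collected summary
Background and challenges
Background overview
The TORGO Database of Dysarthric Articulation is a repository of speech data from dysarthric and non-dysarthric speakers, intended to support the development of automatic speech recognition models. It contains approximately 23 hours of English speech data recorded with the 3D AG500 electro-magnetic articulograph system. The data is organized by speaker and session and includes audio files, transcripts and articulatory measurements.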
The above content was collected and summarized by 遇见数据集.



