Switchboard-2 Phase II
Source: Mendeley Data. Updated 2024-01-31. Indexed 2024-06-27.
Download link:
https://catalog.ldc.upenn.edu/LDC99S79
Resource description:
## Introduction
SWB-2 Phase II consists of 4,472 five-minute telephone conversations involving 679 participants. This corpus was collected by the Linguistic Data Consortium (LDC) in support of a project on Speaker Recognition sponsored by the U.S. Department of Defense.
## Data
Participants in SWB-2 Phase II were recruited from the following midwestern college campuses: Iowa State University, Michigan State University, University of Michigan, University of Minnesota, University of Wisconsin at Madison, Northwestern University, and Ohio State University. Solicitation methods included the Internet, newspaper advertisements and personal contacts. The majority of the participants resided in Minnesota, Wisconsin, Ohio, Iowa, Michigan and Illinois, as follows:
- Minnesota: 156 speakers
- Wisconsin: 105 speakers
- Ohio: 70 speakers
- Iowa: 64 speakers
- Michigan: 41 speakers
- Illinois: 37 speakers

Each recruit was asked to participate in at least ten five-minute phone calls. Ideally, each participant would receive five calls at a designated number and make five calls from phones with different Automatic Number Identification (ANI) codes. Participants were asked to discuss a specific topic (read by the automated operator) and not to provide personal information during their call.

Each of the 679 participants placed their calls via a toll-free robot operator maintained by LDC. Access to the robot operator required a unique Personal Identification Number (PIN), issued by the recruiting staff at LDC when the caller enrolled in the project.

Upon conclusion of the study, all calls were audited by LDC staff members. Particular attention was paid to PIN verification (matching speaker with PIN), call duration, and call quality. Upon completion of this process, checks were issued and mailed to participants. The conversations have not been transcribed.
## Updates
09/29/2011: The file table and readme were updated to reflect that this data set was made available on DVD.
Portions © 1999 Trustees of the University of Pennsylvania
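As a quick sanity check on the figures in the description, the per-state speaker counts can be tallied against the 679 participants. The sketch below uses only numbers stated above; the variable names are illustrative, and the last calculation assumes each conversation has exactly two participants, which the description implies but does not state.

```python
# Per-state speaker counts as listed in the corpus description.
speakers_by_state = {
    "Minnesota": 156,
    "Wisconsin": 105,
    "Ohio": 70,
    "Iowa": 64,
    "Michigan": 41,
    "Illinois": 37,
}

TOTAL_PARTICIPANTS = 679
TOTAL_CONVERSATIONS = 4472

# Speakers covered by the six listed states, and those not covered.
listed_total = sum(speakers_by_state.values())
unlisted = TOTAL_PARTICIPANTS - listed_total
listed_share = listed_total / TOTAL_PARTICIPANTS

# Assumption (not stated in the source): two participants per conversation.
sides_per_participant = 2 * TOTAL_CONVERSATIONS / TOTAL_PARTICIPANTS

print(f"Speakers in listed states: {listed_total} ({listed_share:.1%})")
print(f"Speakers not in listed states: {unlisted}")
print(f"Average call sides per participant: {sides_per_participant:.1f}")
```

The six listed states account for 473 of the 679 participants, which is consistent with the description's statement that the majority resided there; the state of residence of the remaining 206 is not given.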
Created: 2024-01-31
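Background overview
Switchboard-2 Phase II is an English speech dataset of 4,472 five-minute telephone conversations involving 679 participants, intended primarily for speaker recognition research. Participants were recruited from college campuses in the U.S. Midwest, and the conversations have not been transcribed.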