CSC Deceptive Speech

Name: CSC Deceptive Speech
Creator: Linguistic Data Consortium
Published: 2023-01-11 08:55:39
License: No description available

DataCite Commons. Updated 2023-01-11; indexed 2024-07-13

Download link:

https://catalog.ldc.upenn.edu/LDC2013S09

Resource description:

<h3>Introduction</h3><br>
<p>CSC Deceptive Speech was developed by Columbia University, SRI International and University of Colorado Boulder. It consists of 32 hours of audio interviews from 32 native speakers of Standard American English (16 male, 16 female) recruited from the Columbia University student population and the community. The purpose of the study was to distinguish deceptive speech from non-deceptive speech by applying machine learning techniques to features extracted from the corpus.</p><br>
<p>The participants were told that they were participating in a communication experiment which sought to identify people who fit the profile of the top entrepreneurs in America. To this end, the participants performed tasks and answered questions in six areas. They were later told that they had received low scores in some of those areas and did not fit the profile. The subjects then participated in an interview where they were told to convince the interviewer that they had actually achieved high scores in all areas and that they did indeed fit the profile. The task of the interviewer was to determine how he thought the subjects had actually performed, and he was allowed to ask them any questions other than those that were part of the performed tasks. For each question from the interviewer, subjects were asked to indicate whether the reply was true or contained any false information by pressing one of two pedals hidden from the interviewer under a table.</p><br>
<h3>Data</h3><br>
<p>Interviews were conducted in a double-walled sound booth and recorded to digital audio tape on two channels using Crown CM311A Differoid headworn close-talking microphones, then downsampled to 16 kHz before processing.</p><br>
<p>The interviews were orthographically transcribed by hand using the NIST EARS transcription guidelines. Labels for local lies were obtained automatically from the pedal-press data and hand-corrected for alignment, and labels for global lies were annotated during transcription based on the known scores of the subjects versus their reported scores. The orthographic transcription was force-aligned using the SRI telephone speech recognizer adapted for full-bandwidth recordings. There are several segmentations associated with the corpus: the implicit segmentation of the pedal presses; semi-automatically derived sentence-like units (EARS SLASH-UNITS, or SUs), which were hand-labeled; intonational phrase units; and the units corresponding to each topic of the interview.</p><br>
<p>Transcript files are in .trs format, and audio files are .wav files presented in <a href="https://xiph.org/flac/" rel="nofollow">flac-compressed</a> form for this release.</p><br>
<h3>Samples</h3><br>
<p>Please view these <a href="desc/addenda/LDC2013S09.wav" rel="nofollow">audio</a> and <a href="desc/addenda/LDC2013S09.txt" rel="nofollow">transcript</a> samples for the interviewer side of a conversation.</p><br>
<h3>Updates</h3><br>
<p>On May 22, 2014, an additional documentation file was added to explain the questions participants were asked.</p><br>
Portions © 2013 The Trustees of Columbia University, Trustees of the University of Pennsylvania
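
The description above says transcripts ship in .trs format. As a minimal sketch, assuming these files follow the Transcriber-style XML layout (Turn elements with speaker, startTime and endTime attributes, and Sync elements marking time points inside a turn), the timed segments could be read roughly as follows. The sample transcript, the speaker IDs, and the function names read_segments and _flush are all made up for illustration; the actual files in the corpus may differ in layout and encoding.

```python
import xml.etree.ElementTree as ET

# Invented sample in Transcriber-style XML; NOT taken from the corpus.
SAMPLE_TRS = """<Trans>
  <Episode>
    <Section type="report" startTime="0" endTime="5.0">
      <Turn speaker="spk1" startTime="0" endTime="2.5">
        <Sync time="0"/>
        hello there
        <Sync time="1.2"/>
        how are you
      </Turn>
      <Turn speaker="spk2" startTime="2.5" endTime="5.0">
        <Sync time="2.5"/>
        fine thanks
      </Turn>
    </Section>
  </Episode>
</Trans>
"""


def _flush(segments, speaker, start, end, pieces):
    """Append one segment if the collected text is non-empty."""
    text = " ".join(" ".join(pieces).split())  # collapse whitespace
    if text:
        segments.append((speaker, start, end, text))


def read_segments(trs_text):
    """Return (speaker, start, end, text) tuples, one per Sync-delimited span."""
    root = ET.fromstring(trs_text)
    segments = []
    for turn in root.iter("Turn"):
        speaker = turn.get("speaker", "")
        start = float(turn.get("startTime"))
        turn_end = float(turn.get("endTime"))
        pieces = [turn.text or ""]  # text before the first child element
        for child in turn:
            if child.tag == "Sync":
                # A Sync closes the current span and opens a new one.
                sync_time = float(child.get("time"))
                _flush(segments, speaker, start, sync_time, pieces)
                start = sync_time
                pieces = []
            # Text following any child element lives in its tail.
            pieces.append(child.tail or "")
        _flush(segments, speaker, start, turn_end, pieces)
    return segments


if __name__ == "__main__":
    for segment in read_segments(SAMPLE_TRS):
        print(segment)
```

For a file on disk, ET.parse(path).getroot() can replace ET.fromstring so that the parser honors the encoding declared in the file. The audio would first need to be decoded from flac with a tool of the reader's choice before the segment times can be applied to it.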

Provider:

Linguistic Data Consortium

Created:

2020-11-30

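Collected summary

Dataset introduction

Background and challenges

Background overview

CSC Deceptive Speech is an English speech dataset containing 32 hours of audio interviews from 32 native speakers of Standard American English, intended for research on identifying deceptive speech. The data were collected through a designed experiment in which participants marked the truthfulness of each answer during the interview. The audio was downsampled to 16 kHz and is accompanied by manual transcriptions and deception labels that were derived automatically and then hand-corrected. The dataset is suited to machine learning applications such as speech recognition and anomaly analysis.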

The summary above was collected and generated by 遇见数据集.
