AnnoDIFP CTS Audio and Transcripts

Name: AnnoDIFP CTS Audio and Transcripts
Creator: Linguistic Data Consortium
Published: 2025-11-07 20:51:25
License: No description available

Source: DataCite Commons. Updated 2025-11-07; indexed 2026-05-03.

Download link:

https://catalog.ldc.upenn.edu/LDC2025S10

Description:

<h3>Introduction</h3><br> <p>AnnoDIFP (Annotated Data for the Investigation of Facets of Personality) CTS (Conversational Telephone Speech) Audio and Transcripts was developed by the Linguistic Data Consortium (LDC), the <a href="https://www.fit.edu/">Florida Institute of Technology </a> (FIT), and the <a href="https://www.newhaven.edu/index.php">University of New Haven</a> (UNH) to support algorithm development for predicting personality traits. It contains 242.52 hours of English audio and transcripts from 1,179 calls involving 327 participants paired with scores from two self-reported personality assessments, HEXACO Personality Inventory (Revised) (HEXACO-PI-R) and Short Dark Triad (SD3).</p><br> <p>Survey and behavioral data were collected in three phases. Phase 1 consisted of online questionnaires. Selected participants were invited to participate in Phase 2a, collecting behavioral and linguistic data in a laboratory setting. In Phase 2b, participants engaged in a telephone speech collection. This release covers the activities in Phase 2b. The data collected in Phase 2a is contained in <a href="../../../LDC2025S06">AnnoDIFP Session Audio and Transcripts (LDC2025S06)</a>.</p><br> <h3>Data</h3><br> <p>Telephone calls were collected using LDC's robot-operator <a href="https://www.ldc.upenn.edu/about/facilities/human-subjects-collection">platform</a>. The operator called participants every 24 hours during their indicated availability and paired them with another participant to speak on a prompted topic for 10 minutes. Further details on collection methodology are contained in the documentation accompanying this release.</p><br> <p>There were a total of 327 participants in Phase 2a. 
This corpus contains audio and transcripts for 277 participants and transcripts only for 65 participants.</p><br> <p>Speech data is presented as 16 kHz, 16-bit mono-channel FLAC-compressed MS-WAV files.</p><br> <p>Transcripts were produced automatically using the <a href="https://www.rev.ai/">Rev.ai</a> speech-to-text service. Text data is UTF-8 encoded.</p><br> <h3>Updates</h3><br> <p>No updates at this time.</p><br> Portions © 2025 Trustees of the University of Pennsylvania

Provider:

Linguistic Data Consortium

Created:

2025-11-07
