AnnoDIFP CTS Audio and Transcripts
Source: DataCite Commons. Updated 2025-11-07; indexed 2026-05-03.
Download link:
https://catalog.ldc.upenn.edu/LDC2025S10
Resource description:
<h3>Introduction</h3><br>
<p>AnnoDIFP (Annotated Data for the Investigation of Facets of Personality) CTS (Conversational Telephone Speech) Audio and Transcripts was developed by the Linguistic Data Consortium (LDC), the <a href="https://www.fit.edu/">Florida Institute of Technology</a> (FIT), and the <a href="https://www.newhaven.edu/index.php">University of New Haven</a> (UNH) to support algorithm development for predicting personality traits. It contains 242.52 hours of English audio and transcripts from 1,179 calls involving 327 participants, paired with scores from two self-reported personality assessments: the HEXACO Personality Inventory (Revised) (HEXACO-PI-R) and the Short Dark Triad (SD3).</p><br>
<p>Survey and behavioral data were collected in three phases. Phase 1 consisted of online questionnaires. Selected participants were then invited to Phase 2a, in which behavioral and linguistic data were collected in a laboratory setting. In Phase 2b, participants took part in a telephone speech collection. This release covers the activities in Phase 2b. The data collected in Phase 2a are contained in <a href="../../../LDC2025S06">AnnoDIFP Session Audio and Transcripts (LDC2025S06)</a>.</p><br>
<h3>Data</h3><br>
<p>Telephone calls were collected using LDC's robot-operator <a href="https://www.ldc.upenn.edu/about/facilities/human-subjects-collection">platform</a>. The operator called participants every 24 hours during their indicated availability and paired them with another participant to speak on a prompted topic for 10 minutes. Further details on collection methodology are contained in the documentation accompanying this release.</p><br>
<p>There were a total of 327 participants in Phase 2a. This corpus contains audio and transcripts for 277 participants and transcripts only for 65 participants.</p><br>
<p>Speech data is presented as 16 kHz, 16-bit mono-channel FLAC-compressed MS-WAV files.</p><br>
<p>Transcripts were produced automatically using the <a href="https://www.rev.ai/">Rev.ai</a> speech-to-text service. Text data is UTF-8 encoded.</p><br>
<h3>Updates</h3><br>
<p>No updates at this time.</p><br>
Portions © 2025 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2025-11-07



