
AnnoDIFP CTS Audio and Transcripts

DataCite Commons, updated 2025-11-07, indexed 2026-05-03
Download link:
https://catalog.ldc.upenn.edu/LDC2025S10
Resource description:
<h3>Introduction</h3><br> <p>AnnoDIFP (Annotated Data for the Investigation of Facets of Personality) CTS (Conversational Telephone Speech) Audio and Transcripts was developed by the Linguistic Data Consortium (LDC), the <a href="https://www.fit.edu/">Florida Institute of Technology</a> (FIT), and the <a href="https://www.newhaven.edu/index.php">University of New Haven</a> (UNH) to support algorithm development for predicting personality traits. It contains 242.52 hours of English audio and transcripts from 1,179 calls involving 327 participants, paired with scores from two self-reported personality assessments: the HEXACO Personality Inventory (Revised) (HEXACO-PI-R) and the Short Dark Triad (SD3).</p><br> <p>Survey and behavioral data were collected in three phases. Phase 1 consisted of online questionnaires. Selected participants were invited to take part in Phase 2a, in which behavioral and linguistic data were collected in a laboratory setting. In Phase 2b, participants engaged in a telephone speech collection. This release covers the activities in Phase 2b. The data collected in Phase 2a is contained in <a href="../../../LDC2025S06">AnnoDIFP Session Audio and Transcripts (LDC2025S06)</a>.</p><br> <h3>Data</h3><br> <p>Telephone calls were collected using LDC's robot-operator <a href="https://www.ldc.upenn.edu/about/facilities/human-subjects-collection">platform</a>. The operator called participants every 24 hours during their indicated availability and paired them with another participant to speak on a prompted topic for 10 minutes. Further details on collection methodology are contained in the documentation accompanying this release.</p><br> <p>There were a total of 327 participants in Phase 2a. 
This corpus contains audio and transcripts for 277 participants and transcripts only for 65 participants.</p><br> <p>Speech data is presented as 16 kHz, 16-bit mono-channel FLAC-compressed MS-WAV files.</p><br> <p>Transcripts were produced automatically using the <a href="https://www.rev.ai/">Rev.ai</a> speech-to-text service. Text data is UTF-8 encoded.</p><br> <h3>Updates</h3><br> <p>No updates at this time.</p><br> Portions © 2025 Trustees of the University of Pennsylvania
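The stated audio format (16 kHz, 16-bit, mono, FLAC-compressed) can be checked without decoding any audio, because a FLAC stream records these values in its leading STREAMINFO metadata block. The following is a minimal sketch using only the Python standard library. It assumes each audio file is a native FLAC stream that begins with the `fLaC` marker; the function names and any file path passed to `check_file` are illustrative and are not part of the release or its documentation.

```python
STREAMINFO_LEN = 34  # fixed size of the STREAMINFO block body, in bytes


def parse_flac_streaminfo(header: bytes) -> dict:
    """Decode format fields from the first 42 bytes of a native FLAC stream."""
    if header[:4] != b"fLaC":
        raise ValueError("missing fLaC stream marker")
    if len(header) < 8 + STREAMINFO_LEN:
        raise ValueError("need at least 42 bytes")
    block_type = header[4] & 0x7F  # low 7 bits; the top bit is the last-block flag
    block_len = int.from_bytes(header[5:8], "big")
    if block_type != 0 or block_len != STREAMINFO_LEN:
        raise ValueError("first metadata block is not STREAMINFO")
    body = header[8:8 + STREAMINFO_LEN]
    # Bytes 10..17 pack four fields, most significant first:
    # 20-bit sample rate, 3-bit (channels - 1), 5-bit (bits - 1), 36-bit sample count.
    packed = int.from_bytes(body[10:18], "big")
    return {
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
    }


def matches_release_format(info: dict) -> bool:
    """True if the fields match the 16 kHz, mono, 16-bit format described above."""
    return (info["sample_rate"], info["channels"], info["bits_per_sample"]) == (16000, 1, 16)


def check_file(path: str) -> bool:
    """Read only the stream header of a (hypothetical) corpus file and check it."""
    with open(path, "rb") as f:
        return matches_release_format(parse_flac_streaminfo(f.read(42)))
```

Dividing `total_samples` by `sample_rate` gives a file's duration in seconds, which could be summed across files to compare against the 242.52 hours reported for the release.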
Provider:
Linguistic Data Consortium
Created:
2025-11-07