AnnoDIFP Session Audio and Transcripts

Name: AnnoDIFP Session Audio and Transcripts
Creator: Linguistic Data Consortium
Published: 2025-07-10 16:03:29
License: No description available

DataCite Commons. Updated: 2025-07-10. Indexed: 2026-05-03

Download link:

https://catalog.ldc.upenn.edu/LDC2025S06

Resource description:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html> <head> <title>AnnoDIFP Session Audio and Transcripts</title> <meta http-equiv="Content-type" content="text/html; charset=UTF-8"> <meta name="description" content="Documentation for AnnoDIFP Session Audio and Transcripts"> <meta name="keywords" content="Linguistic Data Consortium"> <meta name="keywords" content="LDC"> <meta name="keywords" content="Documentation"> <meta name="keywords" content="AnnoDIFP Session Audio and Transcripts"> <style type="text/css"> body{ background-color: #ffffff; color: #000000; } a:link{color : #990000;} a:visited{color:#990000;} a:active {color:#990000;} h1{ text-align:center; color:#990000; } h3.subtitle{ text-align:center; color:#990000; } ul{ line-height:130%; } p.cited{ padding-left:2em; text-indent:-2em; } p.footer{ font-size:0.85em; } table, th, td { border-collapse:collapse; border: 1px solid black; padding: 1px 5px; text-align:center; } </style> </head> <body> <h1>AnnoDIFP Session Audio and Transcripts</h1> <h3 class="subtitle">LDC2025S06</h3> <h3>Introduction</h3> <p> AnnoDIFP (Annotated Data for the Investigation of Facets of Personality) Session Audio and Transcripts was developed by the Linguistic Data Consortium (LDC), the <a href="https://www.fit.edu/">Florida Institute of Technology</a> (FIT), and the <a href="https://www.newhaven.edu/index.php">University of New Haven</a> (UNH) to support algorithm development for predicting personality traits. It contains 438.34 hours of English audio and transcripts from in-person interviews with 366 participants, paired with scores from two self-reported personality assessments, HEXACO Personality Inventory (Revised) (HEXACO-PI-R) and Short Dark Triad (SD3). </p> <p>Survey and behavioral data were collected in three phases. Phase 1 consisted of online questionnaires. 
Selected participants were invited to take part in Phase 2a, in which behavioral and linguistic data were collected in a laboratory setting. In Phase 2b, participants took part in a telephone speech collection by calling other participants. This release covers the activities in Phase 2a. </p> <h3>Data</h3> <p>In-person interviews were recorded at LDC, FIT and UNH. In each session, the participant and interviewer sat in separate sound-isolated rooms with communication between them supplied by audio/video hardware. Sessions consisted of the following tasks: rapport building, a YouTube task, a map task, and a business task. Further details on collection methodology and session tasks are contained in the documentation accompanying this release.</p> <p>There were a total of 386 participants in Phase 2a. This corpus contains audio data and transcripts from 301 participants and transcripts only for 65 participants. Recordings for 20 participants were not usable. </p> <p>Each session (or session part in the case of multipart sessions) is accompanied by a transcript produced automatically using the <a href="https://www.rev.ai/">Rev.ai</a> speech-to-text service.</p> <p> Speech data is presented as 16 kHz, 16-bit mono-channel FLAC-compressed MS-WAV files. Text data is UTF-8 encoded.</p> <h3>Updates</h3> <p> Additional information, updates, and bug fixes may be available in the LDC catalog entry for this corpus at <a href="http://catalog.ldc.upenn.edu/LDC2025S06">LDC2025S06</a>. </p> <h3>Content Copyright</h3> <p>Portions © 2025 Trustees of the University of Pennsylvania</p> <hr> <p class="footer"> Contact: <a href="mailto:ldc@ldc.upenn.edu"> <b>ldc@ldc.upenn.edu</b></a><br> © 2025 <a href="http://www.ldc.upenn.edu"> <b>Linguistic Data Consortium</b></a>, <a href="http://www.upenn.edu"> <b>Trustees of the University of Pennsylvania</b></a>. All Rights Reserved. </p> </body> </html>
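
The participant counts quoted in the description can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and the per-participant average assumes that all 438.34 hours come from the 301 participants with released audio, which the description does not state explicitly.

```python
# Figures copied from the corpus description; all names here are
# hypothetical and chosen only for this illustration.
PHASE_2A_PARTICIPANTS = 386
WITH_AUDIO_AND_TRANSCRIPTS = 301
TRANSCRIPTS_ONLY = 65
UNUSABLE_RECORDINGS = 20
TOTAL_AUDIO_HOURS = 438.34


def summarize_counts():
    """Return derived totals and a consistency flag for the stated counts."""
    released = WITH_AUDIO_AND_TRANSCRIPTS + TRANSCRIPTS_ONLY
    accounted_for = released + UNUSABLE_RECORDINGS
    return {
        "released": released,
        "accounted_for": accounted_for,
        "consistent": accounted_for == PHASE_2A_PARTICIPANTS,
        # Assumption: every released audio hour belongs to one of the
        # 301 participants whose audio is included in the corpus.
        "mean_audio_hours": TOTAL_AUDIO_HOURS / WITH_AUDIO_AND_TRANSCRIPTS,
    }


if __name__ == "__main__":
    for key, value in summarize_counts().items():
        print(f"{key}: {value}")
```

Under these assumptions the released total matches the 366 participants named in the introduction, the three groups add up to the 386 Phase 2a participants, and the average works out to roughly 1.46 hours of audio per participant with audio.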

Provider:

Linguistic Data Consortium

Created:

2025-07-10
