AnnoDIFP Session Audio and Transcripts
DataCite Commons. Updated 2025-07-10. Indexed 2026-05-03.
Download link:
https://catalog.ldc.upenn.edu/LDC2025S06
Resource description:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>AnnoDIFP Session Audio and Transcripts</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="description" content="Documentation for AnnoDIFP Session Audio and Transcripts">
<meta name="keywords" content="Linguistic Data Consortium">
<meta name="keywords" content="LDC">
<meta name="keywords" content="Documentation">
<meta name="keywords" content="AnnoDIFP Session Audio and Transcripts">
<style type="text/css">
body{
background-color: #ffffff;
color: #000000;
}
a:link{color : #990000;}
a:visited{color:#990000;}
a:active {color:#990000;}
h1{
text-align:center;
color:#990000;
}
h3.subtitle{
text-align:center;
color:#990000;
}
ul{
line-height:130%;
}
p.cited{
padding-left:2em;
text-indent:-2em;
}
p.footer{
font-size:0.85em;
}
table, th, td
{
border-collapse:collapse;
border: 1px solid black;
padding: 1px 5px;
text-align:center;
}
</style>
</head>
<body>
<h1>AnnoDIFP Session Audio and Transcripts</h1>
<h3 class="subtitle">LDC2025S06</h3>
<h3>Introduction</h3>
<p> AnnoDIFP (Annotated Data for the Investigation of Facets of Personality) Session Audio and Transcripts
was developed by the Linguistic Data Consortium (LDC), the <a href="https://www.fit.edu/">Florida Institute of
Technology</a> (FIT), and the <a href="https://www.newhaven.edu/index.php">University of New Haven</a> (UNH) to support algorithm development for predicting personality traits. It contains 438.34 hours of English audio and transcripts from in-person interviews of 366 participants, paired with scores from two self-reported personality
assessments: the HEXACO Personality Inventory (Revised) (HEXACO-PI-R) and the
Short Dark Triad (SD3).
</p>
<p>Survey and behavioral data were collected in three phases. Phase 1 consisted of online questionnaires. Selected participants were then invited to Phase 2a, in which behavioral and linguistic data were collected in a laboratory setting. In Phase 2b, participants took part in a telephone speech collection by calling other participants. This release covers the activities in Phase 2a.</p>
<h3>Data</h3>
<p>In-person interviews were recorded at LDC, FIT and UNH. In each session, the participant and interviewer sat in separate sound-isolated rooms with communication between them supplied by audio/video hardware. Sessions consisted of the following tasks: rapport building, a YouTube task, a map task, and a business task. Further details on collection methodology and session tasks are contained in the documentation accompanying this release.</p>
<p>A total of 386 participants took part in Phase 2a. This corpus contains audio data and transcripts from 301 participants and transcripts only from 65 participants, for 366 participants in all. Recordings for the remaining 20 participants were not usable.
</p>
<p>Each session (or session part in the case of multipart sessions) is accompanied
by a transcript produced automatically using the <a href="https://www.rev.ai/">Rev.ai</a>
speech-to-text service.</p>
<p>
Speech data is presented as 16 kHz, 16-bit mono-channel FLAC-compressed MS-WAV files.
Text data is UTF-8 encoded.</p>
<h3>Updates</h3>
<p>
Additional information, updates, and bug fixes may be available in the LDC
catalog entry for this corpus at <a
href="http://catalog.ldc.upenn.edu/LDC2025S06">LDC2025S06</a>.
</p>
<h3>Content Copyright</h3>
<p>Portions © 2025 Trustees of the University of Pennsylvania</p>
<hr>
<p class="footer">
Contact: <a href="mailto:ldc@ldc.upenn.edu">
<b>ldc@ldc.upenn.edu</b></a><br> © 2025 <a
href="http://www.ldc.upenn.edu">
<b>Linguistic Data Consortium</b></a>,
<a href="http://www.upenn.edu">
<b>Trustees of the University of Pennsylvania</b></a>. All Rights Reserved.
</p>
</body>
</html>
Provider:
Linguistic Data Consortium
Created:
2025-07-10



