
AnnoDIFP Session Audio and Transcripts

DataCite Commons | Updated 2025-07-10 | Indexed 2026-05-03
Download link:
https://catalog.ldc.upenn.edu/LDC2025S06
Resource description:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html> <head> <title>AnnoDIFP Session Audio and Transcripts</title> <meta http-equiv="Content-type" content="text/html/css;charset=UTF-8"> <meta name="description" content="Documentation for AnnoDIFP Session Audio and Transcripts"> <meta name="keywords" content="Linguistic Data Consortium"> <meta name="keywords" content="LDC"> <meta name="keywords" content="Documentation"> <meta name="keywords" content="AnnoDIFP Session Audio and Transcripts"> <style type="text/css"> body{ background-color: #ffffff; color: #000000; } a:link{color : #990000;} a:visited{color:#990000;} a:active {color:#990000;} h1{ text-align:center; color:#990000; } h3.subtitle{ text-align:center; color:#990000; } ul{ line-height:130%; } p.cited{ padding-left:2em; text-indent:-2em; } p.footer{ font-size:0.85em; } table, th, td { border-collapse:collapse; border: 1px solid black; padding: 1px 5px; text-align:center; } </style> </head> <body> <h1>AnnoDIFP Session Audio and Transcripts</h1> <h3 class="subtitle">LDC2025S06</h3> <h3>Introduction</h3> <p> AnnoDIFP (Annotated Data for the Investigation of Facets of Personality) Session Audio and Transcripts was developed by the Linguistic Data Consortium (LDC), the <a href="https://www.fit.edu/">Florida Institute of Technology </a> (FIT), and the <a href="https://www.newhaven.edu/index.php">University of New Haven</a> (UNH) to support algorithm development for predicting personality traits. It contains 438.34 hours of English audio and transcripts from in-person interviews of 366 participants paired with scores from two self-reported personality assessments, HEXACO Personality Inventory (Revised) (HEXACO-PI-R) and Short Dark Triad (SD3). </p> <p>Survey and behavioral data were collected in three phases. Phase 1 consisted of online questionnaires. 
Selected participants were invited to participate in Phase 2a, which collected behavioral and linguistic data in a laboratory setting. In Phase 2b, participants engaged in a telephone speech collection by calling other participants. This release covers the activities in Phase 2a. </p> <h3>Data</h3> <p>In-person interviews were recorded at LDC, FIT and UNH. In each session, the participant and interviewer sat in separate sound-isolated rooms with communication between them supplied by audio/video hardware. Sessions consisted of the following tasks: rapport building, a YouTube task, a map task, and a business task. Further details on collection methodology and session tasks are contained in the documentation accompanying this release.</p> <p>There were a total of 386 participants in Phase 2a. This corpus contains audio data and transcripts from 301 participants and transcripts only for 65 participants. Recordings for 20 participants were not usable. </p> <p>Each session (or session part in the case of multipart sessions) is accompanied by a transcript produced automatically using the <a href="https://www.rev.ai/">Rev.ai</a> speech-to-text service.</p> <p> Speech data is presented as 16 kHz, 16-bit mono-channel FLAC-compressed MS-WAV files. Text data is UTF-8 encoded.</p> <h3>Updates</h3> <p> Additional information, updates, and bug fixes may be available in the LDC catalog entry for this corpus at <a href="http://catalog.ldc.upenn.edu/LDC2025S06">LDC2025S06</a>. </p> <h3>Content Copyright</h3> <p>Portions © 2025 Trustees of the University of Pennsylvania</p> <hr> <p class="footer"> Contact: <a href="mailto:ldc@ldc.upenn.edu"> <b>ldc@ldc.upenn.edu</b></a><br> &copy; 2025 <A HREF="http://www.ldc.upenn.edu"> <b>Linguistic Data Consortium</b></a>, <a href="http://www.upenn.edu"> <b>Trustees of the University of Pennsylvania</b></a>. All Rights Reserved. </p> </body> </html>
Provider:
Linguistic Data Consortium
Created:
2025-07-10