Dialogs Re-Enacted Across Languages

Source: DataCite Commons. Updated 2025-06-03. Indexed 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2024S08
Resource description:
<h3>Introduction</h3>
<p>Dialogs Re-Enacted Across Languages was developed at the <a href="https://www.utep.edu/">University of Texas at El Paso</a>. It contains approximately 17 hours of conversational speech in English and Spanish from 129 unique bilingual speakers: short fragments extracted from spontaneous conversations, paired with close re-enactments in the other language by the original speakers, for a total of 3816 pairs of matching utterances.</p>
<h3>Data</h3>
<p>Data was collected in 2022-2023. Participants were recruited from among students at the University of Texas at El Paso, which is located on the US-Mexico border. All participants were bilingual speakers of General American English and of Mexico-Texas Border Spanish. Their self-described dialect for English was El Paso and, for Spanish, mostly "El Paso/Juarez."</p>
<p>Each speaker pair had a ten-minute conversation in one language. Fragments of these conversations were then chosen for re-enactment, and the original speakers produced equivalents in the other language. Each re-enactment was vetted for fidelity to the original and for naturalness in the target language.</p>
<p>After recording, fragments were mapped to the translated re-enactments using <a href="https://archive.mpi.nl/tla/elan">ELAN</a>, an annotation tool for audio and video recordings.</p>
<p>Metadata about conversations, participants, re-enactments and utterances are included in this release.</p>
<p>The audio data is presented as FLAC-compressed, single-channel, 16 kHz, 16-bit linear PCM.</p>
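As a rough planning aid, the stated audio format and totals allow a back-of-the-envelope estimate of the decoded (uncompressed) size and of the average utterance length. This is a minimal sketch, not part of the release. It assumes that the "approximately 17 hours" figure covers both members of every utterance pair, which the description does not state explicitly, and the function names are purely illustrative.

```python
# Format parameters stated in the description.
SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 16
CHANNELS = 1

# Approximate totals stated in the description.
TOTAL_HOURS = 17
UTTERANCE_PAIRS = 3816


def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Data rate of linear PCM after FLAC decoding, in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def decoded_size_bytes(hours: float, bytes_per_second: int) -> int:
    """Size of the decoded audio for the given duration."""
    return int(hours * 3600 * bytes_per_second)


def mean_utterance_seconds(hours: float, pairs: int) -> float:
    """Mean utterance length.

    ASSUMPTION: the total duration is spread over both utterances of each
    pair (original fragment plus re-enactment).
    """
    return hours * 3600 / (pairs * 2)


rate = pcm_bytes_per_second(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
size = decoded_size_bytes(TOTAL_HOURS, rate)
mean = mean_utterance_seconds(TOTAL_HOURS, UTTERANCE_PAIRS)

print(f"PCM data rate: {rate} bytes/s")
print(f"Decoded size:  {size / 1e9:.2f} GB")
print(f"Mean utterance length (under the stated assumption): {mean:.1f} s")
```

Because the files are FLAC-compressed, the download itself will be smaller than the decoded estimate. The exact ratio depends on the recordings and is not given in the description.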
Provider:
Linguistic Data Consortium
Created:
2024-07-17