
Ethnobotanical Research and Language Documentation of Nahuatl

Source: DataCite Commons. Updated: 2021-07-19. Indexed: 2024-07-13.
Download link:
https://catalog.ldc.upenn.edu/LDC2021S06
Resource description:
<h3>Introduction</h3><br> <p>Ethnobotanical Research and Language Documentation of Nahuatl consists of approximately 190 hours of field recordings collected in the Sierra Nororiental and Sierra Norte regions of Puebla, Mexico. The corpus contains audio and video recordings of native Nahuatl speakers during the collection of particular plants; partial transcripts (Nahuatl and Spanish); a Highland Puebla Nahuat dictionary; botanical and ethnobotanical data; and speaker metadata.</p><br> <p>Nahuatl is one of the most widely spoken indigenous languages in the Americas, with approximately 1.5 million speakers in Mexico. Many distinct and sometimes mutually intelligible varieties have been recognized. The recordings in this release were collected between 2008 and 2019 in two municipalities: Cuetzalan del Progreso and Tepetzintla. Speech from Cuetzalan represents Highland Puebla Nahuat, and speech from Tepetzintla represents Zacatl&aacute;n-Ahuacatl&aacute;n-Tepetzintla Nahuatl.</p><br> <h3>Data</h3><br> <p>Each recording consists of a speaker talking about a plant's nomenclature, classification, and use. Audio files are primarily single-channel, 48 kHz, 16-bit WAV. Some data is also presented as MP3. Video files are presented as MP4.</p><br> <p>Transcripts are included for the Cuetzalan recordings in <a href="http://trans.sourceforge.net/en/presentation.php">Transcriber</a> format. These transcripts have been partially translated into Spanish using <a href="https://archive.mpi.nl/tla/elan">ELAN</a>.</p><br> <p>A Highland Puebla Nahuat dictionary is included in both text and Toolbox XML formats. 
Botanical and ethnobotanical information is presented as a collection of PDFs, and images as JPEGs.</p><br> <p>Further information about the corpus is available in the included documentation.</p><br> <p>Note that some folders are empty; they are reserved for future work.</p><br> <h3>Samples</h3><br> <p>Please view the following samples:</p><br> <ul><br> <li><a href="desc/addenda/LDC2021S06.wav">Audio Sample (WAV)</a></li><br> <li><a href="desc/addenda/LDC2021S06.trs">Transcript Sample (TXT)</a></li><br> <li><a href="desc/addenda/LDC2021S06.spa-trans.eaf">Translation Sample (XML)</a></li><br> <li><a href="desc/addenda/LDC2021S06.dict.txt">Dictionary Sample (TXT)</a></li><br> <li><a href="desc/addenda/LDC2021S06.bot.jpg">Botanical Sample (JPG)</a></li><br> </ul><br> <h3>Updates</h3><br> <p>None at this time.</p><br> Portions © 2021 Jonathan D. Amith, © 2021 Trustees of the University of Pennsylvania
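The description says the audio is primarily single-channel, 48 kHz, 16-bit WAV, so some files may differ. A minimal sketch of how a downloaded file could be checked against that format using Python's standard `wave` module follows. The function names are illustrative assumptions, not part of any corpus tooling, and the demo helper builds a synthetic in-memory file rather than using corpus data.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, sample_rate_hz, bits_per_sample) for a PCM WAV.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return (
            wav.getnchannels(),
            wav.getframerate(),
            wav.getsampwidth() * 8,  # sample width is reported in bytes
        )


def matches_primary_format(source):
    """True if the file is single-channel, 48 kHz, 16-bit, as described."""
    return describe_wav(source) == (1, 48000, 16)


def make_silent_wav(channels, rate_hz, width_bytes, n_frames=10):
    """Build a tiny silent WAV in memory (hypothetical helper, demo only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width_bytes)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * (channels * width_bytes * n_frames))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(matches_primary_format(make_silent_wav(1, 48000, 2)))
    print(matches_primary_format(make_silent_wav(2, 44100, 2)))
```

To check a real file, pass its path instead, for example `describe_wav("LDC2021S06.wav")` after downloading the audio sample. The `wave` module reads uncompressed PCM WAV only, so the MP3 and MP4 files would need a different tool.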
Provider:
Linguistic Data Consortium
Created:
2021-07-07