MATERIAL Swahili-English Language Pack

Source: DataCite Commons; updated 2026-01-09; indexed 2026-05-03
Download link:
https://catalog.ldc.upenn.edu/LDC2026S01
Resource description:
<h3>Introduction</h3> <p>MATERIAL Swahili-English Language Pack, Linguistic Data Consortium (LDC) Catalog Number LDC2026S01, was developed by <a href="http://www.appen.com/">Appen</a> for the IARPA (Intelligence Advanced Research Projects Activity) <a href="https://www.iarpa.gov/index.php/research-programs/material">MATERIAL</a> (Machine Translation for English Retrieval of Information in Any Language) program. It contains approximately 112 hours of Swahili conversational telephone speech, along with transcripts, English translations, annotations and queries.</p> <p>The MATERIAL program focused on underserved languages, with the ultimate goal of building cross-language information retrieval systems that find speech and text content using English search queries.</p> <h3>Data</h3> <p>The Swahili speech in this release represents the variety spoken in the Nairobi dialect region of Kenya. The gender distribution among speakers is approximately equal, and speakers range in age from 16 to 69 years. Calls were made on different types of telephones (e.g., mobile, landline) from a variety of environments, including the street, a home or office, a public place, and inside a vehicle.</p> <p>Transcripts cover approximately 30% of the speech data, and approximately 3% of the speech data was translated into English. Further information about the transcription and translation methodologies is contained in the documentation accompanying this release.</p> <p>The Swahili-English Language Pack also includes domain annotations, English queries, and relevance annotations for those queries.</p> <p>Speech data is presented either as two-channel WAV files or as single-channel SPHERE files, predominantly in 8 kHz A-law format. Some files are 48 kHz and single-channel. All text data is UTF-8 encoded.</p> <h3>Updates</h3> <p>No updates at this time.</p>
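The description says most speech is 8 kHz A-law and that some of it comes as two-channel WAV files. Python's standard-library `wave` module only reads PCM data, so a minimal sketch can walk the RIFF chunks by hand, expand each byte with the ITU-T G.711 A-law rule, and split the interleaved frames into one list per channel. The helper names `alaw_to_linear` and `read_alaw_wav` are hypothetical, and the sketch assumes the files use WAV format tag 6 (A-law) with 8 bits per sample; the documentation accompanying the release should be checked before relying on that.

```python
import struct
from typing import List, Tuple

# Assumption: A-law audio is tagged with format code 6 in the "fmt " chunk.
WAVE_FORMAT_ALAW = 6


def alaw_to_linear(a_val: int) -> int:
    """Hypothetical helper: expand one G.711 A-law byte to a signed 16-bit-range sample."""
    a_val ^= 0x55                      # undo the even-bit inversion used by A-law
    t = (a_val & 0x0F) << 4            # 4 quantization bits
    seg = (a_val & 0x70) >> 4          # 3-bit segment (exponent)
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t = (t + 0x108) << (seg - 1)
    return t if a_val & 0x80 else -t   # sign bit set means a positive sample


def read_alaw_wav(data: bytes) -> Tuple[int, List[List[int]]]:
    """Hypothetical sketch: return (sample_rate, samples per channel) for A-law WAV bytes."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    fmt = None
    payload = None
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        body = data[pos + 8:pos + 8 + size]
        if chunk_id == b"fmt ":
            # format tag, channels, sample rate, byte rate, block align, bits/sample
            fmt = struct.unpack("<HHIIHH", body[:16])
        elif chunk_id == b"data":
            payload = body
        pos += 8 + size + (size & 1)   # RIFF chunks are padded to an even length
    if fmt is None or payload is None:
        raise ValueError("missing fmt or data chunk")
    tag, channels, rate, _byte_rate, _align, bits = fmt
    if tag != WAVE_FORMAT_ALAW or bits != 8:
        raise ValueError(f"expected 8-bit A-law, got tag={tag}, bits={bits}")
    # Frames are interleaved: byte i*channels + ch belongs to channel ch.
    per_channel = [[alaw_to_linear(b) for b in payload[ch::channels]]
                   for ch in range(channels)]
    return rate, per_channel
```

Under those assumptions, calling `read_alaw_wav` on the bytes of a two-channel file returns the sample rate and two lists of integers in the signed 16-bit range. Decoding into Python lists keeps the example dependency-free; for the full 112 hours, a vectorized decoder (for example a 256-entry lookup table indexed with a NumPy `uint8` array) would be considerably faster.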
Provider:
Linguistic Data Consortium
Created:
2026-01-09
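The single-channel files in this release are in NIST SPHERE format, which begins with a plain-text header: a `NIST_1A` line, a line giving the header size in bytes (commonly 1024), then `name -type value` lines (`-i` integer, `-r` real, `-sN` string of N characters) terminated by `end_head`. A minimal Python sketch of a header reader follows. `parse_sphere_header` is a hypothetical helper, and which fields (for example `sample_rate`, `channel_count`, `sample_coding`) actually appear in this release's files is an assumption to verify against the accompanying documentation.

```python
from typing import Dict, Tuple, Union

HeaderValue = Union[int, float, str]


def parse_sphere_header(data: bytes) -> Tuple[Dict[str, HeaderValue], int]:
    """Hypothetical sketch: parse a NIST SPHERE header.

    Returns (fields, header_size); the audio samples start at data[header_size:].
    """
    magic, size_line, _rest = data[:64].split(b"\n", 2)
    if magic != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(size_line.decode("ascii"))  # e.g. "   1024"
    text = data[:header_size].decode("ascii", errors="replace")

    fields: Dict[str, HeaderValue] = {}
    for line in text.split("\n")[2:]:             # skip the magic and size lines
        line = line.strip()
        if line == "end_head":                    # end of the field list
            break
        if not line:
            continue
        name, kind, value = line.split(" ", 2)    # "name -type value"
        if kind == "-i":
            fields[name] = int(value)
        elif kind == "-r":
            fields[name] = float(value)
        elif kind.startswith("-s"):
            fields[name] = value[: int(kind[2:])]  # -sN: string of N characters
        else:
            raise ValueError(f"unknown field type {kind!r}")
    return fields, header_size
```

The returned header size is the byte offset where the samples begin, so `data[header_size:]` is the audio payload, to be decoded according to whatever `sample_coding` the header reports.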