
INEL Nenets Corpus

Source: DataCite Commons. Updated 2024-12-19; indexed 2025-04-16.
Download link:
https://www.fdr.uni-hamburg.de/record/16518
Resource description:
Corpus Citation

Budzisch, Josefina; Wagner-Nagy, Beáta. 2024. INEL Nenets Corpus. Version 1.0. Publication date 2024-12-31. https://hdl.handle.net/11022/0000-0007-FE37-E. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1

Corpus Description

The INEL Nenets corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus includes texts recorded between 1940 and 2011 in both Nenets lects, Forest Nenets and Tundra Nenets.

The majority of texts in this corpus originate from published works, which are cited in the relevant sections of the metadata. In particular, the following publications were used (full details are in the reference section of the documentation): Barmich 2018; Burkova 2008; Burkova 2012; Burkova et al. 2003; Hajdú 1968; Koshkareva et al. 2007; Labanauskas 2001; Logany & Logany 2016; Lyubinskaya 2022; Pusztay 1976; Tereshchenko 1956; Tereshchenko 1990; Turutina 2003; Yangasova 2018.

Svetlana Burkova kindly shared a collection of her Forest Nenets data, including an original sound recording (Agan dialect), transcripts and glosses as Toolbox files and Word documents (Agan and Pur dialects), as well as published texts in the Pur (Turutina 2003) and Numto (Logany & Logany 2016) dialects.

All texts in the corpus are provided with interlinear morpheme-by-morpheme glosses and translations into English, German and Russian. An audio recording is also provided for one text.
Corpus size

Forest Nenets: 80 texts, 3,709 sentences, 23,597 tokens
Tundra Nenets: 56 texts, 6,545 sentences, 37,681 tokens
Total: 136 texts, 10,254 sentences, 61,278 tokens
Total duration of audio: 44 minutes 45 seconds

Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies' Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies' Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Searching the corpus

The corpus can be downloaded from the ZFDM Repository using the links provided and browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN. Online search with the Tsakorpus platform is available at https://inel.corpora.uni-hamburg.de/NenetsCorpus/search. Remote search with EXMARaLDA is also possible without downloading all the files (see https://inel.corpora.uni-hamburg.de/portal/help/en/index.php).

See the user documentation (section 3) for details on transcription, annotation tiers and annotation tags. Further information and links are available on the Nenets Corpus page at the INEL Resources portal: https://inel.corpora.uni-hamburg.de/portal/corpora/nenets/.
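The corpus-size figures reported above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the description, while the names `SubcorpusSize`, `SIZES`, `REPORTED_TOTAL`, `total` and `tokens_per_sentence` are hypothetical helpers introduced here and are not part of any INEL tooling.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SubcorpusSize:
    """Size figures for one sub-corpus (hypothetical helper type)."""
    texts: int
    sentences: int
    tokens: int


# Figures copied from the corpus-size statement in the description.
SIZES = {
    "Forest Nenets": SubcorpusSize(texts=80, sentences=3709, tokens=23597),
    "Tundra Nenets": SubcorpusSize(texts=56, sentences=6545, tokens=37681),
}
REPORTED_TOTAL = SubcorpusSize(texts=136, sentences=10254, tokens=61278)


def total(sizes: Iterable[SubcorpusSize]) -> SubcorpusSize:
    """Sum texts, sentences and tokens over the given sub-corpora."""
    sizes = list(sizes)
    return SubcorpusSize(
        texts=sum(s.texts for s in sizes),
        sentences=sum(s.sentences for s in sizes),
        tokens=sum(s.tokens for s in sizes),
    )


def tokens_per_sentence(size: SubcorpusSize) -> float:
    """Average sentence length in tokens for one sub-corpus."""
    return size.tokens / size.sentences


if __name__ == "__main__":
    computed = total(SIZES.values())
    print("reported totals match the sum:", computed == REPORTED_TOTAL)
    for name, size in SIZES.items():
        print(f"{name}: {tokens_per_sentence(size):.2f} tokens per sentence")
```

Running the script compares the summed sub-corpus figures with the reported totals and prints the average sentence length for each lect, which gives a quick sense of how the two sub-corpora differ.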

Provider:
Universität Hamburg
Created:
2024-12-19