Khanty Corpus (North Khanty, Corpora and Translations) (UHLCS)

Source: Mendeley Data. Updated: 2024-01-31. Indexed: 2024-06-27.
Download link:
https://etsin.fairdata.fi/dataset/44c0f561-f8da-49f6-9598-f3db955f019c
Resource description:
The corpus is available in Kielipankki, the Language Bank of Finland (puhti.csc.fi; access rights instructions: http://www.kielipankki.fi/access).

Location: /appl/data/kielipankki/mrc-uhlcs/multilingual-language-archive/uralic-lgs/finno-ugric-lgs/ugric-lgs/khanty

The Khanty computer corpus contains the following sub-corpora:

Khanty, Atlym dialect: 519 words, 3967 characters
Khanty, Kazym dialect: 62766 words, 585659 characters
Khanty, Konda dialect: 1115 words, 10234 characters
Khanty, Nizjam dialect: 17681 words, 259732 characters
Khanty, Obdorsk dialect: 10939 words, 200358 characters
Khanty, Synja dialect: 10939 words, 200358 characters

The corpora of the Khanty dialects are samples taken from the following text collections:

Rédei, Károly (1968). Nord-ostjakische Texte (Kazym-Dialekt) mit Skizze der Grammatik. Gesammelt und herausgegeben von Károly Rédei. Abhandlung der Akademie der Wissenschaften in Göttingen, philologisch-historische Klasse, dritte Folge 71. Göttingen.
Steinitz, Wolfgang (1989). Ostjakologische Arbeiten III. Texte aus dem Nachlass. Eds.: Hartung, Liselotte, Hauel, Petra, Sauer, Gert & Schulze, Birgitte. Janua Linguarum, Series Practica 256. Mouton de Gruyter, Berlin.
Vértes, Edith (1980). H. Paasonens südostjakische Textsammlungen. Suomalais-Ugrilaisen Seuran Toimituksia 175. Suomalais-Ugrilainen Seura, Helsinki.

The corpora are running texts, and several of them are morphologically analyzed. Morphologically encoded words are stored in word-per-line format, and the plain texts are stored in sentence-per-line format. In some texts, the clauses and sentences are also marked with information about their location in the text.

Khanty, Textbook:

Rugin, R.P. (1990). Shum jôxan sjun'öng xâtLöt. (Shchastlivye den'ki na Shum-jugane.) [Onnellisia päiviä Shum-joella.] Kniga dlja dopol'nitel'nogo chtenija v 3-4 klassax xantyjskix shkol (shuryshkarskij dialekt). Prosveshchenie, Leningrad.

The text is provided in six versions: (1) the original text in the Cyrillic alphabet; (2) the same text converted to the Latin alphabet; the same text translated into (3) Finnish, (4) English and (5) Russian; and (6) the original text in the Latin format, morphologically coded and translated into English.

Children's books:

Life of Jesus in Khanty (the Kazim dialect). (Trial edition). Translation: Nyomysova, Yevdokiya Andreyevna & Lozyamova, Zoya Nikiforovna. ISBN 952-9790-25-2, ISBN 91-88394-97-2. 63 pp. Institute for Bible Translation. Stockholm & Helsinki 1995.
Life of Jesus in Khanty (the Kazim dialect). (Second edition). Translation: Nyomysova, Yevdokiya Andreyevna & Lozyamova, Zoya Nikiforovna. ISBN 952-9790-40-6, ISBN 91-88794-83-0. 63 pp. Institute for Bible Translation. Stockholm & Helsinki 1997.

The computer corpora of the Khanty dialects and the textbook were compiled and edited by Merja Salo with the financial support of the Academy of Finland. The adaptation of the texts for public use was done with the financial support of the Department of General Linguistics, University of Helsinki. The children's books were donated to the University of Helsinki by the Institute for Bible Translation, Helsinki and Stockholm.

The Khanty Corpus is a part of the UHLCS corpus collection. UHLCS has many different IPR holders. Should you have any questions regarding the collection, please contact Pirkko Suihkonen (suihkonen.pirkko@gmail.com).

License details: http://urn.fi/urn:nbn:fi:lb-20150304115
Detailed information: http://urn.fi/urn:nbn:fi:lb-2014060214 http://www.ling.helsinki.fi/uhlcs/metadata/corpus-metadata/uralic-lgs/ugric-lgs/khanty

The purpose of the resource use must be outlined in a research plan.

Log, 25.11.2018: the link http://islrn.org/resources/156-041-809-270-6 was removed.
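The description above mentions two storage layouts: word-per-line for morphologically encoded words and sentence-per-line for plain text. The following is a minimal Python sketch of how such files might be read. It is not the corpus's documented tooling. The function names, the whitespace tokenization, and the placeholder strings are all assumptions made for illustration; the description does not specify the file encoding or how the morphological codes are laid out on a line, so annotated lines are kept verbatim here.

```python
from typing import Iterable, List


def read_sentences(lines: Iterable[str]) -> List[List[str]]:
    """Sentence-per-line input: one sentence per non-empty line.

    Splitting on whitespace is an assumption, not a documented rule.
    """
    return [line.split() for line in lines if line.strip()]


def read_word_lines(lines: Iterable[str]) -> List[str]:
    """Word-per-line input: one (possibly annotated) word per non-empty line.

    Each line is kept verbatim because the annotation layout is unknown.
    """
    return [line.strip() for line in lines if line.strip()]


# Placeholder strings only; these are NOT real Khanty data.
plain = ["w1 w2 w3", "", "w4 w5"]
coded = ["w1", "w2", "w3", "w4", "w5"]

sentences = read_sentences(plain)
words = read_word_lines(coded)

print(len(sentences), "sentences;", len(words), "word lines")
```

With real files, the same functions could be applied to an open file handle, for example `read_sentences(open(path, encoding=...))`, once the actual encoding of the corpus files is known.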

Created:
2024-01-31