VepKar
Source: arXiv · Updated 2022-06-08 · Indexed 2024-08-06
Download link:
http://arxiv.org/abs/2206.03870v1
Overview:
The VepKar dataset is a multilingual resource developed by the Institute of Linguistics, Literature and History of the Karelian Research Centre for research on the Veps and Karelian languages. It contains 3,000 texts across a range of genres and styles, such as folklore and legal documents, and is intended to document the state of both languages from the 19th to the 21st century. To ensure data quality and usability, the texts went through careful selection, digitization and tokenization. Beyond linguistic research, the dataset supports education and cultural preservation, and aims to help sustain linguistic diversity and cultural heritage.
Provided by:
Institute of Linguistics, Literature and History of the Karelian Research Centre
Created:
2022-06-08



