
Slovene learner corpus KOST 1.0

Source: SSH Open MarketPlace. Updated: 2023-10-17. Indexed: 2024-08-03.
Download link:
https://marketplace.sshopencloud.eu/dataset/TqqrpC
Description:
The corpus of Slovene as a foreign language, KOST (Korpus slovenščine kot tujega jezika), contains 6,311 texts (just over 1 million words) written by adult speakers whose first language is not Slovene. It offers insights into Slovene as produced by those still learning it as a second or foreign language, and in particular into the most common errors that occur in this process. KOST is therefore aimed at all those working with Slovene as a second or foreign language. The texts were mainly written at lectorates and in Slovene L2/FL courses. Most of the authors speak Serbian, Bosnian or Macedonian as their first language, but texts by speakers of other languages are also included. The authors are at different proficiency levels in Slovene, from beginners to advanced. For each contributor, information is available on gender, year of birth, country, first language and other languages spoken, employment status and education, and prior experience of learning Slovene. For each text, there is also information on the time and circumstances of creation (exam or homework), the programme in which it was produced, input type (digital or hand-written), language level and the grade. Part of the corpus also has texts available in a corrected version, which can also be accessed through concordancers ([noSketchEngine](https://www.clarin.si/ske/#dashboard?corpname=kost10_corr) or [KonText](https://www.clarin.si/kontext/query?corpname=kost10_corr)). The tokens of the original and corrected texts are linked (one group of links per paragraph), and the links are categorised into 23 error types. The corpus is available for download from the Slovenian repository CLARIN.SI and can be queried through the noSketchEngine and KonText concordancers.
Created:
2023-10-17