
symeneses/merlin

Source: Hugging Face. Updated 2024-01-14. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/symeneses/merlin
Resource description:
---
license: cc-by-sa-4.0
task_categories:
- text-classification
language:
- de
- it
- cs
pretty_name: MERLIN Written Learner Corpus for Czech, German, Italian 1.1.
size_categories:
- 1K<n<10K
---

# Dataset Card for MERLIN

The MERLIN corpus is a written learner corpus for Czech, German, and Italian that has been designed to illustrate the Common European Framework of Reference for Languages (CEFR) with authentic learner data. The corpus contains learner texts produced in standardized language certifications covering CEFR levels A1-C1. The MERLIN annotation scheme includes a wide range of language characteristics that provide researchers with concrete examples of learner performance and progress across multiple proficiency levels.

## Dataset Details

### Dataset Description

The MERLIN corpus contains 2,286 texts for learners of Italian, German and Czech that were taken from written examinations of acknowledged test institutions. The exams aim to test knowledge across the levels A1-C1 of the Common European Framework of Reference (CEFR).

- **Homepage:** https://merlin-platform.eu/
- **Funded by:** The MERLIN project was funded from 2012 until 2014 by the EU Lifelong Learning Programme under project number 518989-LLP-1-2011-1-DE-KA2-KA2MP.
- **Shared by:** Since 2018, corpus data are available through the CLARIN network.
- **Language(s) (NLP):** Czech, German and Italian
- **License:** Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

### Dataset Sources

- **Data PID:** https://hdl.handle.net/20.500.12124/6
- **Version-controlled data (Git):** https://gitlab.inf.unibz.it/commul/merlin-platform/data-bundle
- **Paper:** Boyd, A., Hana, J., Nicolas, L., Meurers, D., Wisniewski, K., Abel, A., Schöne, K., Štindlová, B., & Vettori, C. (2014). The MERLIN corpus: Learner language and the CEFR. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 14), 26-31 May 2014, 1281–1288. http://www.lrec-conf.org/proceedings/lrec2014/summaries/606.html

## Uses

- Teachers and material writers
- Curriculum design and course planning
- Language testing

For more details and practical examples, see [use cases](https://www.merlin-platform.eu/C_teacher.php).

## Citation

**BibTeX:**

@misc{20.500.12124/6,
  title = {{MERLIN} Written Learner Corpus for Czech, German, Italian 1.1},
  author = {Wisniewski, Katrin and Abel, Andrea and Vodi{\v c}kov{\'a}, Kate{\v r}ina and Plassmann, Sybille and Meurers, Detmar and Woldt, Claudia and Sch{\"o}ne, Karin and Blaschitz, Verena and Lyding, Verena and Nicolas, Lionel and Vettori, Chiara and Pe{\v c}en{\'y}, Pavel and Hana, Jirka and {\v C}urdov{\'a}, Veronika and {\v S}tindlov{\'a}, Barbora and Klein, Gudrun and Lauppe, Louise and Boyd, Adriane and Bykh, Serhiy and Krivanek, Julia},
  url = {http://hdl.handle.net/20.500.12124/6},
  note = {Eurac Research {CLARIN} Centre},
  copyright = {Creative Commons - Attribution-{ShareAlike} 4.0 International ({CC} {BY}-{SA} 4.0)},
  year = {2018}
}
Provided by:
symeneses
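Summary of original information

Dataset Card for MERLIN

Dataset Details

Dataset Description

The MERLIN corpus is a written learner corpus for Czech, German, and Italian, designed to illustrate the Common European Framework of Reference for Languages (CEFR) with authentic learner data. The corpus contains learner texts produced in standardized language certification exams covering CEFR levels A1-C1. The MERLIN annotation scheme includes a wide range of language characteristics, giving researchers concrete examples of learner performance and progress across multiple proficiency levels.

  • Number of texts: 2,286
  • Languages: Czech, German and Italian
  • Source: written examinations of acknowledged test institutions
  • Levels: CEFR levels A1-C1
  • License: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)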
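
As a rough illustration of how the corpus might be explored by proficiency level, the following Python sketch loads the dataset from the Hugging Face Hub and tallies texts per CEFR level. The repository id `symeneses/merlin` comes from the download link above. Everything else is an assumption for illustration only: the card does not list configuration names or column names, so `config_name` and the `cefr_level` field are hypothetical and should be checked against the actual dataset before use.

```python
from collections import Counter

# CEFR levels the dataset card says the corpus covers.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1"]


def load_merlin(config_name=None):
    """Load the corpus from the Hugging Face Hub.

    Requires `pip install datasets` and network access. `config_name` is
    left as None here because the card does not name any configurations.
    """
    # Imported lazily so the counting helper below works without `datasets`.
    from datasets import load_dataset

    return load_dataset("symeneses/merlin", config_name)


def count_levels(rows, level_field="cefr_level"):
    """Count rows per CEFR level.

    `level_field` is a hypothetical column name; replace it with the real
    column once the dataset schema has been inspected.
    """
    counts = Counter(row[level_field] for row in rows)
    # Report every level in order, including levels with no texts.
    return {level: counts.get(level, 0) for level in CEFR_LEVELS}


if __name__ == "__main__":
    # Toy rows standing in for real records, so the helper can be tried offline.
    sample = [{"cefr_level": "A1"}, {"cefr_level": "B1"}, {"cefr_level": "A1"}]
    print(count_levels(sample))
```

With the real data, one would first inspect the loaded object (for example its splits and `column_names`) and then pass one split to `count_levels` with the correct field name.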

Dataset Uses

  • Teachers and material writers
  • Curriculum design and course planning
  • Language testing

Citation

BibTeX:

@misc{20.500.12124/6,
  title = {{MERLIN} Written Learner Corpus for Czech, German, Italian 1.1},
  author = {Wisniewski, Katrin and Abel, Andrea and Vodi{\v c}kov{\'a}, Kate{\v r}ina and Plassmann, Sybille and Meurers, Detmar and Woldt, Claudia and Sch{\"o}ne, Karin and Blaschitz, Verena and Lyding, Verena and Nicolas, Lionel and Vettori, Chiara and Pe{\v c}en{\'y}, Pavel and Hana, Jirka and {\v C}urdov{\'a}, Veronika and {\v S}tindlov{\'a}, Barbora and Klein, Gudrun and Lauppe, Louise and Boyd, Adriane and Bykh, Serhiy and Krivanek, Julia},
  url = {http://hdl.handle.net/20.500.12124/6},
  note = {Eurac Research {CLARIN} Centre},
  copyright = {Creative Commons - Attribution-{ShareAlike} 4.0 International ({CC} {BY}-{SA} 4.0)},
  year = {2018}
}
