H1 Children's Writing

Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2016T01
Resource description:
<h3>Introduction</h3><br>
<p>H1 Children's Writing was developed by the <a href="http://www.dhbw.de/english/dhbw/about-us.html">Cooperative State University Baden-W&uuml;rttemberg</a> and the <a href="http://www.ph-karlsruhe.de/index.php?id=5829">University of Education</a>. It consists of 996 texts written over three months by 88 German schoolchildren aged seven through eleven.</p><br>
<p>The data in this corpus was collected by an elementary school in Baden-W&uuml;rttemberg, Germany, and digitized at the Cooperative State University during the second half of the 2014/2015 school year. Three second- and third-grade classrooms participated in the collection.</p><br>
<p>Texts were written in regular class settings. The students were shown a picture and asked to write a story, to describe the picture, or, if unable to write a text, to list what they saw in the picture. The pictures were designed to elicit output relevant to important spelling error categories, namely the marking of short vowels with a silent consonant letter and the correct spelling of long vowels. The children were allowed at least 15 minutes to write each text. This exercise was repeated weekly for 12 weeks.</p><br>
<p>LDC has also released&nbsp;H2, E2, ERK1 Children's Writing (<a href="../../../LDC2018T05">LDC2018T05</a>).</p><br>
<h3>Data</h3><br>
<p>Most of the participants were multilingual. Of the 85 children for whom metadata is available, 57 were multilingual and 28 were monolingual German speakers. The following metadata is included for each text in the database: school week of collection; school type (always elementary school); age; gender; grade/classroom; language spoken at home; and school materials used for German (Jojo).</p><br>
<p>In all, 996 texts representing 62,764 tokens were collected. The texts were digitized in two forms: (1) the original (achieved) text, including all errors, and (2) the intended (target) text, in which all spelling errors were corrected. Annotations were added to both the achieved text and the target text to mark words that should not be analyzed for spelling errors, such as names or foreign words. For sentence-level analysis, syntax errors were annotated by marking substitutions, deletions and insertions at the word level. In such cases, the word the child actually used was analyzed for spelling, and the correct word was used for sentence structure analysis.</p><br>
<p>The original handwriting is provided as PDF documents and the converted text as UTF-8 plain text in CSV documents.</p><br>
<h3>Samples</h3><br>
<p>Please view the following samples:</p><br>
<ul><br>
<li><a href="desc/addenda/LDC2016T01.pdf">Handwriting Sample</a></li><br>
<li><a href="desc/addenda/LDC2016T01.orig.txt">Original Transcription</a></li><br>
<li><a href="desc/addenda/LDC2016T01.corr.txt">Corrected Transcription</a></li><br>
</ul><br>
<h3>Updates</h3><br>
<p>None at this time.</p><br>
Portions © 2016 Kay Berkling, Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2020-11-30