School Holiday Essay Corpus
Source: NIAID Data Ecosystem. Indexed: 2026-05-10
Download link:
https://data.mendeley.com/datasets/pybsmy8vfd
Resource description:
The School Holiday Essay Corpus is a linguistic data collection developed to study the use of language in students' writing based on their personal experiences during school holidays. The theme of school holidays was chosen because it is a topic close to students' daily experiences and allows them to write narratively based on the actual activities they experience. In the context of language education, this topic is often used in writing exercises because it provides space for students to express experiences, emotions, and social interactions through language in a more spontaneous and authentic way.
The School Holiday Essay Corpus was developed as part of a student language data collection project and comprises 670 student essays totalling 177,027 word tokens. The essays were collected from students in six geographical zones of Malaysia: North, South, East, West, Sabah and Sarawak. This zoning aims to provide a more balanced geographical representation and lets researchers examine variation in language use among students from different socioeconomic backgrounds and educational environments.
The corpus also covers three levels of education: Primary School (around 12 years old), Vocational College (around 16 years old) and Pre-University (around 18 years old). This design allows language development to be analysed across age groups and educational levels. With this structure, the corpus not only provides an overview of students' language use at a given level but also opens up space for broader comparative linguistic studies.
Within the framework of corpus linguistics, the construction of a corpus of student writing provides an opportunity to examine various aspects of language such as lexical diversity, word frequency, spelling errors, sentence formation, as well as the use of metaphors and emotional expressions in student writing. Texts themed around school holidays in particular often contain narratives of experiences involving family activities, travel, community activities and personal experiences. Therefore, these texts provide a rich linguistic context for the analysis of more natural language forms compared to formal texts.
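As a rough illustration of two of the measures named above, word frequency and lexical diversity, the following minimal Python sketch computes a token count, a type count, a type-token ratio and the most frequent words for one text. The function names `tokenize` and `lexical_profile`, the simple regex tokenizer and the sample sentence are all assumptions made for illustration; they are not part of the dataset or of any tooling described by its authors, and the corpus token count may have been produced with a different tokenization.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return word tokens.

    Assumption: a token is a run of letters, optionally joined by
    hyphens or apostrophes (so a reduplicated form such as
    "jalan-jalan" stays one token). Digits and punctuation are dropped.
    """
    return re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", text.lower())


def lexical_profile(text, top_n=3):
    """Return token count, type count, type-token ratio and top words."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    # Type-token ratio: distinct word forms divided by total tokens.
    ttr = len(counts) / len(tokens) if tokens else 0.0
    return {
        "tokens": len(tokens),
        "types": len(counts),
        "ttr": ttr,
        "top": counts.most_common(top_n),
    }


# Hypothetical sample sentence, not taken from the corpus.
sample = "Saya pergi ke pantai. Saya suka pantai itu."
profile = lexical_profile(sample)
print(profile)
```

The type-token ratio is sensitive to text length, so comparisons across essays of very different lengths would normally need a length-normalised measure instead.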
The construction of the School Holiday Essay Corpus also contributes to the development of authentic Malay language data sources in the fields of corpus linguistics and language education. Compared to corpora consisting of formal texts such as newspapers or literary works, the student essay corpus shows more natural language use and reflects the reality of language literacy in an educational context. Therefore, this corpus has the potential to be an important source for studies related to language development, linguistic variation, language errors, and the development of Malay language teaching and learning materials.
Created:
2026-03-12



