HSK1-9 dataset level grading model
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/mmch43zw7y
Description:
The dataset is compiled from texts in the HSK Standard Course and its accompanying exercises, New HSK Examination Sample Tests, New HSK Actual Test Papers, Developing Chinese (Third Edition), Chinese Paradise, Enjoy Reading: International Chinese Reading Teaching Textbook, the New Standard College Chinese (NSCC) Series, and K-12 Standard Chinese. A brief description of these resources follows:
• 《HSK Standard Course》: An officially authorized textbook series published by Beijing Language and Culture University Press (BLCUP) and developed in collaboration with Chinese Testing International (CTI). First published in November 2013, it is designed for learners preparing for the HSK tests and covers listening, speaking, reading, and writing skills.
• 《New HSK Examination Sample Tests》 and 《New HSK Actual Test Papers》: The Sample Tests provide one official mock test for each level from HSK 1 to 6. The Actual Test Papers contain past exam papers.
• 《K-12 Standard Chinese》: Published by BLCUP in December 2024, this series is designed for K-12 non-native learners. It aligns with the "three stages and nine levels" of the Chinese Proficiency Grading Standards but refines them into a "five phases and nine levels" system for the K-12 context.
• 《New Standard College Chinese (NSCC) Series》: Published by BLCUP in 2024, this series is tailored for undergraduate students. It includes textbooks for elementary (HSK 1-3), intermediate (HSK 4-6), and advanced (HSK 7-9) levels.
• 《Enjoy Reading》(ER): Published by BLCUP in 2022 and edited by Su Yingxia, this six-volume set is designed for learners with a foundation in Chinese, focusing on intensive, extensive, and authentic reading skill development.
• 《Developing Chinese (3rd Edition)》: A comprehensive teaching system published by BLCUP between 2024 and 2025, catering to various course types and learner levels.
• 《Chinese Paradise》: A textbook series for non-heritage primary school students, published by BLCUP in November 2023, covering overseas grades 1-6.
Using the CnOCR toolkit, we extracted lesson titles, main text, and new words from the textbooks, and reading passages from workbooks and test papers, excluding very short articles. All extracted content was stored in TXT format.
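The post-OCR steps described above (dropping very short passages and saving the rest as TXT) might look roughly like the sketch below. It assumes the OCR stage has already produced a list of text lines for each passage. The function names and the `MIN_CHARS` cutoff are hypothetical, since the description does not say what length counts as "very short".

```python
from pathlib import Path

# Hypothetical cutoff: the dataset description does not state the actual
# length below which an article was excluded.
MIN_CHARS = 50


def clean_ocr_lines(lines):
    """Strip surrounding whitespace from OCR'd lines and drop empty ones."""
    return [line.strip() for line in lines if line.strip()]


def save_text(lines, out_path, min_chars=MIN_CHARS):
    """Write a passage to a UTF-8 TXT file unless it is too short.

    Returns True if the file was written, False if the passage was skipped.
    """
    cleaned = clean_ocr_lines(lines)
    n_chars = sum(len(line) for line in cleaned)
    if n_chars < min_chars:
        return False
    Path(out_path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return True
```

Counting characters rather than whitespace-separated tokens is a deliberate choice here, because Chinese text is not space-delimited.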
After preliminary manual screening and cleaning, a total of 6,269 texts were obtained. The compilers of these source materials had pre-assigned difficulty levels (1-9) to the texts based on the Chinese Proficiency Grading Standards for International Chinese Language Education.
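As an illustration of working with the 1-9 labels, the sketch below groups levels into the elementary (1-3), intermediate (4-6), and advanced (7-9) bands mentioned in the NSCC description, and tallies how many texts fall at each level. It assumes the labels have already been read into a list of integers; the dataset's actual file layout and label format are not specified here, and both function names are hypothetical.

```python
from collections import Counter

STAGES = ("elementary", "intermediate", "advanced")


def stage_for_level(level):
    """Map a difficulty level (1-9) to its three-level band."""
    if not 1 <= level <= 9:
        raise ValueError(f"level must be between 1 and 9, got {level}")
    return STAGES[(level - 1) // 3]


def level_distribution(labels):
    """Count texts per level, returned as a dict sorted by level."""
    return dict(sorted(Counter(labels).items()))
```

A quick check of class balance with `level_distribution` is usually worthwhile before training a grading model, since textbook-derived corpora can be skewed toward certain levels.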
Created:
2026-01-22



