
Chinese Instructional Behavior Dataset for Autonomy-Supportive Teaching Detection

Source: IEEE (indexed 2026-04-17)
Download link:
https://ieee-dataport.org/documents/instructional-behavior-dataset-autonomy-supportive-teaching-detection
Description:
This dataset is described as the first dedicated to deep learning-based instructional behavior research. It was developed to support the automated detection of autonomy-supportive teaching behaviors. The raw data consists of 23.4 hours of in-class audio recordings of teachers across 8 subjects (including computer science, mathematics, psychology, and medicine) at a Chinese university. Using Automatic Audio Segmentation (AAS) and Automatic Speech Recognition (ASR), the audio was converted into 5203 Chinese sentence-level texts.

Two trained psychologists then independently annotated these texts using the "autonomy-supportive vs. controlling" instructional behavior classification framework from educational psychology. After the 299 samples with inconsistent annotations were excluded, 4904 valid texts remained. The Cohen's Kappa coefficient for annotation consistency was 0.818.

The dataset covers 14 instructional behavior labels: 7 autonomy-supportive behaviors (e.g., "allowing student talking," "offering hints," "communicating perspective-taking statements"), 6 controlling behaviors (e.g., "uttering directives/commands," "making should/ought to statements," "deadline statements"), and 1 neutral behavior ("teacher talk," meaning knowledge-focused speech with neither autonomy-supportive nor controlling attributes). A notable characteristic of the dataset is its severe class imbalance: the "teacher talk" class is the largest, with 4180 samples, and the imbalance factor (the ratio of the largest class to the smallest class) exceeds 200.
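The two statistics quoted in the description, Cohen's Kappa and the imbalance factor, can be illustrated with a minimal Python sketch. The function names and the toy label lists are hypothetical and are not part of the dataset's distribution. The description also does not say exactly how the reported 0.818 was computed, so this shows only the standard two-annotator definition.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def imbalance_factor(labels):
    """Ratio of the largest class size to the smallest class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Toy annotations (hypothetical, not drawn from the dataset).
annotator_1 = ["talk", "talk", "hint", "talk", "command", "hint"]
annotator_2 = ["talk", "talk", "hint", "command", "command", "talk"]
kappa = cohen_kappa(annotator_1, annotator_2)

# Figures taken from the description above.
total_texts, excluded, largest_class = 5203, 299, 4180
valid_texts = total_texts - excluded
teacher_talk_share = largest_class / valid_texts
```

With the figures from the description, `valid_texts` reproduces the 4904 retained texts, and `teacher_talk_share` shows that "teacher talk" makes up roughly 85% of them. An imbalance factor above 200 with a largest class of 4180 would imply that the smallest class holds at most 20 samples. That is an inference from the stated ratio, not a number the description reports.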

Provided by:
Chunfu Zhang