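Smart Classroom Scene Dataset (智慧课堂场景数据集)

Name: Smart Classroom Scene Dataset
Creator: University of Electronic Science and Technology of China
License: Not specified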

Indexed by the National Basic Science Data Center (国家基础学科公共科学数据中心) on 2025-12-20

Download link:

https://nbsdc.cn/general/dataDetail?id=6942d3a6195d2666dedea738&type=1

Resource description:

ARIC is a multimodal dataset for classroom behavior recognition, built and released by the Intelligent Visual Information Processing and Communication Lab (IVIPC Lab) at the University of Electronic Science and Technology of China. With the rapid development of "AI+Education", classroom behavior recognition has attracted widespread attention. However, most existing work relies on manually filmed videos or covers only a limited set of behavior categories, and datasets captured from real classroom surveillance viewpoints are lacking. Real surveillance scenarios pose several challenges, including class imbalance (long-tailed distribution), high-similarity interference, and privacy protection. ARIC provides real surveillance viewpoints and also defines dedicated task settings for continual learning and few-shot continual learning, with the aim of promoting continual behavior analysis in open teaching scenarios.

The data were collected at a real smart classroom observation station at the University of Electronic Science and Technology of China. Collection used multi-view (front, middle, rear) 4K high-definition surveillance cameras to capture high-quality footage from different angles, which reduces recognition errors caused by occlusion and viewpoint changes. ARIC includes three modalities: image, audio, and text. The image modality consists of frames extracted from the original video, with the individual behaviors of teachers and students annotated. The audio modality consists of audio clips totaling 10 seconds that span the moments before and after each image, so that complete sentences are captured. The text modality consists of detailed descriptions of the image scene and behavioral details, generated with the open-source large model InternVL.

To protect privacy, the dataset mainly releases shallow features extracted by pre-trained models (such as ResNet50, ViT, and CLIP-ViT) rather than the full set of original high-resolution images containing faces, whose release may be restricted for privacy reasons.

Main content and scale: ARIC covers 32 detailed classroom activity categories, including common ones such as "listening", "reading", and "using mobile phones", as well as long-tailed behaviors with fewer samples such as "raising hands" and "eating". The dataset contains 36,453 surveillance image samples and their corresponding multimodal data, with a total size of approximately 874 GB. It also provides standardized continual learning splits (such as 8+6×4) to simulate the way new behaviors keep emerging in real-world settings.
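As an illustration of the split notation, the sketch below assumes that "8+6×4" means 8 base classes followed by 4 incremental sessions of 6 new classes each, which is consistent with the 32 categories stated above (8 + 6 × 4 = 32). The function name `incremental_class_splits` and the use of consecutive integer class ids are hypothetical; the listing does not state how ARIC orders its classes across sessions.

```python
from typing import List


def incremental_class_splits(
    num_classes: int, base: int, step: int, sessions: int
) -> List[List[int]]:
    """Partition class ids 0..num_classes-1 into one base task and
    `sessions` incremental tasks of `step` new classes each."""
    if base + step * sessions != num_classes:
        raise ValueError("base + step * sessions must equal num_classes")
    splits = [list(range(base))]  # session 0: the base classes
    for s in range(sessions):
        start = base + s * step
        splits.append(list(range(start, start + step)))
    return splits


# Assumed reading of "8+6×4": 8 base classes, then 4 sessions of 6 classes.
# The class ids here are placeholders, not ARIC's actual label order.
tasks = incremental_class_splits(num_classes=32, base=8, step=6, sessions=4)
for i, classes in enumerate(tasks):
    seen = sum(len(t) for t in tasks[: i + 1])
    print(f"session {i}: {len(classes)} new classes, {seen} seen so far")
```

In a class-incremental evaluation under this reading, a model would be trained on each session in turn and tested on all classes seen so far.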

Provider:

University of Electronic Science and Technology of China

Collected Summary

Dataset Introduction

Background and Challenges

Background Overview

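The Smart Classroom Scene Dataset (ARIC) is a multimodal dataset built by the University of Electronic Science and Technology of China for classroom behavior recognition. It contains image, audio, and text modalities, covers 32 categories of classroom activity, and totals approximately 874 GB. The dataset is based on real classroom surveillance viewpoints and is designed for continual learning and few-shot learning, with the aim of promoting continual behavior analysis in open teaching scenarios.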

The content above was collected and summarized by 遇见数据集.