
LightGBM-SHAP Time Series and Text Dataset

Source: DataCite Commons; updated 2025-12-02; indexed 2026-05-05
Download link:
https://www.scidb.cn/detail?dataSetId=ea8180edd5a644c68394c395309fb78f
Description:
The dataset used in this study was sourced from the Chinese University MOOC platform, and the DeepSeek large language model was used to collect multimodal learning-behavior data. A total of 40,517 raw samples were initially obtained; after missing values were filled in and outliers removed, 40,000 valid samples were retained, a data validity rate of 98.7% (40,000 / 40,517). The sampled courses span 10 major disciplines, including computer science, art and design, introduction to artificial intelligence, principles of statistics, and psychology, giving the data good interdisciplinary breadth and content representativeness. The three highest-engagement courses in the "Certified Selection" category were chosen as the primary subjects to ensure the quality and comparability of the course data. Course categories were standardized as text and numerically encoded.

The collected learning data covers two modalities: (1) comment text, drawn from course evaluations and reflecting students' subjective assessments of course content, instructor performance, and platform services; and (2) time-series data, comprising key behavioral features such as time spent, total test score, learning frequency, and course completion rate. To ensure ethical compliance and protect privacy, all user identities were anonymized, and course names were uniformly replaced with "a certain MOOC course".

During preprocessing, blank comments, system default text, garbled records, and samples with abnormal behavior were removed. A pre-trained BERT model was used to extract sentiment polarity, sentiment intensity, and contextual semantic vectors from the comment text. Finally, the text and time-series features were integrated into a unified input vector for subsequent modeling and analysis.
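The cleaning and fusion steps described above can be sketched as follows. This is a minimal, standard-library-only Python illustration under stated assumptions, not the authors' pipeline: the placeholder default texts, the U+FFFD garbled-text check, median imputation, the range-based outlier rule, and the helper names (`Record`, `clean`, `fuse`, `validity_rate`) are all hypothetical. The BERT sentiment and semantic features are passed in as precomputed numbers rather than produced by a real model, and fusion is shown as plain concatenation, one common choice; the description does not say exactly how the features were combined.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional

# Hypothetical placeholders for "system default text"; the platform's real
# default strings are not given in the dataset description.
DEFAULT_TEXTS = {"no comment", "default review"}

# Behavioral features named in the description, in a fixed (assumed) order.
BEHAVIOR_FIELDS = ["time_spent", "test_score", "frequency", "completion_rate"]


@dataclass
class Record:
    comment: str
    behavior: List[Optional[float]]  # None marks a missing value


def is_valid_comment(text: str) -> bool:
    """Reject blank comments, assumed default texts, and garbled strings."""
    s = text.strip()
    if not s or s.lower() in DEFAULT_TEXTS:
        return False
    # Assumption: the Unicode replacement character U+FFFD marks garbled text.
    return "\ufffd" not in s


def is_plausible(behavior: List[float]) -> bool:
    """Assumed outlier rule: no negative values and completion rate <= 1."""
    return all(v >= 0 for v in behavior) and behavior[3] <= 1.0


def clean(records: List[Record]) -> List[Record]:
    """Filter comments, median-impute missing values, then drop outliers."""
    # Copy behavior lists so the caller's records are not mutated.
    kept = [Record(r.comment, list(r.behavior))
            for r in records if is_valid_comment(r.comment)]
    for j in range(len(BEHAVIOR_FIELDS)):
        observed = [r.behavior[j] for r in kept if r.behavior[j] is not None]
        fill = median(observed) if observed else 0.0  # assumed fallback
        for r in kept:
            if r.behavior[j] is None:
                r.behavior[j] = fill
    return [r for r in kept if is_plausible(r.behavior)]


def fuse(polarity: float, intensity: float, semantic: List[float],
         behavior: List[float]) -> List[float]:
    """Concatenate text features (e.g. BERT outputs) with behavioral features."""
    return [polarity, intensity, *semantic, *behavior]


def validity_rate(valid: int, original: int) -> float:
    """Share of raw samples retained after cleaning."""
    return valid / original


records = [
    Record("Clear lectures", [120.0, 85.0, 5.0, 0.9]),
    Record("   ", [30.0, 60.0, 2.0, 0.5]),        # blank comment: dropped
    Record("Too fast", [None, 70.0, 3.0, 0.4]),   # missing time_spent: imputed
    Record("Great", [90.0, 88.0, 4.0, 1.7]),      # completion rate > 1: dropped
]
cleaned = clean(records)
print(len(cleaned))                                  # 2
print(cleaned[1].behavior)                           # [105.0, 70.0, 3.0, 0.4]
print(round(validity_rate(40000, 40517) * 100, 1))   # 98.7
```

Concatenation keeps every feature at a fixed position in the input vector, which is what a tree model such as LightGBM consumes and makes per-feature SHAP attributions straightforward to map back to named inputs.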
Provider:
Science Data Bank
Created:
2025-12-02