A Multi-Source Data Warehouse of Academic Student Risk Prediction (Education)

Name: A Multi-Source Data Warehouse of Academic Student Risk Prediction (Education)
Creator: Mendeley Data
Published: 2026-04-21 14:12:29
License: No description available

Source: DataCite Commons. Updated: 2026-04-21. Indexed: 2026-05-04.

Download link:

https://data.mendeley.com/datasets/8tvbwh3gvb/1

Description:

The ARPS Integrated Dataset is a unified educational dataset developed as the final output of an Academic Risk Prediction System (ARPS) data warehouse. It contains 11,522 student records and 40 feature columns, created by integrating six publicly available datasets collected from Kaggle and the UCI Machine Learning Repository. These datasets represent diverse educational contexts, including learning management system behavioral logs, high school GPA records, secondary school grade datasets, standardized exam scores, and multi-factor student performance data. The integration was performed through a structured ETL pipeline implemented in Microsoft SQL Server 2022 Express.

The dataset is tabular and organized within a star schema in a data warehouse environment. Each row represents a single student and includes a comprehensive profile combining demographic, academic, behavioral, and socioeconomic attributes. The dataset contains a mix of numerical features such as grades, exam scores, study hours, and absences, as well as categorical features such as gender, educational stage, parental education level, and other contextual indicators. A dedicated “source” field is included to trace each record back to its original dataset. After preprocessing and data cleaning, the dataset contains no missing values in any feature.

The 40 features are organized into seven main groups: identifiers, demographic attributes, academic performance indicators, behavioral metrics, family and social background, study-related characteristics, and environmental factors. Together, these groups provide a holistic view of student conditions, covering both academic outcomes and contextual influences such as family support, access to resources, and lifestyle-related behaviors.

The dataset is stored and distributed in multiple formats to ensure compatibility with different analytical environments. It is primarily maintained within SQL Server tables and a unified analytical view, and can also be exported as a UTF-8 encoded CSV file compatible with common tools such as Python, R, and spreadsheet applications. It can also be accessed programmatically through standard database connectivity interfaces.

All underlying data sources are fully anonymized and publicly available. The dataset does not contain any personally identifiable information such as names, IDs, or contact details, ensuring compliance with ethical standards for academic research.
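
The figures stated in the description (11,522 rows, 40 columns, no missing values, a per-record source field) can be checked against a CSV export with a few lines of Python. The following is a minimal sketch using only the standard library. The column name `source` is taken from the description but its exact spelling in the export is an assumption, and the helper names `summarize_csv` and `check_expected` are hypothetical, not part of the dataset's tooling.

```python
import csv
from collections import Counter


def summarize_csv(path, source_column="source"):
    """Return basic integrity statistics for a UTF-8 encoded CSV export.

    `source_column` is an assumed column name; adjust it to match the file.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        columns = reader.fieldnames or []
        rows = 0
        missing = 0
        per_source = Counter()
        for record in reader:
            rows += 1
            # Treat empty, whitespace-only, or absent cells as missing values.
            missing += sum(
                1 for name in columns if not (record.get(name) or "").strip()
            )
            per_source[record.get(source_column) or "<unknown>"] += 1
    return {
        "rows": rows,
        "columns": len(columns),
        "missing_cells": missing,
        "records_per_source": dict(per_source),
    }


def check_expected(summary, expected_rows=11522, expected_columns=40):
    """Compare a summary against the figures stated in the description."""
    return (
        summary["rows"] == expected_rows
        and summary["columns"] == expected_columns
        and summary["missing_cells"] == 0
    )
```

Calling `check_expected(summarize_csv("arps_integrated.csv"))` (the file name is a placeholder) should return `True` if the export matches the description. The `records_per_source` entry should then list the six contributing datasets, provided the source field holds one label per dataset.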

Provider:

Mendeley Data

Created:

2026-04-21
