romi2001/student-performance-analysis
Source: Hugging Face · Updated 2026-04-09 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/romi2001/student-performance-analysis
Resource summary:
## Presentation Video
<video src="https://huggingface.co/datasets/romi2001/student-performance-analysis/resolve/main/presentation_video.mp4" controls="controls" style="max-width: 720px;"></video>
# Student Performance Factors — EDA Project
## Overview
The goal of this project is to identify which lifestyle and academic factors most influence a student's final exam score. Understanding these factors can help students, educators, and parents make better decisions to improve academic outcomes.
## Dataset Description
| Field | Value |
|---|---|
| **Source** | Kaggle, Student Performance Factors (https://www.kaggle.com/datasets/lainguyn123/student-performance-factors) |
| **Author** | lainguyn123 |
| **Size** | 6,607 rows × 20 columns (19 features plus the target) |
| **Target variable** | Exam_Score, the final exam score (0–100) |

**Numeric features:** Hours_Studied, Attendance, Sleep_Hours, Previous_Scores, Tutoring_Sessions, Physical_Activity (plus the numeric target, Exam_Score)
**Categorical features:** Gender, Parental_Involvement, Access_to_Resources, Extracurricular_Activities, Motivation_Level, Internet_Access, Family_Income, School_Type, Peer_Influence, Learning_Disabilities
## Exploratory Data Analysis (EDA)
### 1. Data Cleaning & Handling
**Dropped columns:**
Three categorical columns with missing values were dropped: Teacher_Quality, Parental_Education_Level, and Distance_from_Home.
Reason: The analysis focuses on numeric lifestyle and academic factors. These columns did not fit the numeric correlation approach, and dropping them avoids introducing artificial patterns through mode imputation.
**Missing values:**
After dropping the three columns above, no missing values remained in the dataset.
**Duplicate entries:**
No duplicates were found in the data.
**Typos / impossible values:**
One student had an Exam_Score of 101, which is outside the valid 0–100 range. This was corrected to 100, treating it as a data entry error for a perfect score. The student's remaining data was preserved in the analysis.
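
The cleaning steps above can be sketched in plain Python. This is a minimal sketch, not the author's code. It assumes rows were loaded as dictionaries of strings (for example with `csv.DictReader`), and the function name `clean` is hypothetical. In pandas, the same steps would typically use `DataFrame.drop(columns=...)`, `DataFrame.isna().sum()`, `DataFrame.duplicated().sum()`, and `Series.clip(upper=100)`.

```python
# Hypothetical sketch of the cleaning steps; rows are dicts of strings,
# e.g. as produced by csv.DictReader.
DROPPED = ("Teacher_Quality", "Parental_Education_Level", "Distance_from_Home")


def clean(rows):
    """Return (cleaned_rows, n_missing, n_duplicates)."""
    seen = set()
    cleaned = []
    n_missing = 0
    n_duplicates = 0
    for row in rows:
        # Drop the three categorical columns that contained missing values.
        kept = {k: v for k, v in row.items() if k not in DROPPED}
        # Count empty cells that remain after the drop (expected: none).
        n_missing += sum(1 for v in kept.values() if v in ("", None))
        # Treat scores above 100 (e.g. 101) as entry errors and cap them at 100.
        kept["Exam_Score"] = min(int(kept["Exam_Score"]), 100)
        # Skip rows that exactly repeat an earlier row.
        key = tuple(sorted(kept.items()))
        if key in seen:
            n_duplicates += 1
            continue
        seen.add(key)
        cleaned.append(kept)
    return cleaned, n_missing, n_duplicates
```

The duplicate check runs after the score is capped, so two rows that differ only by 101 versus 100 would be treated as the same row.
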
### 2. Outlier Detection & Handling
All outliers were identified using boxplots and retained in the dataset.
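
The document does not say which rule its boxplots used to flag outliers. Matplotlib and seaborn draw whiskers at 1.5 × IQR beyond the quartiles by default, so this sketch assumes that rule. `iqr_outliers` is a hypothetical helper. The `"inclusive"` quartile method interpolates between data points in the same way as NumPy's default percentile calculation, so it should match the quartiles a matplotlib boxplot shows.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (the usual boxplot whisker rule)."""
    # "inclusive" interpolates between data points like NumPy's default percentile.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

In line with this section, the flagged values would only be reported for inspection, not removed.
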
**Hours Studied:**

Outliers were identified but kept in the final dataset.
Reason: Some students who studied only 7 hours scored 88, the same as students who studied 30 hours. These likely represent quick learners. Removing them would hide the fact that study hours are not the only path to success.
**Attendance:**

Outliers were identified but kept in the final dataset.
Reason: A student with only 67% attendance scored 95, while a student with 87% attendance scored 60. Such students may be independent learners, for whom attendance alone does not determine performance.
**Sleep Hours:**

Outliers were identified but kept in the final dataset.
Reason: Some students who slept only 4 hours scored 100, while some who slept 10 hours scored 57. The former may be sacrificing sleep to study more; either way, sleep hours alone do not determine performance.
**Previous Scores:**

Outliers were identified but kept in the final dataset.
Reason: A student with a previous score of 54 scored approximately 95. This student represents those whose performance improved substantially. Removing such cases would hide the fact that a student's past grades do not determine their future performance.
**Tutoring Sessions:**

Outliers were identified but kept in the final dataset.
Reason: Some students with 0 tutoring sessions scored 98. These represent independent learners, capable of mastering the material on their own without external help. Removing them would hide the fact that tutoring is not the only path to success.
**Physical Activity:**

Outliers were identified but kept in the final dataset.
Reason: A student with only 1 hour of physical activity per week scored 97. This student represents those for whom physical activity is not the primary factor in success. Removing such cases would hide the fact that physical activity is not the only path to success.
### 3. Descriptive Statistics
**Average student profile:**

| Feature | Average |
|---|---|
| Hours Studied | 20 hours/week |
| Attendance | 80% |
| Sleep Hours | 7 hours/night |
| Previous Scores | 75 |
| Tutoring Sessions | 1 |
| Physical Activity | 3 hours/week |
| **Exam Score** | **67** |

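
Here is a sketch of how the average profile could be computed. It assumes "average" means the arithmetic mean and that values are rounded to whole numbers as reported. In pandas this would be `df[cols].mean().round()`, or the `mean` row of `df.describe()`. `average_profile` is a hypothetical helper, and rows are assumed to be dicts of strings.

```python
import statistics

NUMERIC = ["Hours_Studied", "Attendance", "Sleep_Hours", "Previous_Scores",
           "Tutoring_Sessions", "Physical_Activity", "Exam_Score"]


def average_profile(rows, columns=NUMERIC):
    """Arithmetic mean of each numeric column, rounded to a whole number."""
    return {c: round(statistics.mean(float(r[c]) for r in rows)) for c in columns}
```
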
**Correlation matrix (numeric features):**

Attendance and Hours_Studied have the strongest positive correlation with Exam_Score.
### 4. Visualizations
**Correlation Heatmap:**

Rather than inspecting many separate charts, a heatmap shows every pairwise correlation at once, using color to indicate how strongly each pair of variables moves together.
Attendance and Hours_Studied have the strongest positive correlation with Exam_Score.
Sleep_Hours shows a weak negative correlation with Exam_Score, meaning that students who sleep more hours tend to score slightly lower.
**Histogram — Distribution of Exam Scores:**

Most students cluster around a typical score of 67.
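
A histogram is a count of scores per bin. This sketch computes the counts directly and reports the most populated bin. The bin width of 5 points and the helper names are assumptions, since plotting libraries choose their own bins.

```python
from collections import Counter


def histogram(scores, bin_width=5):
    """Count scores per bin; each key is the bin's lower edge."""
    counts = Counter((s // bin_width) * bin_width for s in scores)
    return dict(sorted(counts.items()))


def modal_bin(scores, bin_width=5):
    """Return (low, high) edges of the most populated bin."""
    counts = histogram(scores, bin_width)
    low = max(counts, key=counts.get)
    return low, low + bin_width
```
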
**Scatter Plot — Study Hours vs. Exam Score:**

There is an upward trend between Hours_Studied and Exam_Score, though the spread of dots shows that hours alone don't guarantee a top score.
**Boxplot — Attendance vs. Exam Score:**

Higher attendance consistently pulls the entire score distribution upward.
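
One way to check the boxplot's claim numerically is to compare median scores across attendance bands. The 10-point bands and the helper `median_by_band` are assumptions for illustration. If the claim holds, the medians should rise from band to band.

```python
import statistics
from collections import defaultdict


def median_by_band(rows, feature="Attendance", target="Exam_Score", width=10):
    """Median target score per band of the feature (e.g. 60-69, 70-79, ...)."""
    groups = defaultdict(list)
    for r in rows:
        # Map each value to the lower edge of its band, e.g. 67 -> 60.
        band = int(float(r[feature]) // width * width)
        groups[band].append(float(r[target]))
    return {band: statistics.median(vals) for band, vals in sorted(groups.items())}
```
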
### 5. Research Questions & Answers
**Q1: Does motivation level affect exam scores?**

No.
In the boxplot, the median line for all three motivation levels (Low, Medium, High) sits at approximately 67, so the typical student scores about 67 regardless of motivation level. Motivation does not appear to be a deciding factor.
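
The comparison behind Q1 (and likewise Q2 and Q4) amounts to a per-group median. These hypothetical helpers return each group's median and the largest gap between groups, in score points. In pandas, `df.groupby("Motivation_Level")["Exam_Score"].median()` gives the same medians.

```python
import statistics
from collections import defaultdict


def group_medians(rows, group_col, target="Exam_Score"):
    """Median target score for each category in group_col."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_col]].append(float(r[target]))
    return {g: statistics.median(v) for g, v in groups.items()}


def median_gap(rows, group_col, target="Exam_Score"):
    """Largest difference between group medians, in score points."""
    medians = group_medians(rows, group_col, target)
    return max(medians.values()) - min(medians.values())
```

A gap near 0 corresponds to the "No" answer in Q1; a gap of about 1 point corresponds to the "Minimally" answer in Q2.
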
**Q2: Does parental involvement matter for exam scores?**

Minimally.
The median rises by only about 1 point from Low to High parental involvement, so parental involvement appears to be a supporting factor rather than a deciding one.
**Q3: Does physical activity affect academic performance?**

Slightly.
The regression line slopes slightly upward as weekly activity increases from 0 to 6 hours, suggesting a very small positive relationship. However, some students with 0 hours scored 80, while some with 6 hours scored 60. Exercise might help a little, but it does not guarantee a high grade.
**Q4: Does access to educational resources matter?**

Yes, as a booster.
The median drops slightly from High to Low access. Students with high access also have a larger cluster of outliers reaching the top grades. Access to resources provides a small advantage, especially for hitting the higher scores.
**Q5: Does gender affect exam performance?**

No.
The male and female bars are essentially the same height: the average student scores about 67 regardless of gender. Gender shows no meaningful effect on exam performance in this dataset.
### 6. Insights
1. **Attendance and study hours** are the two strongest predictors of exam performance.
2. **Motivation, parental involvement, gender, and physical activity** have little to no direct impact on final scores.
3. **Access to educational resources** provides a small but real advantage, particularly for reaching higher scores.
4. **Outliers appear meaningful:** many unusually high or low scores likely reflect individual learning styles rather than data errors.
5. The average student scores **67**, studies **20 hours/week**, attends **80%** of classes, and sleeps **7 hours/night**.
**Overall:**
Exam performance is primarily driven by two behavioral factors: how much a student studies and how consistently they attend class.
Factors like motivation, gender, and parental involvement play a much smaller role than commonly assumed.
The data suggests that consistent effort and presence are the most reliable paths to academic success in this dataset.
Provided by:
romi2001



