idoyaaran/stellar-classification-eda
Hugging Face · Updated 2026-04-09 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/idoyaaran/stellar-classification-eda
Description:
<video src="https://huggingface.co/datasets/idoyaaran/stellar-classification-eda/resolve/main/Presentation.mp4" controls="controls" style="max-width: 720px;"></video>
# Stellar Classification: Can we tell what's in space from telescope data?
## Dataset Overview
I chose the **Stellar Classification Dataset (SDSS17)** from Kaggle, which is based on real data from the Sloan Digital Sky Survey. It contains 100,000 observations of celestial objects in 18 columns, mostly numeric measurements such as brightness in five light filters (u, g, r, i, z), sky coordinates, and redshift.
**Main question:** Can we classify whether a celestial object is a **Star**, **Galaxy**, or **Quasar** just from the numbers the telescope recorded?
**Target variable:** `class` (Star, Galaxy, or QSO/Quasar)
---
## Data Cleaning
- **Dropped 9 columns** that were just technical IDs (obj_ID, spec_obj_ID, rerun_ID, run_ID, cam_col, field_ID, plate, MJD, fiber_ID). They describe how the data was collected, not what the object is.
- **Removed rows with -9999 values** in the u, g, z light filters. These are error codes the telescope logs when it can't get a proper reading, not real measurements.
- **Kept high redshift values.** They looked like outliers at first, but they're real measurements of quasars that are very far away.
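
As a rough sketch of the cleaning steps above (not the notebook's actual code), the snippet below drops the nine ID columns and skips any row that has -9999 in u, g, or z. It uses only the Python standard library. The `clean_rows` helper, the inline toy CSV, and the class labels `GALAXY`, `STAR`, `QSO` are assumptions made for illustration.

```python
import csv
import io

# Columns the write-up lists as technical IDs.
ID_COLUMNS = {"obj_ID", "spec_obj_ID", "rerun_ID", "run_ID", "cam_col",
              "field_ID", "plate", "MJD", "fiber_ID"}
# Filters in which the write-up reports -9999 placeholders.
SENTINEL_COLUMNS = ("u", "g", "z")
SENTINEL = -9999.0


def clean_rows(rows):
    """Drop ID columns and skip rows with the -9999 placeholder in u, g or z."""
    cleaned = []
    for row in rows:
        if any(float(row[col]) == SENTINEL for col in SENTINEL_COLUMNS):
            continue
        cleaned.append({k: v for k, v in row.items() if k not in ID_COLUMNS})
    return cleaned


# Toy stand-in for the raw CSV: invented values, only a few columns shown.
RAW_CSV = """obj_ID,u,g,r,i,z,class,redshift,plate
1,23.9,22.3,20.4,19.2,18.8,GALAXY,0.63,5812
2,-9999,-9999,18.2,18.0,-9999,STAR,0.0001,3306
3,21.5,21.2,21.0,20.9,20.7,QSO,2.10,7773
"""

rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
cleaned = clean_rows(rows)
print(len(rows), "->", len(cleaned), "rows")
print(sorted(cleaned[0]))  # remaining column names
```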
---
## Research Questions & Visualizations
### Q1: How are the 3 classes distributed?

Galaxies are the most common (around 59,000), followed by Stars (around 21,500) and Quasars (around 19,000). The dataset isn't balanced: there are roughly 3x as many galaxies as quasars.
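
To make the imbalance concrete, here is a small arithmetic sketch using the rounded counts quoted above. These are approximations, not exact row counts, so the shares are approximate too.

```python
# Approximate class counts quoted in the text (rounded, not exact).
counts = {"Galaxy": 59_000, "Star": 21_500, "Quasar": 19_000}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name:<7} {n:>7,}  {shares[name]:.1%}")

# How many galaxies there are for every quasar.
ratio = counts["Galaxy"] / counts["Quasar"]
print(f"galaxies per quasar: {ratio:.2f}")
```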
---
### Q2: How does redshift differ between the 3 classes?


This is the clearest finding:
- **Stars** have redshift close to 0, meaning they're close to us
- **Galaxies** are in the middle, mostly between 0 and 1
- **Quasars** have the highest redshift, going up to 7, meaning they're extremely far away
Redshift alone could probably do a decent job separating the classes.
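
One way to check this kind of separation is a per-class summary of redshift (minimum, median, maximum). The sketch below does that on a handful of invented values that only mimic the pattern described above. The labels and numbers are assumptions, not rows from the dataset.

```python
from collections import defaultdict
from statistics import median

# Invented (class, redshift) pairs, shaped like the pattern described above.
samples = [
    ("STAR", 0.0001), ("STAR", -0.0003), ("STAR", 0.0005),
    ("GALAXY", 0.12), ("GALAXY", 0.45), ("GALAXY", 0.78),
    ("QSO", 1.2), ("QSO", 2.4), ("QSO", 5.9),
]

# Group the redshift values by class label.
by_class = defaultdict(list)
for label, redshift in samples:
    by_class[label].append(redshift)

# (min, median, max) per class.
summary = {
    label: (min(values), median(values), max(values))
    for label, values in by_class.items()
}

for label, (lo, mid, hi) in summary.items():
    print(f"{label:<7} min={lo:.4f}  median={mid:.4f}  max={hi:.4f}")
```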
---
### Q3: Can the light filters help tell the classes apart?

The filter readings differ between the classes:
- **Stars** tend to be brightest (lowest values, since the readings are magnitudes and a lower magnitude means a brighter object) because they're close to us
- **Quasars** have more even readings across filters
- **Galaxies** fall somewhere in between
The r, i, and z filters show the clearest separation between classes.
---
### Q4: How correlated are the features?

- Light filters (u, g, r, i, z) are highly correlated with each other, especially r, i, z (above 0.92)
- Redshift has a moderate correlation with the filters
- Sky coordinates (alpha, delta) have almost no correlation with anything
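
For reference, the correlations in a heatmap like this are typically Pearson coefficients (I'm assuming that here). The sketch below computes them by hand for three filter columns filled with invented magnitudes, so the numbers it prints won't match the values reported above. The `pearson` helper is written for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented magnitudes for five objects (not taken from the dataset).
columns = {
    "r": [20.4, 18.2, 21.0, 19.5, 17.8],
    "i": [19.2, 18.0, 20.9, 19.0, 17.5],
    "z": [18.8, 17.9, 20.7, 18.7, 17.3],
}

# Print the full pairwise correlation matrix.
names = list(columns)
print("     " + "  ".join(f"{name:>5}" for name in names))
for a in names:
    cells = "  ".join(f"{pearson(columns[a], columns[b]):5.2f}" for b in names)
    print(f"{a:>5}{cells}")
```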
---
### Q5: Does location in the sky matter?

All 3 classes are spread evenly across the sky. Location doesn't help us classify objects. What matters is the light measurements and redshift.
---
## Conclusion
The answer to the main question is **yes**. The telescope measurements contain enough information to separate Stars, Galaxies, and Quasars:
- **Redshift is the strongest indicator.** Stars sit close to 0, Galaxies in the middle, Quasars the highest.
- **Light filters add useful information.** Stars appear brighter, Quasars have even readings, Galaxies in between.
- **Sky location doesn't matter.** Objects of all types are spread everywhere in the sky.
If I had to guess what the telescope is looking at, I'd first check redshift, then the light filters. Coordinates wouldn't help.
---
## Files
- `star_classification.csv` - the cleaned dataset
- `Stellar_Classification_EDA.ipynb` - full notebook with all the code and analysis
Provider:
idoyaaran



