Classifying HSC sources using machine learning

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://zenodo.org/record/10628176

Description:

I used 73.6 million photometric sources from the Hyper Suprime-Cam (HSC). Of these, 201,490 sources (stars: 8,500, galaxies: 164,905, quasars: 28,085) were matched to spectroscopically labelled sources from the Sloan Digital Sky Survey (SDSS) and used to train an optimised random forest classifier. The F1 score is high for all three classes (stars: 0.935, galaxies: 0.991, quasars: 0.937).

I applied the trained model to previously unlabelled sources from the HSC photometric catalogue. This gave a classification probability for each source, with 59% of galaxies, 10% of quasars, and 56% of stars having classification probabilities greater than 0.9. Finally, I used a non-linear dimension reduction technique, Uniform Manifold Approximation and Projection (UMAP), in fully-supervised schemes to visualise the separation of galaxies, quasars, and stars in a two-dimensional space.

File descriptions:

All files are pandas DataFrames.

`SDSS_spec_xmwise_all.pkl` contains the spectroscopically observed sources from the SDSS, which are used as the reference.

The compressed CSV files with the extension `csv.gz` contain photometrically observed sources from the HSC. They are named using the following convention: `__ra_`. Please place these files in the `HSC_sources` directory.

`misclassified_sources.pkl` contains sources that were misclassified by the model. Class codes: S = stars, G = galaxies, Q = quasars. Misclassification counts:

S missed as Q: 310, as G: 48
G missed as S: 98, as Q: 382
Q missed as G: 961, as S: 81
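The misclassification counts quoted in the description can be tallied per class with a few lines of Python. This is only a minimal sketch: it assumes that "Missed S as Q" means a true star predicted as a quasar, and the names `MISSED`, `missed_by_true_class`, and `missed_by_predicted_class` are hypothetical helpers, not part of the dataset or its code.

```python
# Misclassification counts copied from the description, keyed as
# (true class, predicted class). S = star, G = galaxy, Q = quasar.
# Assumption: "Missed S as Q" means a true star predicted as a quasar.
MISSED = {
    ("S", "Q"): 310, ("S", "G"): 48,
    ("G", "S"): 98,  ("G", "Q"): 382,
    ("Q", "G"): 961, ("Q", "S"): 81,
}


def missed_by_true_class(missed):
    """Sum the misclassified sources for each true class."""
    totals = {}
    for (true_cls, _pred_cls), count in missed.items():
        totals[true_cls] = totals.get(true_cls, 0) + count
    return totals


def missed_by_predicted_class(missed):
    """Sum the misclassified sources for each (wrong) predicted class."""
    totals = {}
    for (_true_cls, pred_cls), count in missed.items():
        totals[pred_cls] = totals.get(pred_cls, 0) + count
    return totals


by_true = missed_by_true_class(MISSED)
by_pred = missed_by_predicted_class(MISSED)
total_missed = sum(MISSED.values())

print("missed, by true class:     ", by_true)
print("missed, by predicted class:", by_pred)
print("total misclassified:       ", total_missed)
```

Under that reading, quasars mistaken for galaxies are the largest single group. These counts alone do not reproduce the F1 scores, because the number of correctly classified sources in the evaluation set is not given in the description.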

Created:

2024-02-13
