Classifying HSC sources using machine learning
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10628176
Description:
I used 73.6 million photometric sources from the Hyper Suprime-Cam (HSC). To train an optimised random forest classifier, I matched HSC sources with spectroscopically labelled sources from the Sloan Digital Sky Survey (SDSS), giving 201,490 matched sources (stars: 8,500; galaxies: 164,905; quasars: 28,085). The F1 score is high for all three classes (stars: 0.935, galaxies: 0.991, quasars: 0.937). I then applied the trained model to previously unlabelled sources from the HSC photometric catalogue. This gives classification probabilities for each source: 59% of galaxies, 10% of quasars, and 56% of stars have a classification probability greater than 0.9. Finally, I used a non-linear dimension reduction technique, Uniform Manifold Approximation and Projection (UMAP), in fully supervised schemes to visualise the separation of galaxies, quasars, and stars in a two-dimensional space.
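As a rough illustration of the workflow described above, the sketch below trains a random forest on labelled sources, reports per-class F1 scores, and flags unlabelled sources whose highest class probability exceeds 0.9. It assumes scikit-learn and NumPy. The synthetic features, the hyperparameters, and all variable names are placeholders, not the features or settings used to produce this dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for photometric features (hypothetical data): three
# classes drawn from Gaussians with different means in a 4-D feature space.
classes = np.array(["G", "Q", "S"])
n_per_class = 300
X = np.vstack(
    [rng.normal(loc=2.0 * i, scale=1.0, size=(n_per_class, 4)) for i in range(3)]
)
y = np.repeat(classes, n_per_class)

# Hold out part of the labelled set to estimate performance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Placeholder hyperparameters; the dataset's classifier was tuned separately.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Per-class F1 scores, in the order given by clf.classes_.
f1_per_class = f1_score(
    y_test, clf.predict(X_test), labels=clf.classes_, average=None
)

# Apply the model to "unlabelled" sources (again synthetic) and keep the
# class probabilities for every source.
X_unlabelled = rng.normal(loc=2.0, scale=2.0, size=(500, 4))
proba = clf.predict_proba(X_unlabelled)
predicted = clf.classes_[proba.argmax(axis=1)]

# Boolean mask of sources whose top class probability is greater than 0.9.
confident = proba.max(axis=1) > 0.9

for cls, f1 in zip(clf.classes_, f1_per_class):
    print(f"F1 for class {cls}: {f1:.3f}")
print(f"Fraction of sources with probability > 0.9: {confident.mean():.2f}")
```

The per-class fractions quoted above could be obtained in the same way by averaging `confident` separately for each value of `predicted`.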
File descriptions:
All files are pandas DataFrames. `SDSS_spec_xmwise_all.pkl` contains the spectroscopically observed sources from the SDSS, which are used in the reference. The compressed CSV files with the extension `csv.gz` contain photometrically observed sources from the HSC. They are named using the following convention: `__ra_`. Please place these files in the `HSC_sources` directory. `misclassified_sources.pkl` contains sources that were misclassified by the model.
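The files could be read along the following lines. This is a minimal sketch assuming pandas is installed and that the files sit where the description says. The helper `load_hsc_sources` is a hypothetical name introduced here for illustration, and the column layout of the files is not assumed.

```python
from pathlib import Path

import pandas as pd


def load_hsc_sources(directory="HSC_sources"):
    """Concatenate every *.csv.gz file in `directory` into one DataFrame."""
    paths = sorted(Path(directory).glob("*.csv.gz"))
    if not paths:
        raise FileNotFoundError(f"no csv.gz files found in {directory}")
    # Compression is stated explicitly; pandas would also infer it from
    # the .gz suffix.
    frames = [pd.read_csv(path, compression="gzip") for path in paths]
    return pd.concat(frames, ignore_index=True)


# Example usage, assuming the files have been downloaded:
#   sdss = pd.read_pickle("SDSS_spec_xmwise_all.pkl")
#   missed = pd.read_pickle("misclassified_sources.pkl")
#   hsc = load_hsc_sources("HSC_sources")
```

Pickle files can execute arbitrary code when loaded, so `pd.read_pickle` should only be used on files obtained from a trusted source. Concatenating all HSC files at once may also need a large amount of memory, in which case the files can be processed one at a time.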
Class codes: S = stars, G = galaxies, Q = quasars.
Misclassified sources:
Missed S as Q: 310, as G: 48
Missed G as S: 98, as Q: 382
Missed Q as G: 961, as S: 81
Created:
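The counts listed above can be tallied per class with a few lines of Python. This sketch assumes that "Missed X as Y" means a source whose spectroscopic class is X was predicted as Y. The dictionary layout and variable names are illustrative only.

```python
# Misclassification counts quoted above, keyed by (true class, predicted class).
# Assumption: "Missed S as Q" means a true star was classified as a quasar.
missed = {
    ("S", "Q"): 310, ("S", "G"): 48,
    ("G", "S"): 98, ("G", "Q"): 382,
    ("Q", "G"): 961, ("Q", "S"): 81,
}

# Sum the misclassified sources for each true class.
missed_by_true = {}
for (true_cls, _predicted_cls), count in missed.items():
    missed_by_true[true_cls] = missed_by_true.get(true_cls, 0) + count

total_missed = sum(missed.values())

print(missed_by_true)  # {'S': 358, 'G': 480, 'Q': 1042}
print(total_missed)    # 1880
```

Under this reading, quasars classified as galaxies are the largest single source of confusion (961 of 1,880 misclassified sources).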
2024-02-13



