
Classifying HSC sources using machine learning

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10628176
Description:
I used 73.6 million photometric sources from the Hyper Suprime-Cam (HSC). To train an optimised random forest classifier, I used 201,490 HSC sources matched with spectroscopically labelled sources from the Sloan Digital Sky Survey (SDSS): 8,500 stars, 164,905 galaxies, and 28,085 quasars. The F1 score is relatively high for all three classes (stars: 0.935, galaxies: 0.991, quasars: 0.937). I then applied the trained model to the previously unlabelled sources in the HSC photometric catalogue. This produced individual classification probabilities for each source, with 59% of galaxies, 10% of quasars, and 56% of stars having classification probabilities greater than 0.9. Finally, I used a non-linear dimensionality reduction technique, Uniform Manifold Approximation and Projection (UMAP), in fully supervised schemes to visualise the separation of galaxies, quasars, and stars in a two-dimensional space.

File descriptions: All files are Pandas DataFrames. `SDSS_spec_xmwise_all.pkl` contains the spectroscopically observed SDSS sources used in the reference. The compressed CSV files with the extension `csv.gz` contain photometrically observed sources from the HSC. They are named using the following convention: `__ra_`. Please place these files in the `HSC_sources` directory. `misclassified_sources.pkl` contains the sources that were misclassified by the model. Abbreviations: S = stars, G = galaxies, Q = quasars.

Missed S as Q: 310, as G: 48
Missed G as S: 98, as Q: 382
Missed Q as G: 961, as S: 81
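The per-class F1 scores above follow, under the usual definition, from treating each class one-vs-rest and taking the harmonic mean of precision and recall, which simplifies to F1 = 2TP / (2TP + FP + FN). The description does not say how the scores were computed, so the following is only a minimal stdlib sketch of that standard definition. The function name `per_class_f1` and the 3×3 confusion matrix in the example are hypothetical and are not the model's actual results.

```python
# Sketch: per-class (one-vs-rest) precision, recall and F1 from a confusion matrix.
# Rows are true classes, columns are predicted classes, both in CLASSES order.
# The example counts are hypothetical, not results from this dataset.

CLASSES = ("star", "galaxy", "quasar")


def per_class_f1(confusion):
    """Return {class_name: (precision, recall, f1)} for a 3x3 confusion matrix."""
    n = len(confusion)
    scores = {}
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # true class k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp  # predicted k, true class otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0       # equivalent to 2PR / (P + R)
        scores[CLASSES[k]] = (precision, recall, f1)
    return scores


if __name__ == "__main__":
    toy = [
        [90, 5, 5],   # true stars
        [2, 95, 3],   # true galaxies
        [4, 6, 90],   # true quasars
    ]
    for name, (p, r, f1) in per_class_f1(toy).items():
        print(f"{name:<7} precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F1 penalises false positives and false negatives symmetrically, a rare class such as stars can score lower than galaxies even when its recall is similar.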
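One plausible reading of the percentages of sources with classification probabilities above 0.9: assign each source to its most probable class, then report, per class, the fraction of those sources whose winning probability is strictly greater than 0.9. The description does not spell this procedure out, so the sketch below is an assumption. Each row holds (star, galaxy, quasar) probabilities. With a scikit-learn classifier, `predict_proba` returns one such row per source, but its columns follow the order of the model's `classes_` attribute, which may differ from the order assumed here. The function name `high_confidence_fraction` and the example rows are hypothetical.

```python
# Sketch: fraction of sources per predicted class whose top probability exceeds a threshold.
# Each row is (p_star, p_galaxy, p_quasar); this column order is an assumption.

CLASSES = ("star", "galaxy", "quasar")


def high_confidence_fraction(rows, threshold=0.9):
    """Return {class_name: fraction of sources assigned to it with top probability > threshold}."""
    assigned = dict.fromkeys(CLASSES, 0)
    confident = dict.fromkeys(CLASSES, 0)
    for probs in rows:
        k = max(range(len(CLASSES)), key=lambda i: probs[i])  # argmax = predicted class
        label = CLASSES[k]
        assigned[label] += 1
        if probs[k] > threshold:
            confident[label] += 1
    return {c: confident[c] / assigned[c] if assigned[c] else 0.0 for c in CLASSES}


if __name__ == "__main__":
    # Hypothetical probabilities for seven sources, not taken from the catalogue.
    rows = [
        (0.95, 0.03, 0.02),
        (0.60, 0.30, 0.10),
        (0.01, 0.97, 0.02),
        (0.10, 0.85, 0.05),
        (0.05, 0.05, 0.90),  # exactly 0.9 does not count as "greater than 0.9"
        (0.02, 0.03, 0.95),
        (0.00, 0.08, 0.92),
    ]
    print(high_confidence_fraction(rows))
```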
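The misclassification counts listed in the file descriptions fill the off-diagonal cells of a 3×3 confusion matrix. Summing over each true class gives the number of sources of that class the model missed, and summing over each predicted class gives the number of sources wrongly assigned to it. The sketch below performs only that arithmetic on the published counts. The dictionary layout and the function name `tally` are illustrative choices, not the structure of `misclassified_sources.pkl`.

```python
# Off-diagonal confusion-matrix cells, taken from the counts in the description.
# Keys are (true_class, predicted_class) using the abbreviations S, G, Q.
MISSED = {
    ("S", "Q"): 310, ("S", "G"): 48,
    ("G", "S"): 98, ("G", "Q"): 382,
    ("Q", "G"): 961, ("Q", "S"): 81,
}


def tally(missed):
    """Return (missed per true class, wrongly assigned per predicted class, total)."""
    by_true = {}
    by_pred = {}
    for (true_cls, pred_cls), count in missed.items():
        by_true[true_cls] = by_true.get(true_cls, 0) + count
        by_pred[pred_cls] = by_pred.get(pred_cls, 0) + count
    return by_true, by_pred, sum(missed.values())


if __name__ == "__main__":
    by_true, by_pred, total = tally(MISSED)
    print("missed per true class:", by_true)       # {'S': 358, 'G': 480, 'Q': 1042}
    print("wrongly assigned per class:", by_pred)  # {'Q': 692, 'G': 1009, 'S': 179}
    print("total misclassified:", total)           # 1880
```

On these counts, quasars mistaken for galaxies (961) account for about half of all 1,880 misclassifications.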
Created:
2024-02-13