helenqu/astro-classification-redshifts
Source: Hugging Face (updated 2023-10-16, indexed 2024-03-04)
Download link: https://hf-mirror.com/datasets/helenqu/astro-classification-redshifts
---
license: mit
tags:
- time series
- astrophysics
- pretraining
- connect-later
size_categories:
- 100K<n<1M
---
# AstroClassification and Redshifts Datasets
<!-- Provide a quick summary of the dataset. -->
This dataset was used for the AstroClassification and Redshifts tasks introduced in [Connect Later: Improving Fine-tuning for Robustness with Targeted Augmentations](). It consists of simulated astronomical time series (e.g., supernovae, active galactic nuclei); the task is to classify the object type (AstroClassification) or to predict the object's redshift (Redshifts).
- **Repository:** https://github.com/helenqu/connect-later
- **Paper:** will be updated
- **Point of Contact:** Helen Qu (<helenqu@sas.upenn.edu>)
## Dataset Structure
<!-- This section provides a description of the dataset fields, and additional information about the dataset structure such as criteria used to create the splits, relationships between data points, etc. -->
- **object_id**: unique object identifier
- **times_wv**: 2D array of shape (N, 2) containing the observation time (modified Julian date, MJD) and filter (wavelength in nm) for each observation, where N is the number of observations
- **lightcurve**: 2D array of shape (N, 2) containing the flux (arbitrary units) and flux error for each observation
- **label**: integer representing the class of the object (see below for details)
- **redshift**: redshift of the object
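As a minimal sketch of how the fields above fit together, the snippet below unpacks one record into per-filter light curves. It assumes each record behaves like a dictionary whose `times_wv` and `lightcurve` entries are row-aligned nested lists; the helper name `unpack_record` and the example values are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def unpack_record(record):
    """Group one record's observations by filter wavelength (hypothetical helper)."""
    times_wv = record["times_wv"]      # N rows of [time_mjd, wavelength_nm]
    lightcurve = record["lightcurve"]  # N rows of [flux, flux_err]
    if len(times_wv) != len(lightcurve):
        raise ValueError("times_wv and lightcurve must have the same number of rows")

    by_filter = defaultdict(list)
    for (time_mjd, wavelength_nm), (flux, flux_err) in zip(times_wv, lightcurve):
        by_filter[wavelength_nm].append((time_mjd, flux, flux_err))

    # Sort each filter's observations chronologically.
    return {wv: sorted(obs) for wv, obs in by_filter.items()}


# Synthetic record for illustration only; the values are made up.
example = {
    "object_id": 1,
    "times_wv": [[59800.0, 480.0], [59801.5, 620.0], [59799.0, 480.0]],
    "lightcurve": [[12.5, 1.1], [30.2, 2.0], [10.0, 1.0]],
    "label": 11,
    "redshift": 0.3,
}

bands = unpack_record(example)
```

Row `i` of `times_wv` and row `i` of `lightcurve` describe the same observation, which is why the two arrays are zipped together before grouping.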
## Dataset Creation
### Source Data
This is a modified version of the dataset from the 2018 Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC) Kaggle competition.
The original Kaggle competition can be found [here](https://www.kaggle.com/c/PLAsTiCC-2018). [This note](https://arxiv.org/abs/1810.00001) from the competition describes the dataset in detail. Astronomers may be interested in [this paper](https://arxiv.org/abs/1903.11756) describing the simulations used to generate the data.
- **Train**: 80% of the original PLAsTiCC training set augmented using the redshifting targeted augmentation described in the Connect Later paper
- **Validation**: Remaining 20% of the original PLAsTiCC training set, *not* augmented or modified
- **Test**: Subset of 10,000 objects randomly selected from the PLAsTiCC test set
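The proportions above can be illustrated with a short sketch of a reproducible 80/20 split plus a fixed-size random subset. This is not the authors' code: the function names, the seed, and the shuffle-based procedure are assumptions, and the placeholder IDs stand in for real PLAsTiCC object IDs. The redshifting augmentation would then be applied to the training portion only.

```python
import random


def split_train_val(object_ids, val_fraction=0.2, seed=0):
    """Shuffle IDs reproducibly and hold out `val_fraction` of them for validation."""
    ids = sorted(object_ids)           # fixed starting order, independent of input order
    random.Random(seed).shuffle(ids)
    n_val = round(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]    # (train, validation)


def sample_test_subset(test_ids, n=10_000, seed=0):
    """Draw `n` distinct IDs uniformly at random, without replacement."""
    return random.Random(seed).sample(sorted(test_ids), n)


# Placeholder IDs for illustration; real object IDs are not consecutive integers.
train_ids, val_ids = split_train_val(range(1000))
test_ids = sample_test_subset(range(50_000))
```

Splitting by object ID before any augmentation keeps augmented copies of an object out of the validation set.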
### Object Types
```
0: microlens-single
1: tidal disruption event (TDE)
2: eclipsing binary (EB)
3: type II supernova (SNII)
4: peculiar type Ia supernova (SNIax)
5: Mira variable
6: type Ibc supernova (SNIbc)
7: kilonova (KN)
8: M-dwarf
9: peculiar type Ia supernova (SNIa-91bg)
10: active galactic nucleus (AGN)
11: type Ia supernova (SNIa)
12: RR-Lyrae (RRL)
13: superluminous supernova (SLSN-I)
14: 5 "anomalous" types that are not present in training set: microlens-binary, intermediate luminosity optical transient (ILOT), calcium-rich transient (CaRT), pair instability supernova (PISN), microlens-string
```
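For convenience, the label table above can be written as a Python mapping. The sketch below uses the abbreviations from the table as short names; `CLASS_NAMES`, `ANOMALOUS_LABEL`, and the helper functions are hypothetical names introduced here and are not part of the dataset.

```python
from collections import Counter

# Integer label -> short class name, following the table above.
CLASS_NAMES = {
    0: "microlens-single",
    1: "TDE",
    2: "EB",
    3: "SNII",
    4: "SNIax",
    5: "Mira variable",
    6: "SNIbc",
    7: "KN",
    8: "M-dwarf",
    9: "SNIa-91bg",
    10: "AGN",
    11: "SNIa",
    12: "RRL",
    13: "SLSN-I",
    14: "anomalous",  # five types that are not present in the training set
}
ANOMALOUS_LABEL = 14


def label_name(label):
    """Return the short class name for an integer label."""
    return CLASS_NAMES[label]


def is_anomalous(label):
    """True for the catch-all class that has no training examples."""
    return label == ANOMALOUS_LABEL


def class_counts(labels):
    """Count how many objects fall in each named class."""
    return Counter(label_name(label) for label in labels)


counts = class_counts([11, 11, 3, 14])
```

Because label 14 does not occur in the training set, a classifier trained on this data sees it only at test time.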
## Citation
will be updated
Provider: helenqu



