AstroMLCore/AstroM3Dataset

Source: Hugging Face. Updated: 2025-03-27. Indexed: 2025-11-01.
Download link:
https://hf-mirror.com/datasets/AstroMLCore/AstroM3Dataset
Resource description:
---
license: mit
pretty_name: AstroM3Dataset
size_categories:
- 10K<n<100K
tags:
- astronomy
- multimodal
- classification
arxiv:
- arXiv:2411.08842
---

# AstroM3Dataset

## Description

AstroM3Dataset is a time-series astronomy dataset containing photometry, spectra, and metadata features for variable stars. The dataset was constructed by cross-matching publicly available astronomical datasets, primarily from the ASAS-SN (Shappee et al. 2014) variable star catalog (Jayasinghe et al. 2019) and LAMOST spectroscopic survey (Cui et al. 2012), along with data from WISE (Wright et al. 2010), GALEX (Morrissey et al. 2007), 2MASS (Skrutskie et al. 2006) and Gaia EDR3 (Gaia Collaboration et al. 2021).

The dataset includes multiple subsets (`full`, `sub10`, `sub25`, `sub50`) and supports different random seeds (`42`, `66`, `0`, `12`, `123`).

Each sample consists of:

- **Photometry**: Light curve data of shape `(N, 3)` (time, flux, flux\_error).
- **Spectra**: Spectra observations of shape `(M, 3)` (wavelength, flux, flux\_error).
- **Metadata**:
  - `meta_cols`: Dictionary of metadata feature names and values.
  - `photo_cols`: Dictionary of photometric feature names and values.
- **Label**: The class name as a string.

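
The sample layout described above can be illustrated with a small, purely synthetic example. This is a minimal sketch, not the dataset's own code: the dictionary keys (`photometry`, `spectra`, `metadata`, `label`), the feature names, and the helper `summarize_sample` are assumptions made for illustration, and every value is invented.

```python
from typing import Any


def summarize_sample(sample: dict[str, Any]) -> dict[str, Any]:
    """Return the shapes and sizes of one sample laid out as described above.

    The key names used here are assumptions, not confirmed field names.
    """
    summary: dict[str, Any] = {}
    for key in ("photometry", "spectra"):
        rows = sample[key]
        # Each row is expected to hold three values: (x, flux, flux_error).
        if any(len(row) != 3 for row in rows):
            raise ValueError(f"{key} rows must have exactly 3 columns")
        summary[f"{key}_shape"] = (len(rows), 3)
    summary["n_meta_cols"] = len(sample["metadata"]["meta_cols"])
    summary["n_photo_cols"] = len(sample["metadata"]["photo_cols"])
    summary["label"] = sample["label"]
    return summary


# Synthetic sample: all values and feature names are invented for illustration.
example_sample = {
    "photometry": [[0.0, 1.00, 0.01], [1.5, 0.98, 0.01], [3.0, 1.02, 0.02]],
    "spectra": [[4000.0, 0.50, 0.05], [4001.0, 0.52, 0.05]],
    "metadata": {
        "meta_cols": {"meta_feature_a": 0.1, "meta_feature_b": 2.3},
        "photo_cols": {"photo_feature_a": 0.7},
    },
    "label": "EA",
}

summary = summarize_sample(example_sample)
print(summary)
```

A check like this can help confirm that a loaded sample has the expected three-column photometry and spectra arrays before it is passed to a model.
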
## Corresponding paper and code

- Paper: [AstroM<sup>3</sup>: A self-supervised multimodal model for astronomy](https://arxiv.org/abs/2411.08842)
- Code Repository: [GitHub: AstroM<sup>3</sup>](https://github.com/MeriDK/AstroM3/)
- Processed Data: [AstroMLCore/AstroM3Processed](https://huggingface.co/datasets/AstroMLCore/AstroM3Processed/)

**Note:** The processed dataset `AstroM3Processed` is created from the original dataset `AstroM3Dataset` by using [preprocess.py](https://huggingface.co/datasets/AstroMLCore/AstroM3Dataset/blob/main/preprocess.py)

---

## Subsets and Seeds

AstroM3Dataset is available in different subset sizes:

- `full`: Entire dataset
- `sub50`: 50% subset
- `sub25`: 25% subset
- `sub10`: 10% subset

Each subset is sampled from the respective train, validation, and test splits of the full dataset.

For reproducibility, each subset is provided with different random seeds:

- `42`, `66`, `0`, `12`, `123`

## Data Organization

The dataset is organized as follows:

```
AstroM3Dataset/
├── photometry.zip                  # Contains all photometry light curves
├── utils/
│   ├── parallelzipfile.py          # Zip file reader to open photometry.zip
├── spectra/                        # Spectra files organized by class
│   ├── EA/
│   │   ├── file1.dat
│   │   ├── file2.dat
│   │   ├── ...
│   ├── EW/
│   ├── SR/
│   ├── ...
├── splits/                         # Train/val/test splits for each subset and seed
│   ├── full/
│   │   ├── 42/
│   │   │   ├── train.csv
│   │   │   ├── val.csv
│   │   │   ├── test.csv
│   │   │   ├── info.json           # Contains feature descriptions and preprocessing info
│   │   ├── 66/
│   │   ├── 0/
│   │   ├── 12/
│   │   ├── 123/
│   ├── sub10/
│   ├── sub25/
│   ├── sub50/
├── AstroM3Dataset.py               # Hugging Face dataset script
```

## Usage

To load the dataset using the Hugging Face `datasets` library:

```python
from datasets import load_dataset

# Load the default full dataset with seed 42
dataset = load_dataset("AstroMLCore/AstroM3Dataset", trust_remote_code=True)
```

The default configuration is **full_42** (entire dataset with seed 42).

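
Configuration names such as `full_42` combine a subset with a seed. The sketch below builds and validates such names from the subset and seed values listed in this README. The helper names `config_name` and `all_config_names` are hypothetical, and it is an assumption that the dataset script exposes exactly these 20 combinations.

```python
from itertools import product

# Subset and seed values as listed in this README.
SUBSETS = ("full", "sub50", "sub25", "sub10")
SEEDS = (42, 66, 0, 12, 123)


def config_name(subset: str = "full", seed: int = 42) -> str:
    """Build a configuration name of the form '{subset}_{seed}'."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}; expected one of {SUBSETS}")
    if seed not in SEEDS:
        raise ValueError(f"unknown seed {seed!r}; expected one of {SEEDS}")
    return f"{subset}_{seed}"


def all_config_names() -> list[str]:
    """Enumerate every subset/seed combination (4 subsets x 5 seeds)."""
    return [config_name(subset, seed) for subset, seed in product(SUBSETS, SEEDS)]


print(config_name())              # default subset and seed
print(config_name("sub25", 123))  # 25% subset, seed 123
print(len(all_config_names()))    # number of combinations
```

Validating the name up front gives a clear error for a typo such as `sub20_42` before any download is attempted.
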
To load a specific subset and seed, use `{subset}_{seed}` as the name:

```python
from datasets import load_dataset

# Load the 25% subset sampled using seed 123
dataset = load_dataset("AstroMLCore/AstroM3Dataset", name="sub25_123", trust_remote_code=True)
```

---

## Citation

🤗 If you find this dataset useful, please cite our paper 🤗

```bibtex
@article{rizhko2024astrom,
  title={AstroM$^3$: A self-supervised multimodal model for astronomy},
  author={Rizhko, Mariia and Bloom, Joshua S},
  journal={arXiv preprint arXiv:2411.08842},
  year={2024}
}
```

## References

1. Shappee, B. J., Prieto, J. L., Grupe, D., et al. 2014, ApJ, 788, 48, doi: 10.1088/0004-637X/788/1/48
2. Jayasinghe, T., Stanek, K. Z., Kochanek, C. S., et al. 2019, MNRAS, 486, 1907, doi: 10.1093/mnras/stz844
3. Cui, X.-Q., Zhao, Y.-H., Chu, Y.-Q., et al. 2012, Research in Astronomy and Astrophysics, 12, 1197, doi: 10.1088/1674-4527/12/9/003
4. Wright, E. L., Eisenhardt, P. R. M., Mainzer, A. K., et al. 2010, AJ, 140, 1868, doi: 10.1088/0004-6256/140/6/1868
5. Morrissey, P., Conrow, T., Barlow, T. A., et al. 2007, ApJS, 173, 682, doi: 10.1086/520512
6. Skrutskie, M. F., Cutri, R. M., Stiening, R., et al. 2006, AJ, 131, 1163, doi: 10.1086/498708
7. Gaia Collaboration, Brown, A. G. A., et al. 2021, A&A, 649, A1, doi: 10.1051/0004-6361/202039657

Provider:
AstroMLCore