AstroMLCore/AstroM3Processed

Hugging Face · Updated 2025-03-27 · Indexed 2025-11-01
Download link:
https://hf-mirror.com/datasets/AstroMLCore/AstroM3Processed
Resource description:
---
license: mit
size_categories:
- 10K<n<100K
tags:
- astronomy
- multimodal
- classification
arxiv:
- arXiv:2411.08842
dataset_info:
- config_name: full_0
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 699299280
    num_examples: 17045
  - name: validation
    num_bytes: 88554120
    num_examples: 2155
  - name: test
    num_bytes: 91992720
    num_examples: 2240
  download_size: 580478307
  dataset_size: 879846120
- config_name: full_12
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 699768976
    num_examples: 17054
  - name: validation
    num_bytes: 88582160
    num_examples: 2155
  - name: test
    num_bytes: 91494984
    num_examples: 2231
  download_size: 580486890
  dataset_size: 879846120
- config_name: full_123
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 699487664
    num_examples: 17051
  - name: validation
    num_bytes: 88353016
    num_examples: 2149
  - name: test
    num_bytes: 92005440
    num_examples: 2240
  download_size: 580495878
  dataset_size: 879846120
- config_name: full_42
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 700165272
    num_examples: 17063
  - name: validation
    num_bytes: 88444168
    num_examples: 2152
  - name: test
    num_bytes: 91236680
    num_examples: 2225
  download_size: 580045234
  dataset_size: 879846120
- config_name: full_66
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 700385576
    num_examples: 17049
  - name: validation
    num_bytes: 88184608
    num_examples: 2157
  - name: test
    num_bytes: 91275936
    num_examples: 2234
  download_size: 580502197
  dataset_size: 879846120
- config_name: sub10_0
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 68144480
    num_examples: 1660
  - name: validation
    num_bytes: 8625200
    num_examples: 210
  - name: test
    num_bytes: 9001040
    num_examples: 220
  download_size: 57935691
  dataset_size: 85770720
- config_name: sub10_12
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 68017160
    num_examples: 1660
  - name: validation
    num_bytes: 8615320
    num_examples: 210
  - name: test
    num_bytes: 8976816
    num_examples: 219
  download_size: 57813888
  dataset_size: 85609296
- config_name: sub10_123
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 68063480
    num_examples: 1660
  - name: validation
    num_bytes: 8310440
    num_examples: 200
  - name: test
    num_bytes: 9046200
    num_examples: 220
  download_size: 57670030
  dataset_size: 85420120
- config_name: sub10_42
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 68355600
    num_examples: 1660
  - name: validation
    num_bytes: 8746160
    num_examples: 210
  - name: test
    num_bytes: 9019080
    num_examples: 220
  download_size: 58013457
  dataset_size: 86120840
- config_name: sub10_66
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 68336800
    num_examples: 1670
  - name: validation
    num_bytes: 8228160
    num_examples: 200
  - name: test
    num_bytes: 9013280
    num_examples: 220
  download_size: 57863989
  dataset_size: 85578240
- config_name: sub25_0
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 174615960
    num_examples: 4255
  - name: validation
    num_bytes: 21970688
    num_examples: 537
  - name: test
    num_bytes: 22907360
    num_examples: 555
  download_size: 145911023
  dataset_size: 219494008
- config_name: sub25_12
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 175139712
    num_examples: 4258
  - name: validation
    num_bytes: 22099648
    num_examples: 537
  - name: test
    num_bytes: 22444528
    num_examples: 552
  download_size: 145908071
  dataset_size: 219683888
- config_name: sub25_123
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 175035904
    num_examples: 4256
  - name: validation
    num_bytes: 21742272
    num_examples: 533
  - name: test
    num_bytes: 22941032
    num_examples: 558
  download_size: 145940204
  dataset_size: 219719208
- config_name: sub25_42
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 175335600
    num_examples: 4260
  - name: validation
    num_bytes: 21928408
    num_examples: 532
  - name: test
    num_bytes: 22793640
    num_examples: 555
  download_size: 145967962
  dataset_size: 220057648
- config_name: sub25_66
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 175124832
    num_examples: 4258
  - name: validation
    num_bytes: 21796632
    num_examples: 533
  - name: test
    num_bytes: 22778824
    num_examples: 556
  download_size: 145942684
  dataset_size: 219700288
- config_name: sub50_0
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 349306248
    num_examples: 8517
  - name: validation
    num_bytes: 44231680
    num_examples: 1075
  - name: test
    num_bytes: 45912624
    num_examples: 1116
  download_size: 290437676
  dataset_size: 439450552
- config_name: sub50_12
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 350458024
    num_examples: 8526
  - name: validation
    num_bytes: 44336016
    num_examples: 1074
  - name: test
    num_bytes: 45652856
    num_examples: 1114
  download_size: 290857421
  dataset_size: 440446896
- config_name: sub50_123
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 349542320
    num_examples: 8525
  - name: validation
    num_bytes: 44195632
    num_examples: 1073
  - name: test
    num_bytes: 45928584
    num_examples: 1116
  download_size: 290597740
  dataset_size: 439666536
- config_name: sub50_42
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 349887664
    num_examples: 8526
  - name: validation
    num_bytes: 44171424
    num_examples: 1071
  - name: test
    num_bytes: 45487184
    num_examples: 1111
  download_size: 290269930
  dataset_size: 439546272
- config_name: sub50_66
  features:
  - name: photometry
    dtype:
      array2_d:
        shape:
        - null
        - 9
        dtype: float32
  - name: spectra
    dtype:
      array2_d:
        shape:
        - 3
        - 2575
        dtype: float32
  - name: metadata
    sequence: float32
    length: 34
  - name: label
    dtype:
      class_label:
        names:
          '0': DSCT
          '1': EA
          '2': EB
          '3': EW
          '4': HADS
          '5': M
          '6': ROT
          '7': RRAB
          '8': RRC
          '9': SR
  splits:
  - name: train
    num_bytes: 350376040
    num_examples: 8520
  - name: validation
    num_bytes: 43972672
    num_examples: 1073
  - name: test
    num_bytes: 45551240
    num_examples: 1115
  download_size: 290555385
  dataset_size: 439899952
configs:
- config_name: full_0
  data_files:
  - split: train
    path: full_0/train-*
  - split: validation
    path: full_0/validation-*
  - split: test
    path: full_0/test-*
- config_name: full_12
  data_files:
  - split: train
    path: full_12/train-*
  - split: validation
    path: full_12/validation-*
  - split: test
    path: full_12/test-*
- config_name: full_123
  data_files:
  - split: train
    path: full_123/train-*
  - split: validation
    path: full_123/validation-*
  - split: test
    path: full_123/test-*
- config_name: full_42
  data_files:
  - split: train
    path: full_42/train-*
  - split: validation
    path: full_42/validation-*
  - split: test
    path: full_42/test-*
- config_name: full_66
  data_files:
  - split: train
    path: full_66/train-*
  - split: validation
    path: full_66/validation-*
  - split: test
    path: full_66/test-*
- config_name: sub10_0
  data_files:
  - split: train
    path: sub10_0/train-*
  - split: validation
    path: sub10_0/validation-*
  - split: test
    path: sub10_0/test-*
- config_name: sub10_12
  data_files:
  - split: train
    path: sub10_12/train-*
  - split: validation
    path: sub10_12/validation-*
  - split: test
    path: sub10_12/test-*
- config_name: sub10_123
  data_files:
  - split: train
    path: sub10_123/train-*
  - split: validation
    path: sub10_123/validation-*
  - split: test
    path: sub10_123/test-*
- config_name: sub10_42
  data_files:
  - split: train
    path: sub10_42/train-*
  - split: validation
    path: sub10_42/validation-*
  - split: test
    path: sub10_42/test-*
- config_name: sub10_66
  data_files:
  - split: train
    path: sub10_66/train-*
  - split: validation
    path: sub10_66/validation-*
  - split: test
    path: sub10_66/test-*
- config_name: sub25_0
  data_files:
  - split: train
    path: sub25_0/train-*
  - split: validation
    path: sub25_0/validation-*
  - split: test
    path: sub25_0/test-*
- config_name: sub25_12
  data_files:
  - split: train
    path: sub25_12/train-*
  - split: validation
    path: sub25_12/validation-*
  - split: test
    path: sub25_12/test-*
- config_name: sub25_123
  data_files:
  - split: train
    path: sub25_123/train-*
  - split: validation
    path: sub25_123/validation-*
  - split: test
    path: sub25_123/test-*
- config_name: sub25_42
  data_files:
  - split: train
    path: sub25_42/train-*
  - split: validation
    path: sub25_42/validation-*
  - split: test
    path: sub25_42/test-*
- config_name: sub25_66
  data_files:
  - split: train
    path: sub25_66/train-*
  - split: validation
    path: sub25_66/validation-*
  - split: test
    path: sub25_66/test-*
- config_name: sub50_0
  data_files:
  - split: train
    path: sub50_0/train-*
  - split: validation
    path: sub50_0/validation-*
  - split: test
    path: sub50_0/test-*
- config_name: sub50_12
  data_files:
  - split: train
    path: sub50_12/train-*
  - split: validation
    path: sub50_12/validation-*
  - split: test
    path: sub50_12/test-*
- config_name: sub50_123
  data_files:
  - split: train
    path: sub50_123/train-*
  - split: validation
    path: sub50_123/validation-*
  - split: test
    path: sub50_123/test-*
- config_name: sub50_42
  data_files:
  - split: train
    path: sub50_42/train-*
  - split: validation
    path: sub50_42/validation-*
  - split: test
    path: sub50_42/test-*
- config_name: sub50_66
  data_files:
  - split: train
    path: sub50_66/train-*
  - split: validation
    path: sub50_66/validation-*
  - split: test
    path: sub50_66/test-*
---

# AstroM3Processed

## Description

AstroM3Processed is a time-series astronomy dataset containing photometry, spectra, and metadata features for variable stars. The dataset was constructed by cross-matching publicly available astronomical datasets, primarily from the ASAS-SN (Shappee et al. 2014) variable star catalog (Jayasinghe et al. 2019) and LAMOST spectroscopic survey (Cui et al. 2012), along with data from WISE (Wright et al. 2010), GALEX (Morrissey et al. 2007), 2MASS (Skrutskie et al. 2006) and Gaia EDR3 (Gaia Collaboration et al. 2021).

The dataset includes multiple subsets (`full`, `sub10`, `sub25`, `sub50`) and supports different random seeds (`42`, `66`, `0`, `12`, `123`). Each sample consists of:

- **Photometry**: Light curve data of shape `(N, 9)` (time, flux, flux\_error, amplitude, period, lksl_statistic, rfr_score, mad, delta_t).
- **Spectra**: Spectra observations of shape `(3, 2575)` (wavelength, flux, flux\_error).
- **Metadata**: List of metadata values of shape `(34,)`.
- **Label**: The class name as int.

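The per-sample layout listed above can be checked with a few lines of plain Python. The following is a minimal sketch, not part of the dataset's own tooling: the helper `check_sample` and the constant names are hypothetical, and it assumes a sample arrives as a dictionary of nested lists with the keys `photometry`, `spectra`, `metadata`, and `label`. The class names are copied from the card's metadata.

```python
# Expected layout, copied from the dataset card. All names here are illustrative.
CLASS_NAMES = ["DSCT", "EA", "EB", "EW", "HADS", "M", "ROT", "RRAB", "RRC", "SR"]
PHOTOMETRY_COLUMNS = 9       # time, flux, flux_error, amplitude, period, ...
SPECTRA_SHAPE = (3, 2575)    # wavelength, flux, flux_error
METADATA_LENGTH = 34


def check_sample(sample):
    """Return the class name if `sample` matches the documented layout.

    Raises ValueError when a field has an unexpected shape or label.
    """
    # Photometry has a variable number of rows N, but always 9 columns.
    if any(len(row) != PHOTOMETRY_COLUMNS for row in sample["photometry"]):
        raise ValueError("photometry rows must have 9 columns")

    spectra = sample["spectra"]
    if len(spectra) != SPECTRA_SHAPE[0] or any(
        len(row) != SPECTRA_SHAPE[1] for row in spectra
    ):
        raise ValueError("spectra must have shape (3, 2575)")

    if len(sample["metadata"]) != METADATA_LENGTH:
        raise ValueError("metadata must have 34 values")

    label = sample["label"]
    if not 0 <= label < len(CLASS_NAMES):
        raise ValueError("label must be an integer from 0 to 9")
    return CLASS_NAMES[label]


# Synthetic sample with N = 5 photometry rows (placeholder zeros, not real data).
fake_sample = {
    "photometry": [[0.0] * 9 for _ in range(5)],
    "spectra": [[0.0] * 2575 for _ in range(3)],
    "metadata": [0.0] * 34,
    "label": 7,
}
print(check_sample(fake_sample))  # RRAB
```

The same check should apply to a record taken from a loaded split, provided the loader returns nested lists rather than arrays; that behavior is an assumption here and worth confirming against the installed `datasets` version.
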
## Corresponding paper and code

- Paper: [AstroM<sup>3</sup>: A self-supervised multimodal model for astronomy](https://arxiv.org/abs/2411.08842)
- Code Repository: [GitHub: AstroM<sup>3</sup>](https://github.com/MeriDK/AstroM3/)
- Original Data: [AstroMLCore/AstroM3Dataset](https://huggingface.co/datasets/AstroMLCore/AstroM3Dataset/)

**Note:** The processed dataset `AstroM3Processed` is created from the original dataset `AstroM3Dataset` using [preprocess.py](https://huggingface.co/datasets/AstroMLCore/AstroM3Dataset/blob/main/preprocess.py).

## Subsets and Seeds

AstroM3Processed is available in different subset sizes:

- `full`: Entire dataset
- `sub50`: 50% subset
- `sub25`: 25% subset
- `sub10`: 10% subset

Each subset is sampled from the respective train, validation, and test splits of the full dataset.

For reproducibility, each subset is provided with five different random seeds:

- `42`, `66`, `0`, `12`, `123`

## Usage

To load the dataset with the Hugging Face `datasets` library, specify the configuration name in the format `{subset}_{seed}`. For example:

```python
from datasets import load_dataset

# Load the full dataset with seed 42
dataset = load_dataset("AstroMLCore/AstroM3Processed", name="full_42")

# Load the 25% subset sampled using seed 123
dataset = load_dataset("AstroMLCore/AstroM3Processed", name="sub25_123")
```

---

## Citation

🤗 If you find this dataset useful, please cite our paper 🤗

```bibtex
@article{rizhko2024astrom,
  title={AstroM$^3$: A self-supervised multimodal model for astronomy},
  author={Rizhko, Mariia and Bloom, Joshua S},
  journal={arXiv preprint arXiv:2411.08842},
  year={2024}
}
```

## References

1. Shappee, B. J., Prieto, J. L., Grupe, D., et al. 2014, ApJ, 788, 48, doi: 10.1088/0004-637X/788/1/48
2. Jayasinghe, T., Stanek, K. Z., Kochanek, C. S., et al. 2019, MNRAS, 486, 1907, doi: 10.1093/mnras/stz844
3. Cui, X.-Q., Zhao, Y.-H., Chu, Y.-Q., et al. 2012, Research in Astronomy and Astrophysics, 12, 1197, doi: 10.1088/1674-4527/12/9/003
4. Wright, E. L., Eisenhardt, P. R. M., Mainzer, A. K., et al. 2010, AJ, 140, 1868, doi: 10.1088/0004-6256/140/6/1868
5. Morrissey, P., Conrow, T., Barlow, T. A., et al. 2007, ApJS, 173, 682, doi: 10.1086/520512
6. Skrutskie, M. F., Cutri, R. M., Stiening, R., et al. 2006, AJ, 131, 1163, doi: 10.1086/498708
7. Gaia Collaboration, Brown, A. G. A., et al. 2021, A&A, 649, A1, doi: 10.1051/0004-6361/202039657
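
Returning to the `{subset}_{seed}` naming scheme described under Usage: the sketch below enumerates every configuration name implied by the four subsets and five seeds. The helper `config_name` and the constants are hypothetical names introduced for illustration; only the subset labels, the seeds, and the name format come from the dataset card.

```python
from itertools import product

# Subsets and seeds as listed in the dataset card.
SUBSETS = ("full", "sub50", "sub25", "sub10")
SEEDS = (42, 66, 0, 12, 123)


def config_name(subset, seed):
    """Build a configuration name such as 'sub25_123', rejecting unknown values."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset: {subset!r}")
    if seed not in SEEDS:
        raise ValueError(f"unknown seed: {seed!r}")
    return f"{subset}_{seed}"


# Every (subset, seed) pair yields one configuration: 4 subsets x 5 seeds.
ALL_CONFIGS = [config_name(subset, seed) for subset, seed in product(SUBSETS, SEEDS)]

print(len(ALL_CONFIGS))            # 20
print(config_name("sub25", 123))   # sub25_123
```

A name built this way can be passed as the `name` argument of `load_dataset("AstroMLCore/AstroM3Processed", name=...)`, for instance when repeating an experiment across all five seeds of one subset.
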
Provider: AstroMLCore