SIDDA: SInkhorn Dynamic Domain Adaptation for Image Classification with Equivariant Neural Networks

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/14583106

Resource description:

Datasets used in the paper "SIDDA: SInkhorn Dynamic Domain Adaptation for Image Classification with Equivariant Neural Networks".

Abstract: Modern deep learning models often do not generalize well in the presence of a "covariate shift"; that is, in situations where the training and test data distributions differ, but the conditional distribution of classification labels given the data remains unchanged. In such cases, neural network (NN) generalization can be reduced to a problem of learning more robust, domain-invariant features that enable the correct alignment of the two datasets in the network's latent space. Domain adaptation (DA) methods include a broad range of techniques aimed at achieving this, which allows the model to perform well on multiple datasets. However, these methods have struggled with the need for extensive hyperparameter tuning, which then incurs significant computational costs.

In this work, we introduce SIDDA, an out-of-the-box DA training algorithm built upon the Sinkhorn divergence, that can achieve effective domain alignment with minimal hyperparameter tuning and computational overhead. We demonstrate the efficacy of our method on multiple simulated and real datasets of varying complexity, including simple shapes, handwritten digits, and real astronomical observational data. These datasets include covariate shifts induced by noise and blurring, as well as more complex differences between real astronomical data observed by different telescopes. SIDDA is compatible with a variety of NN architectures, and it works particularly well in improving classification accuracy and model calibration when paired with equivariant neural networks (ENNs), which respect data symmetries by design.

We find that SIDDA consistently improves the generalization capabilities of NNs, enhancing classification accuracy in unlabeled target data by up to 40%. Simultaneously, the inclusion of SIDDA during training can improve performance on the labeled source data, though with a more modest enhancement of approximately 1%. We also study the efficacy of DA on ENNs with respect to the varying group orders of the dihedral group D_N, and find that the model performance improves as the degree of equivariance increases. Finally, we find that SIDDA can also improve the model calibration on both source and target data. The largest improvements are obtained when the model is applied to the unlabeled target domain, reaching more than an order of magnitude improvement in both the expected calibration error and the Brier score. SIDDA's versatility across various NN models and datasets, combined with its automated approach to domain alignment, has the potential to significantly advance multi-dataset studies by enabling the development of highly generalizable models.

Datasets: Each dataset directory contains train and test subdirectories, and each of those holds the source and target domain data. The simulated datasets of shapes and astronomical objects were generated using DeepBench, with code for noise and PSF blurring available on our GitHub. The MNIST-M dataset is publicly available, and the Galaxy Zoo Evo dataset can be accessed by following the steps on HuggingFace. The data were split 80%/20% into train and test sets.

Simulated shapes:
  train: source, target (noise)
  test: source, target (noise)

Simulated astronomical objects:
  train: source, target (noise)
  test: source, target (noise)

MNIST-M:
  train: source, target (noise), target (PSF)
  test: source, target (noise), target (PSF)

Galaxy Zoo Evo:
  train: source (GZ SDSS), target (GZ DESI)
  test: source (GZ SDSS), target (GZ DESI)

Paper Data: Data for generating Figures 4 and 5 in the paper are included in isomap_plot_data.zip and js_distances_group_order.zip, respectively. The code for generating the figures can be found in the notebooks on our GitHub. Figures 2 and 3 are visualizations of the datasets included here.
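
The train/test and source/target layout described above can be checked after download with a short script. The following is only a minimal sketch: the folder names in LAYOUT (for example "shapes" or "target_noise") are hypothetical placeholders, not the names used in the Zenodo archives, and split_80_20 merely illustrates an 80%/20% split rather than reproducing the authors' procedure.

```python
from pathlib import Path
import random

# Domains per dataset, following the listing in the description.
# All folder names below are hypothetical placeholders; replace them
# with the actual names found inside the downloaded archives.
LAYOUT = {
    "shapes": ["source", "target_noise"],
    "astro_objects": ["source", "target_noise"],
    "mnist_m": ["source", "target_noise", "target_psf"],
    "gz_evo": ["source_sdss", "target_desi"],
}
SPLITS = ("train", "test")


def expected_dirs(root):
    """Return every <root>/<dataset>/<split>/<domain> path implied by LAYOUT."""
    root = Path(root)
    return [
        root / dataset / split / domain
        for dataset, domains in LAYOUT.items()
        for split in SPLITS
        for domain in domains
    ]


def missing_dirs(root):
    """List the expected directories that do not exist under root."""
    return [path for path in expected_dirs(root) if not path.is_dir()]


def split_80_20(n_items, seed=0):
    """Shuffle indices 0..n_items-1 and split them 80%/20% into train/test."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = (n_items * 4) // 5  # integer arithmetic avoids float rounding
    return indices[:cut], indices[cut:]
```

Calling missing_dirs on the extraction folder returns an empty list when every expected directory is present, which gives a quick sanity check before training.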

Created:

2025-01-23
