Synthetic Sample Dataset

Name: Synthetic Sample Dataset
Creator: Korea Advanced Institute of Science and Technology (KAIST)
Published: 2024-04-12 09:53:33
License: No description available

Source: arXiv. Updated: 2024-04-12. Indexed: 2024-06-21.

Download link:

https://github.com/db-Lee/selfsup dd

Description:

This synthetic sample dataset was developed by a research team at the Korea Advanced Institute of Science and Technology (KAIST). It distills a large unlabeled dataset into a small set of synthetic samples so that self-supervised pre-training can be run efficiently. The dataset contains 1000 synthetic samples, optimized so that a model pre-trained on them still performs well when transferred to target datasets. The distillation uses mean squared error (MSE) as its optimization objective, which avoids the randomness introduced by data augmentation or input masking and therefore makes the optimization more stable. The dataset is mainly used in research on transfer learning, architecture generalization, and target-data-free knowledge distillation, where it addresses the efficiency and performance of model pre-training and transfer.
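
As a rough illustration of why an MSE objective without augmentation or masking gives a stable, repeatable optimization, the sketch below pre-trains a toy linear encoder on 1000 synthetic inputs by plain gradient descent. Everything in it is an assumption made for illustration: the function name `pretrain_on_synthetic`, the linear model, the array shapes, and the idea that each synthetic input is paired with a target representation. It is not the authors' implementation and does not use the released data.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over all elements."""
    return float(np.mean((pred - target) ** 2))

def pretrain_on_synthetic(x_syn, y_syn, steps=200, lr=0.1):
    """Fit a linear encoder W so that x_syn @ W matches y_syn under MSE.

    Hypothetical helper: no augmentation or masking is applied, so every
    step sees exactly the same inputs and the run is fully deterministic.
    """
    n, d_in = x_syn.shape
    d_out = y_syn.shape[1]
    w = np.zeros((d_in, d_out))
    losses = []
    for _ in range(steps):
        pred = x_syn @ w
        losses.append(mse(pred, y_syn))
        # Gradient of mean((x_syn @ w - y_syn) ** 2) with respect to w.
        grad = 2.0 * x_syn.T @ (pred - y_syn) / (n * d_out)
        w -= lr * grad
    return w, losses

# Toy stand-in for the synthetic set: 1000 inputs and assumed target representations.
rng = np.random.default_rng(0)
x_syn = rng.normal(size=(1000, 16))
true_w = rng.normal(size=(16, 4))
y_syn = x_syn @ true_w

w, losses = pretrain_on_synthetic(x_syn, y_syn)
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}")
```

Because nothing in the loop is sampled, calling `pretrain_on_synthetic` twice on the same arrays returns identical weights, which is the kind of stability the description attributes to dropping augmentation and masking.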

Provider:

Korea Advanced Institute of Science and Technology (KAIST)

Created:

2023-10-10