PosCUDA
Source: arXiv | Updated: 2024-01-04 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2401.02135v1
Description:
PosCUDA is an unlearnable dataset for audio, created by researchers at the University of California, San Diego. It contains more than 100,000 audio samples, each about one second long. PosCUDA applies position-based convolutions to small patches of each audio clip, using a private key per class to determine where the patches are placed. This makes the dataset unlearnable for models while preserving the original audio quality. The dataset is intended mainly to counter the unauthorized use of personal data in model training, particularly for audio classification and generative models. To create it, a low-pass filter blurs the audio at the key-determined positions; the method is fast and has minimal impact on data quality, which makes it suitable for practical use.
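The patch-level blurring described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names `patch_start` and `blur_patch`, the use of an integer key as a random seed, the patch length, and the moving-average kernel standing in for the low-pass filter are all assumptions made for illustration.

```python
import numpy as np


def patch_start(class_key: int, num_samples: int, patch_len: int) -> int:
    """Derive a deterministic patch position from a per-class key.

    Hypothetical scheme: the key seeds a random generator, so the same
    key always maps to the same position for a given clip length.
    """
    rng = np.random.default_rng(class_key)
    return int(rng.integers(0, num_samples - patch_len + 1))


def blur_patch(audio: np.ndarray, class_key: int,
               patch_len: int = 256, kernel_size: int = 9) -> np.ndarray:
    """Low-pass filter one patch of a 1-D signal and leave the rest untouched."""
    if audio.ndim != 1 or audio.shape[0] < patch_len:
        raise ValueError("audio must be 1-D and at least patch_len samples long")
    start = patch_start(class_key, audio.shape[0], patch_len)
    # A moving average is a simple low-pass filter (assumed here; the
    # actual filter used to build the dataset may differ).
    kernel = np.ones(kernel_size) / kernel_size
    out = audio.astype(np.float64, copy=True)
    patch = out[start:start + patch_len]
    out[start:start + patch_len] = np.convolve(patch, kernel, mode="same")
    return out


# Usage: a synthetic one-second clip at an assumed 16 kHz sample rate.
clip = np.random.default_rng(0).standard_normal(16000)
protected = blur_patch(clip, class_key=1234)
```

In this sketch only 256 of the 16,000 samples are modified, which reflects the idea that a small, key-located perturbation can leave most of the audio unchanged.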
Provider:
University of California, San Diego
Created:
2024-01-04



