LUMA

Indexed from arXiv on 2025-09-30

Download link:

https://huggingface.co/datasets/bezirganyan/luma

Description:

LUMA is a benchmark dataset for learning from uncertain and multimodal data. It contains audio, image, and text data across 50 classes. Built on the CIFAR-10/100 datasets, it extends them with audio samples and with text generated by a large language model (LLM). LUMA supports controlled injection of different types and degrees of uncertainty, and it includes a Python package for generating multiple dataset variants. In terms of scale, it contains 25,200 images for training and testing, plus 3,859 images designated as out-of-distribution (OOD) data.
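
To make the idea of controllable uncertainty injection concrete, here is a minimal sketch of one possible form of it: flipping a chosen fraction of class labels. This is an illustration only and is not the API of the LUMA Python package. The function name `inject_label_noise` and its parameters are hypothetical, and label noise is assumed here as just one example of an uncertainty type.

```python
import random


def inject_label_noise(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` with a fixed fraction flipped to a different class.

    Hypothetical helper for illustration; not part of the LUMA package.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2 to flip a label")

    rng = random.Random(seed)  # seeded so the same variant can be regenerated
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))

    # Pick distinct positions, then replace each label with a different class.
    for i in rng.sample(range(len(noisy)), n_flip):
        candidates = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(candidates)
    return noisy


if __name__ == "__main__":
    clean = [i % 50 for i in range(1000)]  # 50 classes, as in the description
    noisy = inject_label_noise(clean, num_classes=50, noise_rate=0.2, seed=1)
    changed = sum(a != b for a, b in zip(clean, noisy))
    print(f"{changed} of {len(clean)} labels changed")
```

Here the noise rate sets the degree of uncertainty and the seed makes each generated variant reproducible. A real generator would presumably expose comparable controls for each modality.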
