SUN Database: an image scene classification dataset with 899 categories and 130,519 images

Name: SUN Database image scene classification dataset, 899 categories and 130,519 images
Creator: 帕依提提
License: No description available

Indexed by 帕依提提 on 2024-03-04

Download link:

https://www.payititi.com/opendatasets/show-1808.html

Resource description:

Scene categorization is a fundamental problem in computer vision. However, scene understanding research has been constrained by the limited scope of currently used databases, which do not capture the full variety of scene categories. Whereas standard databases for object categorization contain hundreds of different classes of objects, the largest available dataset of scene categories contains only 15 classes. In this paper we propose the extensive Scene UNderstanding (SUN) database, which contains 899 categories and 130,519 images. We use 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition and establish new bounds of performance. We measure human scene classification performance on the SUN database and compare this with computational methods.

The results are shown in the figure on the right. We visualize the results using the combined kernel from all features for the first training and testing partition in the following webpage. For each of the 397 categories, we show the class name, the ROC curve, 5 sample training images, 5 sample correct predictions, 5 most confident false positives (with true label), and 5 least confident false negatives (with wrong predicted label).

The database contains the 397-category SUN dataset used in the benchmark of the paper. The number of images varies across categories, but there are at least 100 images per category, and 108,754 images in total. Images are in jpg, png, or gif format. The images provided here are for research purposes only.

For the results in the paper we use a subset of the dataset that has 50 training images and 50 testing images per class, averaging over the 10 partitions in the following.
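The split protocol above (50 training and 50 testing images per class, averaged over 10 partitions), together with the first-n training subsets and the Windows path note in the next paragraph, can be sketched as follows. This is only an illustrative sketch, not the authors' code. It assumes that each partition is a list of image paths whose directory part is the class label, and `train_and_score` is a hypothetical callable standing in for whatever classifier is being evaluated.

```python
import posixpath
from collections import defaultdict
from statistics import mean


def class_of(path):
    # Assumption: the class label is the directory part of the listed path.
    return posixpath.dirname(path)


def first_n_per_class(train_paths, n):
    """Keep the first n training images of each class, preserving list order."""
    counts = defaultdict(int)
    kept = []
    for path in train_paths:
        label = class_of(path)
        if counts[label] < n:
            counts[label] += 1
            kept.append(path)
    return kept


def mean_accuracy(partitions, n, train_and_score):
    """Average a score over partitions.

    partitions: list of (train_paths, test_paths) pairs, one per partition.
    train_and_score: hypothetical callable (train_subset, test_paths) -> float.
    The test list is always used in full, whatever the value of n.
    """
    scores = [
        train_and_score(first_n_per_class(train, n), test)
        for train, test in partitions
    ]
    return mean(scores)


def to_windows(path):
    # Swap forward slashes for backslashes, as the Windows note suggests.
    return path.replace("/", "\\")
```

Calling `mean_accuracy(partitions, n, train_and_score)` once for each n in (1, 5, 10, 20) would give the points of a learning curve under these assumptions.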
To plot the curve in Figure 4(b) of the paper, we use the first n = 1, 5, 10, or 20 images out of the 50 training images per class for training, and use the same 50 testing images for testing regardless of the size of the training set. (If you are using Microsoft Windows, you may need to replace / with \ in the following files.)

We have manually built an overcomplete three-level hierarchy for all 908 scene categories. The scene categories are arranged in a 3-level tree: 908 leaf nodes (SUN categories) are connected to 15 parent nodes at the second level (basic-level categories), which are in turn connected to 3 nodes at the first level (superordinate categories), with the root node at the top. The hierarchy is not strictly a tree, but a Directed Acyclic Graph. Many categories such as "hayfield" are duplicated in the hierarchy because there might be confusion over whether such a category belongs in the natural or man-made sub-hierarchies. The feature matrices are available at THIS link.

DrawMe is a lightweight JavaScript library that enables client-side line drawing on a picture in a web browser. It is intended as a basis for self-defined labeling tasks for computer vision researchers. It differs from LabelMe, which provides full support but a fixed labeling interface. DrawMe is a JavaScript library only, and users are required to write their own code to use it for their specific labeling needs. DrawMe does not provide any server or server-side code for labeling, but it gives users greater flexibility. It also comes with a simple example of an Amazon Mechanical Turk interface that serializes JavaScript DOM objects into text for HTML form submission. Users can easily build their own labeling interface based on this MTurk example to use Amazon Mechanical Turk for labeling, either with paid workers or by the researchers themselves in the MTurk sandbox.

This work is funded by NSF CAREER Awards 0546262 to A.O, 0747120 to A.T. and partly funded by BAE Systems under Subcontract No. 073692 (Prime Contract No. HR0011-08-C-0134 issued by DARPA), Foxconn and gifts from Google and Microsoft. K.A.E is funded by a NSF Graduate Research fellowship.
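The scene hierarchy described above, in which a category such as "hayfield" can sit under more than one parent, can be pictured with a small child-to-parents mapping. This is a minimal sketch under stated assumptions: every node name other than "hayfield" is hypothetical and chosen only for illustration, and the mapping is not the format of the released hierarchy files.

```python
# Hypothetical fragment of the hierarchy: child -> list of parents.
# Only "hayfield" comes from the description; other names are illustrative.
PARENTS = {
    "hayfield": ["natural_field", "manmade_field"],
    "natural_field": ["outdoor_natural"],
    "manmade_field": ["outdoor_manmade"],
    "outdoor_natural": ["root"],
    "outdoor_manmade": ["root"],
    "root": [],
}


def ancestors(node, parents):
    """Return every node reachable by following parent links upward."""
    found = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


def multi_parent_nodes(parents):
    """List nodes with more than one parent; any such node rules out a tree."""
    return sorted(name for name, ps in parents.items() if len(ps) > 1)
```

Storing parents as a list rather than a single value is what lets one leaf belong to both the natural and the man-made branches without being copied.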

Provider:

帕依提提

Collected summary

Dataset introduction

Background and challenges

Background overview

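SUN Database is a large-scale image scene classification dataset containing 899 categories and 130,519 images, created to address the limited number of categories available to scene understanding research. It covers a wide variety of scenes and is commonly used to evaluate computer vision algorithms. For scene recognition in particular, it established new performance benchmarks.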

The content above was collected and summarized by 遇见数据集.