NeSy4VRD: A Multifaceted Resource for Neurosymbolic AI Research using Knowledge Graphs in Visual Relationship Detection

Mendeley Data · Updated 2024-05-10 · Indexed 2024-06-27

Download link:

https://zenodo.org/records/7931113

Resource description:

NeSy4VRD is a multifaceted, multipurpose resource designed to foster neurosymbolic AI (NeSy) research, particularly NeSy research using Semantic Web technologies such as OWL ontologies, OWL-based knowledge graphs and OWL-based reasoning as symbolic components. The NeSy4VRD research resource pertains to the computer vision field of AI and, within that field, to the application tasks of visual relationship detection (VRD) and scene graph generation.

Whilst the core motivation of the NeSy4VRD research resource is to foster computer vision-based NeSy research using Semantic Web technologies such as OWL ontologies and OWL-based knowledge graphs, AI researchers can readily use NeSy4VRD to either:

1. pursue computer vision-based NeSy research without involving Semantic Web technologies as symbolic components, or
2. pursue computer vision research without NeSy (i.e. pursue research that focuses purely on deep learning, without involving symbolic components of any kind).

This is the sense in which we describe NeSy4VRD as being multipurpose: it can readily be used by diverse groups of computer vision-based AI researchers with diverse interests and objectives.

The NeSy4VRD research resource in its entirety is distributed across two locations: Zenodo and GitHub.

### NeSy4VRD on Zenodo: the NeSy4VRD dataset package

The Zenodo entry hosts the NeSy4VRD dataset package, which includes the NeSy4VRD dataset and its companion NeSy4VRD ontology, an OWL ontology called VRD-World.

The NeSy4VRD dataset consists of an image dataset with associated visual relationship annotations. The images of the NeSy4VRD dataset are the same as those that were once publicly available as part of the VRD dataset. The NeSy4VRD visual relationship annotations are a highly customised and quality-improved version of the original VRD visual relationship annotations.
The NeSy4VRD dataset is designed for computer vision-based research that involves detecting objects in images and predicting relationships between ordered pairs of those objects. A visual relationship for an image of the NeSy4VRD dataset has the form <'subject', 'predicate', 'object'>, where the 'subject' and 'object' are two objects in the image, and the 'predicate' describes some relation between them. Both the 'subject' and 'object' objects are specified in terms of bounding boxes and object classes. For example, representative annotated visual relationships are <'person', 'ride', 'horse'>, <'hat', 'on', 'teddy bear'> and <'cat', 'under', 'pillow'>.

Visual relationship detection is pursued as a computer vision application task in its own right, and as a building block capability for the broader application task of scene graph generation. Scene graph generation, in turn, is commonly used as a precursor to a variety of enriched, downstream visual understanding and reasoning application tasks, such as image captioning, visual question answering, image retrieval, image generation and multimedia event processing.

The NeSy4VRD ontology, VRD-World, is a rich, well-aligned, companion OWL ontology engineered specifically for use with the NeSy4VRD dataset. It directly describes the domain of the NeSy4VRD dataset, as reflected in the NeSy4VRD visual relationship annotations. More specifically, all of the object classes that feature in the NeSy4VRD visual relationship annotations have corresponding classes within the VRD-World OWL class hierarchy, and all of the predicates that feature in the NeSy4VRD visual relationship annotations have corresponding properties within the VRD-World OWL object property hierarchy. The rich structure of the VRD-World class hierarchy and the rich characteristics and relationships of the VRD-World object properties together give the VRD-World OWL ontology rich inference semantics.
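The <'subject', 'predicate', 'object'> form described above, with each object carrying an object class and a bounding box, can be modelled in a few lines of Python. This is a minimal sketch only: the class names `AnnotatedObject` and `VisualRelationship`, the bounding box layout and the coordinates are assumptions made for illustration and are not taken from the NeSy4VRD annotation files.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AnnotatedObject:
    """An object in an image: an object class plus a bounding box."""
    object_class: str
    # Assumed layout (xmin, ymin, xmax, ymax) in pixels; the real files may differ.
    bbox: Tuple[int, int, int, int]


@dataclass(frozen=True)
class VisualRelationship:
    """An ordered <subject, predicate, object> relationship within one image."""
    subject: AnnotatedObject
    predicate: str
    obj: AnnotatedObject  # named `obj` to avoid shadowing Python's built-in `object`

    def as_triple(self) -> Tuple[str, str, str]:
        """Return the class-level view, e.g. ('person', 'ride', 'horse')."""
        return (self.subject.object_class, self.predicate, self.obj.object_class)


# Made-up bounding boxes, for illustration only.
person = AnnotatedObject("person", (120, 40, 260, 310))
horse = AnnotatedObject("horse", (90, 150, 330, 420))

rides = VisualRelationship(person, "ride", horse)
reverse = VisualRelationship(horse, "ride", person)

print(rides.as_triple())
# The pair is ordered: swapping subject and object gives a different relationship.
print(rides == reverse)
```

Keeping the bounding boxes on the objects rather than on the relationship means the same detected object can take part in several relationships within one image.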
These inference semantics provide ample opportunity for OWL reasoning to be meaningfully exercised and exploited in NeSy research that uses OWL ontologies and OWL-based knowledge graphs as symbolic components. There is also ample potential for NeSy researchers to explore supplementing the OWL reasoning capabilities afforded by the VRD-World ontology with Datalog rules and reasoning.

Use of the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset is, of course, purely optional. Computer vision AI researchers who have no interest in NeSy, or NeSy researchers who have no interest in OWL ontologies and OWL-based knowledge graphs, can ignore the NeSy4VRD ontology and use the NeSy4VRD dataset by itself.

All computer vision-based AI research user groups can, if they wish, also avail themselves of the other components of the NeSy4VRD research resource available on GitHub.

### NeSy4VRD on GitHub: open source infrastructure supporting extensibility, and sample code

The NeSy4VRD research resource incorporates additional components that are companions to the NeSy4VRD dataset package on Zenodo. These companion components are available at NeSy4VRD on GitHub and consist of:

- comprehensive open source Python-based infrastructure supporting the extensibility of the NeSy4VRD visual relationship annotations (and, thereby, the extensibility of the NeSy4VRD ontology, VRD-World, as well);
- open source Python sample code showing how one can work with the NeSy4VRD visual relationship annotations in conjunction with the NeSy4VRD ontology, VRD-World, and RDF knowledge graphs.
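To give a feel for what working with annotations, an ontology and a knowledge graph together can involve, the toy sketch below stores one visual relationship as triples and materialises two simple entailments: subclass and subproperty closure. It uses plain Python rather than an OWL reasoner or an RDF library, and it is not the sample code from GitHub. The hierarchy fragments (`Horse` under `Animal`, `ride` under `on`), the individual names and the function names are all hypothetical and are not taken from VRD-World.

```python
# Hypothetical fragments of a class hierarchy and an object property hierarchy.
# These names are illustrative only; they are NOT taken from VRD-World.
SUBCLASS_OF = {"Horse": "Animal", "Person": "Agent"}  # child -> parent
SUBPROPERTY_OF = {"ride": "on"}                       # child -> parent


def ancestors(name, parent_of):
    """Return all ancestors of `name` by walking child -> parent links."""
    result = []
    while name in parent_of:
        name = parent_of[name]
        result.append(name)
    return result


def infer(triples, types):
    """Add subproperty and subclass entailments to the asserted facts."""
    inferred_triples = set(triples)
    for s, p, o in triples:
        for super_p in ancestors(p, SUBPROPERTY_OF):
            inferred_triples.add((s, super_p, o))
    inferred_types = set(types)
    for individual, cls in types:
        for super_cls in ancestors(cls, SUBCLASS_OF):
            inferred_types.add((individual, super_cls))
    return inferred_triples, inferred_types


# Asserted facts for one image: two detected objects and one relationship.
types = {("person1", "Person"), ("horse1", "Horse")}
triples = {("person1", "ride", "horse1")}

all_triples, all_types = infer(triples, types)
print(sorted(all_triples))
print(sorted(all_types))
```

A real OWL reasoner computes far more than these two kinds of entailment (for example, consequences of property characteristics such as transitivity or inverses), so this sketch only hints at why a rich class and property hierarchy gives a reasoner something meaningful to do.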
The NeSy4VRD infrastructure supporting extensibility consists of:

- open source Python code for conducting deep and comprehensive analyses of the NeSy4VRD dataset (the VRD images and their associated NeSy4VRD visual relationship annotations);
- an open source, custom-designed NeSy4VRD protocol for specifying visual relationship annotation customisation instructions declaratively, in text files;
- an open source, custom-designed NeSy4VRD workflow, implemented using Python scripts and modules, for applying small or large volumes of customisations or extensions to the NeSy4VRD visual relationship annotations in a configurable, managed, automated and repeatable process.

The purpose behind providing comprehensive infrastructure to support extensibility of the NeSy4VRD visual relationship annotations is to make it easy for researchers to take the NeSy4VRD dataset in new directions, by further enriching the annotations, or by tailoring them to introduce new or more data conditions that better suit their particular research needs and interests. The option to use the NeSy4VRD extensibility infrastructure in this way applies equally well to each of the diverse potential NeSy4VRD user groups already mentioned.

The NeSy4VRD extensibility infrastructure, however, may be of particular interest to NeSy researchers interested in using the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset. These researchers can of course tailor the VRD-World ontology if they wish without needing to modify or extend the NeSy4VRD visual relationship annotations in any way. But their degrees of freedom for doing so will be limited by the need to maintain alignment with the NeSy4VRD visual relationship annotations and the particular set of object classes and predicates to which they refer. If NeSy researchers want full freedom to tailor the VRD-World ontology, they may well need to tailor the NeSy4VRD visual relationship annotations first, in order that alignment be maintained.
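The alignment constraint described above can be checked mechanically. The following is a minimal sketch, assuming the annotations have already been reduced to (subject class, predicate, object class) triples and that the ontology's class and object property names are available as plain sets spelled the same way as the annotation labels. The function name `alignment_gaps` and all sample data are hypothetical and are not taken from the NeSy4VRD code on GitHub.

```python
def alignment_gaps(triples, ontology_classes, ontology_properties):
    """Compare the vocabulary used by annotations with the ontology's vocabulary."""
    used_classes = set()
    used_predicates = set()
    for subject_class, predicate, object_class in triples:
        used_classes.update((subject_class, object_class))
        used_predicates.add(predicate)
    return {
        # Labels the annotations use but the ontology does not model.
        "classes_missing_from_ontology": sorted(used_classes - ontology_classes),
        "predicates_missing_from_ontology": sorted(used_predicates - ontology_properties),
        # Ontology classes that no annotation refers to, so they get no instances.
        "ontology_classes_never_annotated": sorted(ontology_classes - used_classes),
    }


# Illustrative data only: a deliberately misaligned toy ontology.
triples = [
    ("person", "ride", "horse"),
    ("hat", "on", "teddy bear"),
    ("cat", "under", "pillow"),
]
ontology_classes = {"person", "horse", "hat", "teddy bear", "cat", "pillow", "water"}
ontology_properties = {"ride", "on"}

gaps = alignment_gaps(triples, ontology_classes, ontology_properties)
for name, labels in gaps.items():
    print(name, labels)
```

In this toy data the predicate 'under' has no matching property and the class 'water' is never annotated, which are the two directions in which an ontology and a set of annotations can drift apart.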
To illustrate our point, and to illustrate our vision of how the NeSy4VRD extensibility infrastructure can be used, let us consider a simple example. It is common in computer vision to distinguish between thing objects (that have well-defined shapes) and stuff objects (that are amorphous). Suppose a researcher wishes to have a greater number of stuff object classes with which to work. Water is such a stuff object. Many VRD images contain water but it is not currently one of the annotated object classes and hence is never referenced in any visual relationship annotations. So adding a Water class to the class hierarchy of the VRD-World ontology would be pointless because it would never acquire any instances (because an object detector would never detect any). However, our hypothetical researcher could choose to do the following:

1. use the analysis functionality of the NeSy4VRD extensibility infrastructure to find images containing water (by, say, searching for images whose visual relationships refer to object classes such as 'boat', 'surfboard', 'sand', 'umbrella', etc.);
2. use free image analysis software (such as GIMP, at gimp.org) to get bounding boxes for instances of water in these images;
3. use the NeSy4VRD protocol to specify new visual relationships for these images that refer to the new 'water' objects (e.g. <'boat', 'on', 'water'>);
4. use the NeSy4VRD workflow to introduce the new object class 'water' and to apply the specified new visual relationships to the sets of annotations for the affected images;
5. introduce class Water to the class hierarchy of the VRD-World ontology (using, say, the free Protege ontology editor);
6. continue experimenting, now with the added benefit of the additional stuff object class 'water';
7. contribute the enriched set of NeSy4VRD visual relationship annotations, and the enriched companion VRD-World ontology, to research communities.

### Information pertaining to the VRD dataset

Information about the original VRD dataset is available here.
Public availability of the VRD images (via information accessible from that location) ceased sometime in the latter part of 2021. We thank Dr. Ranjay Krishna, one of the principals associated with the VRD dataset, for granting us permission to re-establish the public availability of the VRD images as part of NeSy4VRD.

The original VRD visual relationship annotations are still publicly available from that location. But our deep analysis of those annotations, driven by our desire to design a robust companion ontology, revealed them to be highly problematic in many ways that made credible ontology modelling infeasible. They were also found to be replete with all manner of errors. The NeSy4VRD visual relationship annotations are far superior and we recommend them over the original VRD annotations to anyone contemplating conducting research using the VRD images. The NeSy4VRD annotations also have the added benefit of the rich, well-aligned companion NeSy4VRD ontology, VRD-World, for those whose research requires such a companion ontology.

Researchers wishing to use the original VRD dataset may still do so. They can access the VRD images from within the NeSy4VRD dataset on Zenodo, and access the VRD visual relationship annotations from the location in the link.

A note of caution: the NeSy4VRD ontology, VRD-World, is not compatible with the original VRD visual relationship annotations and cannot be used in conjunction with them. The VRD-World ontology has been engineered in relation to the highly customised and quality-improved NeSy4VRD visual relationship annotations. The customisations that were applied include ones that introduced many new object classes, merged some of the existing object classes, introduced one new predicate, and changed several predicate names.
However, researchers can, if they wish, use the NeSy4VRD extensibility infrastructure (described above) to undertake their own customisation and quality-improvement exercise with respect to the original VRD visual relationship annotations. This is precisely how the NeSy4VRD visual relationship annotations were created in the first place. The primary intended use case of NeSy4VRD's extensibility infrastructure, however, is for researchers to use the NeSy4VRD visual relationship annotations as their starting point, and to take these annotations forward with onward customisations and extensions, as illustrated in the example use case given above.
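The example use case referred to above (introducing a 'water' object class) can be sketched in plain Python. This is a minimal sketch only and is not the NeSy4VRD protocol or workflow themselves, which are available at NeSy4VRD on GitHub. The dictionary layout, image names, bounding boxes and function names below are all hypothetical stand-ins chosen for illustration.

```python
# Object classes that hint an image may contain water (from the example above).
WATER_HINT_CLASSES = {"boat", "surfboard", "sand", "umbrella"}


def make_object(object_class, bbox):
    """Build an annotated object; the dict layout is an assumption."""
    return {"class": object_class, "bbox": bbox}


def find_candidate_images(annotations, hint_classes):
    """Names of images with a relationship whose subject or object is a hint class."""
    return sorted(
        image
        for image, relationships in annotations.items()
        if any(
            rel["subject"]["class"] in hint_classes
            or rel["object"]["class"] in hint_classes
            for rel in relationships
        )
    )


def add_relationship(annotations, image, subject, predicate, obj):
    """Append a new visual relationship to an image's annotations, in place."""
    annotations[image].append(
        {"subject": subject, "predicate": predicate, "object": obj}
    )


# Toy annotations for two made-up images.
annotations = {
    "beach.jpg": [
        {"subject": make_object("person", [10, 20, 60, 120]),
         "predicate": "on",
         "object": make_object("boat", [0, 90, 200, 160])},
    ],
    "street.jpg": [
        {"subject": make_object("person", [5, 5, 50, 100]),
         "predicate": "ride",
         "object": make_object("horse", [0, 40, 120, 150])},
    ],
}

# Step 1: find images likely to contain water.
candidates = find_candidate_images(annotations, WATER_HINT_CLASSES)
print(candidates)

# Step 2 happens outside the code: draw a box around the water in an image editor.
# Steps 3 and 4: introduce a 'water' object and the relationship <'boat', 'on', 'water'>.
boat = annotations["beach.jpg"][0]["object"]
water = make_object("water", [0, 140, 320, 240])
add_relationship(annotations, "beach.jpg", boat, "on", water)
print(annotations["beach.jpg"][-1])
```

Only once annotations like these exist does adding a Water class to the ontology become worthwhile, since the class can then acquire instances.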

Created: 2023-06-28
