Winograd Images Dataset
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/z6jb259pcd
Description:
Understanding what drives eye movements during free viewing (viewing with no instructions from the experimenter) is challenging. On one hand, an extensive body of work suggests that low-level saliency (Itti et al., 1998; Harel et al., 2006) predicts where people look. On the other hand, more recent work by Henderson et al. (2018) on meaning maps has shown that local cropped image regions judged to be meaningful are a much better predictor. In this study, we hypothesize that people try to understand scenes by default while freely viewing and direct their fixations to areas that contribute to understanding the scene.
In most natural scenes, low-level saliency and locally meaningful regions are correlated with the objects important to understanding the scene. To dissociate these factors, we have developed Winograd image pairs, inspired by the Winograd Schema Challenge for sentences (Levesque et al., 2012). The two images in each pair look very similar, but people describe them entirely differently. This allows us to study scene understanding while preserving the low-level visual aspects. This dataset gives access to the 18 pairs of images used in the study.
We also introduce a new quantitative approach, Scene Understanding Maps (SUM), which measures the contribution of an object to scene understanding by assessing how deleting that object from the image changes the scene description relative to the gold standard description. This dataset gives access to the images with each object deleted as well as the Scene Understanding Maps (SUM).
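As a rough illustration of the deletion idea described above, the sketch below scores each object by how far a description of the object-deleted image diverges from the gold standard description. The token-overlap (Jaccard) similarity, the function names `jaccard_similarity` and `object_contributions`, and the example descriptions are all assumptions made for illustration. The similarity measure actually used in the study is not stated in this description.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Token-overlap similarity between two descriptions.

    This is a stand-in metric chosen for simplicity, not the study's measure.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def object_contributions(gold: str, deleted_descriptions: dict[str, str]) -> dict[str, float]:
    """Map each object to 1 - similarity(description without it, gold).

    A higher score means deleting the object changed the description more.
    """
    return {
        obj: 1.0 - jaccard_similarity(desc, gold)
        for obj, desc in deleted_descriptions.items()
    }


# Hypothetical descriptions, invented for this example only.
gold = "a man feeds a dog"
deleted = {
    "dog": "a man holds food",    # description changes when the dog is removed
    "lamp": "a man feeds a dog",  # description is unchanged without the lamp
}
scores = object_contributions(gold, deleted)
```

In this toy case the dog receives a higher score than the lamp, because removing it changes the description while removing the lamp does not.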
As part of the study, we ran four eye movement conditions (free viewing, scene description, object search, and counting objects; between-subject design, N=50 per condition) and compared the ability of our SUM model and other fixation prediction models (DeepGaze, GBVS, and meaning maps) to predict fixation frequency. We provide all the generated heat maps as part of this dataset.
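One common way to compare a predicted heat map with a measured one is the Pearson correlation between the two maps. The sketch below shows that computation with NumPy. It is an assumption that this metric suits the heat maps here: the evaluation procedure used in the study is not described in this text, and `heatmap_correlation` is a hypothetical helper name.

```python
import numpy as np


def heatmap_correlation(measured: np.ndarray, predicted: np.ndarray) -> float:
    """Pearson correlation between two heat maps of identical shape."""
    if measured.shape != predicted.shape:
        raise ValueError("heat maps must have the same shape")
    m = measured.astype(float).ravel()
    p = predicted.astype(float).ravel()
    # Center both maps so the score ignores overall brightness offsets.
    m = m - m.mean()
    p = p - p.mean()
    denom = np.sqrt((m ** 2).sum() * (p ** 2).sum())
    if denom == 0:
        return 0.0  # a constant map carries no spatial information
    return float((m * p).sum() / denom)


# Synthetic maps, used only to exercise the function.
rng = np.random.default_rng(0)
measured = rng.random((4, 4))
same = heatmap_correlation(measured, measured)
inverted = heatmap_correlation(measured, 1.0 - measured)
```

Identical maps score close to 1 and an inverted map scores close to -1. Real heat maps would first need to be resized to a common resolution.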
The eye movement data and the code to access it are provided in the GitHub repository (https://github.com/shravan1394/WinogradDataset).
All descriptions collected as part of this study are also provided in the GitHub repository.
The dataset contains the following:
Winograd Image Pairs: The 18 image pairs used in the study. Each pair is split across two folders (Set_1 and Set_2).
SourceData: NumPy files with the data behind the plots of fixation distribution across object categories and the cumulative fixation line plots.
HeatMaps: Measured fixation heat maps (from eye movement data for 50 subjects per condition) and predicted fixation heat maps (from models including our SUM model) for each experiment (free viewing, scene description, object search, counting objects). We also provide a summary Word document showing all the heat maps for each image in our dataset.
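A minimal sketch for reading the NumPy files after downloading the dataset is given below. It assumes the files are stored with the `.npy` extension under a local `SourceData` folder. The file names and array layouts are not documented here, so the helper `load_source_data` (a hypothetical name) simply loads whatever it finds.

```python
from pathlib import Path

import numpy as np


def load_source_data(root: str) -> dict[str, np.ndarray]:
    """Load every .npy file under `root`, keyed by its path relative to `root`.

    Assumes plain numeric arrays. Files that store pickled Python objects
    will raise an error because allow_pickle is disabled for safety.
    """
    root_path = Path(root)
    return {
        str(path.relative_to(root_path)): np.load(path, allow_pickle=False)
        for path in sorted(root_path.rglob("*.npy"))
    }
```

For example, `arrays = load_source_data("SourceData")` followed by printing each key with `arrays[key].shape` gives a quick overview of what each file contains.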
If you use this dataset or its images, please cite this paper: "The Curious Mind: Eye Movements to Maximize Scene Understanding."
Created:
2025-06-18



