Mapping User Attention: Filtering and Visualizing Relevant UI Components in Screenshots based on Gaze Fixations
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/8009444
Resource description:
These data correspond to the set of problems used to evaluate the proposal "What Are You Gazing At? An Approach to Use Eye-tracking for Robotic Process Automation".
Each problem consists of a set of 10 screenshots with the same look and feel but different data in the fields that the user can enter or modify. Each problem has its associated gaze fixation data. In each problem there is a key UI element that primarily attracts the user's attention.
The evaluation is based on a set of images that resemble realistic screenshots of activities in the administrative domain. More precisely, five different sets of screenshots (S) are generated, each with a different level of complexity. Complexity is measured as the number of UI elements per screenshot. The sets are:
S1 Mockup-based email view. Represents the activity of viewing an email to check if it contains an attachment. In this case, the key UI element that receives the attention is the attachment inside the email.
S2 Mockup-based CRM user details. Represents a user's detail viewing activity within a Client Relationship Management (CRM) platform. The key UI element is the checkbox that indicates whether the user has all their invoices paid.
S3 Real screenshot email view. Analogous to S1 but with real screenshots. It represents the activity of viewing an email to check if it contains an attachment. In this case, the key UI element that receives the attention is the attachment contained in the email.
S4 Real screenshot CRM user details. Analogous to S2 but with real screenshots. It represents a user's detail viewing activity within a CRM platform. The key UI element is the checkbox indicating whether the user has all their invoices paid.
S5 Real screenshot CRM user details. Represents the split-screen display of two applications: on the left side, a PDF viewer showing a COVID vaccination certificate, and on the right side, a human resources management system (a basic recreation of a real system, for privacy reasons) showing the detail view of the employee to whom the certificate corresponds. Because these screenshots contain two applications, they have two key UI elements: in the PDF viewer, the name of the certificate holder, and in the human resources management system, the name of the employee whose detail view is displayed. The activity being carried out is verifying that the COVID certificate received corresponds to an employee.
Two types of filters based on the gaze fixation data are applied to these sets of screenshots: pre-filtering and post-filtering, which apply the filtering before and after detecting UI components in the screenshots, respectively. The data package is divided into two folders, input and output. The input folder is organized as follows:
input/
  screenshots/: contains the screenshots. The sets are easy to identify because the files are named following the pattern SX_screenshot_DDDD.jpeg, where X indicates which of the sets described in the previous list the screenshot belongs to, and DDDD is a unique identifier for each screenshot. Each set consists of 10 screenshots, for 50 in total.
  fixation.json: a JSON file that contains one key per screenshot. For each screenshot, it contains a "fixation_points" key that stores information about the fixations that occurred on that screenshot. Here is an example:
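The naming pattern above can be used to group files by set. The following is a minimal sketch, assuming that X is a single digit from 1 to 5 and that DDDD is exactly four digits (as in the example file name S5_screenshot_0050.jpeg); the function names are hypothetical and not part of the data package.

```python
import re

# Pattern taken from the description: SX_screenshot_DDDD.jpeg
# Assumption: X is one digit (1-5) and DDDD is exactly four digits.
SCREENSHOT_RE = re.compile(r"^S(?P<set>[1-5])_screenshot_(?P<id>\d{4})\.jpeg$")


def parse_screenshot_name(name):
    """Return (set_number, identifier), or None if the name does not match."""
    match = SCREENSHOT_RE.match(name)
    if match is None:
        return None
    return int(match.group("set")), match.group("id")


def group_by_set(names):
    """Group matching file names by set number, skipping anything else."""
    groups = {}
    for name in names:
        parsed = parse_screenshot_name(name)
        if parsed is None:
            continue
        groups.setdefault(parsed[0], []).append(name)
    return groups
```

For example, group_by_set could be called on the result of os.listdir for the screenshots folder to obtain one list of file names per set.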
"S5_screenshot_0050.jpeg": {
  "fixation_points": {
    "334.25#497.166666666667": {
      "#events": 6,
      "start_index": 33224,
      "ms_start": 553962.1467,
      "ms_end": 554061.9899,
      "duration": 99.8432000001194,
      "imotions_dispersion": 0.300325967868111,
      "last_index": 33229,
      "dispersion": 14.044275227531914
    },
    "1258.80769230769#507.576923076923": {
      "#events": 13,
      "start_index": 33234,
      "ms_start": 554128.5427,
      "ms_end": 554345.3595,
      ...
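A fixation entry like the ones in this example could be read as sketched below. The sketch assumes that each key under "fixation_points" encodes the fixation position as "x#y" in screenshot pixel coordinates; the description does not state this explicitly, so treat it as an assumption. The helper names are hypothetical, and the inline sample copies only the first fixation from the example.

```python
import json


def parse_fixation_key(key):
    """Split an 'x#y' key into floats (assumed to be pixel coordinates)."""
    x_str, y_str = key.split("#")
    return float(x_str), float(y_str)


def fixations_for(fixation_data, screenshot_name):
    """Return a list of dicts, one per fixation, with x/y added to the stored fields."""
    points = fixation_data[screenshot_name]["fixation_points"]
    result = []
    for key, info in points.items():
        x, y = parse_fixation_key(key)
        result.append({"x": x, "y": y, **info})
    return result


# Inline sample: the first fixation from the example, wrapped in braces
# so that it is a complete JSON document.
sample = json.loads("""
{
  "S5_screenshot_0050.jpeg": {
    "fixation_points": {
      "334.25#497.166666666667": {
        "#events": 6,
        "start_index": 33224,
        "ms_start": 553962.1467,
        "ms_end": 554061.9899,
        "duration": 99.8432000001194,
        "imotions_dispersion": 0.300325967868111,
        "last_index": 33229,
        "dispersion": 14.044275227531914
      }
    }
  }
}
""")

fix = fixations_for(sample, "S5_screenshot_0050.jpeg")[0]
```

In this one sample, "duration" equals "ms_end" minus "ms_start", and "#events" equals "last_index" minus "start_index" plus one. Whether these relations hold for every fixation in the file is not stated in the description.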
The output folder is organized in three subfolders: the first contains the information for the non-filtered screenshots (i.e. with no filtering or processing applied), and the other two contain the results of pre-filtering and post-filtering.
output/
  non-filter/
    borders/: screenshots with the borders of all detected UI components highlighted.
    components_json/: a collection of JSON files, each with the same name as its screenshot, containing the "img_shape" key with a list of the screen resolution and the number of layers (channels) of the image, e.g. [1080, 1920, 3], and the "compos" key with a list of all UI components representing the Screen Object Model (SOM).
  pre-filter/ and post-filter/
    borders/: screenshots with the borders of the relevant UI components. In pre-filtering, component detection is performed only on the parts of the screenshot that have received attention. In post-filtering, the complete screenshot is shown, with only the borders of the relevant UI components highlighted.
    components_json/: a collection of JSON files, each with the same name as its screenshot, containing the following keys:
      "img_shape": a list representing the screen resolution and the number of layers (channels) in the image, e.g. [1080, 1920, 3].
      "compos": a list of all UI components representing the Screen Object Model (SOM). During post-filtering, each UI component is augmented with an additional property called "relevant". If this property is set to true, the UI component has received attention.
    (pre)/(post)filter_attention_maps/: the attention maps. In pre-filtering, any area of the screen that has not received attention is shown in black. In post-filtering, the areas of attention are shown as red circles, and the UI components whose area intersects the areas of attention by more than 25% are shown in yellow.
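The 25% intersection rule used for the post-filtering attention maps can be illustrated with a small sketch. It is not the authors' implementation: it assumes a hypothetical bounding-box format (x_min, y_min, x_max, y_max) in pixels, treats each area of attention as a circle of a caller-supplied radius around a fixation point, and estimates the overlap by counting pixel centers. The actual field names inside "compos" and the radius used in the data package are not given in this description.

```python
def attention_fraction(box, fixations, radius):
    """Fraction of the box's pixels whose center lies within `radius` of any fixation.

    box: (x_min, y_min, x_max, y_max) in integer pixels (hypothetical format).
    fixations: iterable of (x, y) fixation centers.
    """
    x_min, y_min, x_max, y_max = box
    fixations = list(fixations)
    r2 = radius * radius
    total = 0
    covered = 0
    for py in range(y_min, y_max):
        for px in range(x_min, x_max):
            total += 1
            cx, cy = px + 0.5, py + 0.5  # pixel center
            if any((cx - fx) ** 2 + (cy - fy) ** 2 <= r2 for fx, fy in fixations):
                covered += 1
    return covered / total if total else 0.0


def exceeds_attention_threshold(box, fixations, radius, threshold=0.25):
    """True if more than `threshold` of the box area falls inside attention circles."""
    return attention_fraction(box, fixations, radius) > threshold
```

Counting pixels is slow for large components but keeps the sketch dependency-free; an array-based mask would be the natural choice for full 1920 x 1080 screenshots.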
In conclusion, the data package consists of sets of screenshots and the results of applying pre-filtering and post-filtering based on gaze fixation data, which enable the identification of relevant UI components. The package is organized into input and output folders, where the output folder offers processed screenshots, UI component information, and attention maps. This resource provides insight into user attention and interaction with UI elements in different types of scenarios.
Created:
2024-09-09



