mLab App Paradata
Source: DataCite Commons. Updated 2025-11-13; indexed 2026-02-09.
Download link:
https://figshare.com/articles/dataset/mLab_App_Paradata/28211636
Description:
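# mLab App Paradata Dataset
This dataset accompanies the publications on paradata and user engagement with the mLab App, a digital health application designed to support at-home HIV testing. It includes detailed user interactions, testing results, and metadata collected during a multi-site clinical trial. The data is divided into several files, each with its own schema and purpose, as described below.
**Notes:**
- The data has been de-identified and cleaned.
- The datasets originated from a REDCap project. Because the custom reports used to generate them served different needs, there may be redundancies across files.
- The `user_id` assigned to each user is consistent across all files (i.e., test windows in `test_windows.csv` for `user_id == 10` correspond to `user_id == 10` in `para.csv`, `info.csv`, etc.).
- Paradata collection was designed to operate asynchronously so that data collection never disrupted user interactions. Because mLab was a browser-based technology, events can appear out of order when users navigate rapidly with the browser (as noted in the associated manuscripts).
- DataFrame column datatypes have been converted to satisfy specific analyses. Please check the datatypes and convert them as needed for your own analysis.
- *Due to the sensitive nature of the survey data, the CSV/parquet files for the survey are not included in this data repository. They will be made available upon reasonable request.*
- For detailed descriptions of the study design and methodology, please refer to the associated publication.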
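As a rough illustration of the shared `user_id` key and the datatype note above, the sketch below parses two CSV extracts with Python's standard library, casts `user_id` to an integer, and collects each user's rows. The inline rows are made-up placeholders rather than study records, and the integer cast is an assumption about how the column is stored.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows for illustration only; not real study data.
INFO_CSV = """user_id,number_of_logins
10,4
11,7
"""
WINDOWS_CSV = """user_id,redcap_repeat_instance
10,1
10,2
11,1
"""

def read_rows(text):
    """Parse CSV text into dicts, casting user_id to int (assumed integer-like)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["user_id"] = int(row["user_id"])
        rows.append(row)
    return rows

def group_by_user(rows):
    """Map each user_id to the list of rows that carry it."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["user_id"]].append(row)
    return dict(grouped)

info_by_user = group_by_user(read_rows(INFO_CSV))
windows_by_user = group_by_user(read_rows(WINDOWS_CSV))

# Users present in both files can be linked directly through user_id.
shared_users = sorted(set(info_by_user) & set(windows_by_user))
for uid in shared_users:
    print(uid, len(windows_by_user[uid]), "test window(s)")
```

To run this against the published files, the inline strings could be replaced with text read from `info.csv` and `test_windows.csv`; other columns are left as strings here and would need their own conversions.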
## File Descriptions
### facts.csv / facts.parquet
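This file records the educational facts shown to users.
**Column Name**: **Description**
`display_timestamp`: Unix timestamp when an educational fact was displayed to the user.
`session_id`: Unique identifier for the user's session when the fact was shown.
`user_id`: Unique identifier for the user the fact was shown to.
`fact_category`: Category of the educational fact displayed to the user.
`fact_index`: Index number of the fact shown to the user.
`fact_text`: Text of the educational fact displayed.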
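Several columns, including `display_timestamp`, are stored as Unix timestamps. A minimal sketch of converting one to a timezone-aware UTC datetime follows; whether the files store seconds or milliseconds is not stated, so the `unit` parameter and the sample value are assumptions to check against the data.

```python
from datetime import datetime, timezone

def unix_to_utc(value, unit="s"):
    """Convert a Unix timestamp to a timezone-aware UTC datetime.

    unit is "s" for seconds or "ms" for milliseconds; which one the files
    use is an assumption to verify against the data.
    """
    seconds = float(value)
    if unit == "ms":
        seconds /= 1000.0
    elif unit != "s":
        raise ValueError(f"unsupported unit: {unit!r}")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Placeholder value for illustration only; not taken from the dataset.
print(unix_to_utc("1600000000").isoformat())
```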
### info.csv / info.parquet
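This file contains user-specific metadata and repeated data about each user (alerts and pinned facts).
**Column Name**: **Description**
`user_id`: Unique identifier for the user.
`redcap_repeat_instrument`: REDCap field indicating the repeat instrument used. For general information about the user (`user_location` and `number_of_logins`), `redcap_repeat_instrument` is blank. For repeated data (alerts, pinned facts, scheduled tests), `redcap_repeat_instrument` identifies the instrument.
`redcap_repeat_instance`: Instance number of the repeat instrument (if applicable).
`user_location`: Location of the user, if available (1: New York City cohort; 2: Chicago cohort).
`alert_date`: Unix timestamp of when an alert was sent to the user.
`number_of_logins`: Total number of logins by the user.
`alert_subject`: Subject or type of the alert sent.
`alert_read`: Whether the alert was read by the user (1: True; 0: False).
`end_date`: Unix timestamp of the end date of scheduled tests.
`start_date`: Unix timestamp of the start date of scheduled tests.
`fact_category`: Category of the educational fact pinned by the user.
`fact_index`: Index number of the fact pinned by the user.
`fact_text`: Text of the educational fact pinned by the user.
`fact_link`: Link to additional information associated with the fact pinned by the user (if available).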
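Because `info` mixes one general row per user with repeated-instrument rows, it can help to separate the two before analysis. The sketch below does this by checking whether `redcap_repeat_instrument` is blank. The instrument names `alerts` and `pinned_facts` are hypothetical labels chosen for illustration; the actual values should be read from the file.

```python
from collections import defaultdict

def split_by_instrument(rows):
    """Separate base user rows (blank instrument) from repeated-instrument rows."""
    base_rows = []
    repeated = defaultdict(list)
    for row in rows:
        instrument = (row.get("redcap_repeat_instrument") or "").strip()
        if instrument:
            repeated[instrument].append(row)
        else:
            base_rows.append(row)
    return base_rows, dict(repeated)

# Placeholder rows; instrument names here are hypothetical.
rows = [
    {"user_id": "10", "redcap_repeat_instrument": "", "number_of_logins": "4"},
    {"user_id": "10", "redcap_repeat_instrument": "alerts", "alert_read": "1"},
    {"user_id": "10", "redcap_repeat_instrument": "alerts", "alert_read": "0"},
    {"user_id": "10", "redcap_repeat_instrument": "pinned_facts", "fact_index": "3"},
]
base_rows, repeated = split_by_instrument(rows)
print(len(base_rows), {name: len(items) for name, items in repeated.items()})
```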
### para.csv / para.parquet
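This file includes paradata (detailed in-app user interactions) collected during the study.
**Column Name**: **Description**
`timestamp`: Timezone-naive timestamp of the user action or event.
`session_id`: Unique identifier for the user's session.
`user_id`: Unique identifier for the user.
`user_action`: Specific user action (e.g., button press, page navigation). "[]clicked" indicates that a pressable element (i.e., button, collapsible/expandable menu) was pressed.
`current_page`: Current page of the app being interacted with.
`browser`: Browser used to access the app.
`platform`: Platform used to access the app (e.g., Windows, iOS).
`platform_description`: Detailed description of the platform.
`platform_maker`: Manufacturer of the platform.
`device_name`: Name of the device used.
`device_maker`: Manufacturer of the device used.
`device_brand_name`: Brand name of the device used.
`device_type`: Type of device used (Mobile, Computer, etc.).
`user_location`: Location of the user (1: New York City cohort; 2: Chicago cohort).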
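Since paradata was logged asynchronously, rows within a session may not arrive in chronological order. One possible way to detect and repair this is sketched below. It assumes `timestamp` is stored as ISO 8601 text, which should be confirmed against the file, and the event labels are hypothetical placeholders.

```python
from datetime import datetime

def parse_ts(text):
    """Parse a timezone-naive timestamp; ISO 8601 text is an assumption."""
    return datetime.fromisoformat(text)

def find_out_of_order(events):
    """Return indices of rows whose timestamp precedes the previous row of the same session."""
    last_seen = {}
    flagged = []
    for index, event in enumerate(events):
        stamp = parse_ts(event["timestamp"])
        previous = last_seen.get(event["session_id"])
        if previous is not None and stamp < previous:
            flagged.append(index)
        last_seen[event["session_id"]] = stamp
    return flagged

def sort_within_sessions(events):
    """Stable sort by session and timestamp, so ties keep their file order."""
    return sorted(events, key=lambda e: (e["session_id"], parse_ts(e["timestamp"])))

# Placeholder events; the action labels are hypothetical.
events = [
    {"session_id": "s1", "timestamp": "2021-03-01 10:00:05", "user_action": "page_b"},
    {"session_id": "s1", "timestamp": "2021-03-01 10:00:03", "user_action": "page_a"},
    {"session_id": "s1", "timestamp": "2021-03-01 10:00:09", "user_action": "page_c"},
]
print(find_out_of_order(events))
print([e["user_action"] for e in sort_within_sessions(events)])
```

Sorting restores chronological order but cannot tell whether the logged times themselves are accurate, so flagged rows may deserve a manual look before sequence-based analyses.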
### survey.csv / survey.parquet
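This file contains survey responses collected from users.
*NOTE: Due to the sensitive nature of this data, the CSV/parquet files are not included in this data repository. They will be made available upon reasonable request.*
**Column Name**: **Description**
`user_id`: Unique identifier for the user.
`timepoint`: Timepoint of the survey (baseline/0 months, 6 months, 12 months).
`race`: Race of the user.
`education`: Education level of the user.
`health_literacy`: Health literacy score of the user.
`health_efficacy`: Health efficacy score of the user.
`itues_mean`: Information Technology Usability Evaluation Scale (ITUES) mean score.
`age`: Age of the user.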
### tests.csv / tests.parquet
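This file contains data on the HIV self-tests performed by users in the mLab App.
**Column Name**: **Description**
`user_id`: Unique identifier for the user who took the test.
`visual_analysis_date`: Unix timestamp of the user's visual analysis of the test.
`visual_result`: Result of the visual analysis (positive, negative).
`mlab_analysis_date`: Unix timestamp of the analysis conducted by the mLab system.
`mlab_result`: Result from the mLab analysis (positive, negative).
`signal_ratio`: Ratio of the intensity of the test signal to the control signal.
`control_signal`: mLab-calculated intensity of the control signal.
`test_signal`: mLab-calculated intensity of the test signal.
`browser`: Browser used to access the app (from the User Agent string).
`platform`: Platform used to access the app (e.g., Windows, iOS) (from the User Agent string).
`platform_description`: Detailed description of the platform (from the User Agent string).
`platform_maker`: Manufacturer of the platform (from the User Agent string).
`device_name`: Name of the device used (from the User Agent string).
`device_maker`: Manufacturer of the device used (from the User Agent string).
`device_brand_name`: Brand name of the device used (from the User Agent string).
`device_type`: Type of device used (Mobile, Computer, etc.) (from the User Agent string).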
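The definition of `signal_ratio` (test signal over control signal) and the paired `visual_result` / `mlab_result` columns lend themselves to two simple checks, sketched below. The rows are made-up placeholders, and the text labels for results are an assumption; the files may encode results differently.

```python
def signal_ratio(test_signal, control_signal):
    """Ratio of test-signal intensity to control-signal intensity, or None if undefined."""
    if control_signal == 0:
        return None
    return test_signal / control_signal

def agreement_rate(rows):
    """Share of rows where visual_result and mlab_result match (case-insensitive)."""
    paired = [r for r in rows if r.get("visual_result") and r.get("mlab_result")]
    if not paired:
        return None
    matches = sum(r["visual_result"].lower() == r["mlab_result"].lower() for r in paired)
    return matches / len(paired)

# Placeholder rows for illustration only; not real test results.
rows = [
    {"visual_result": "negative", "mlab_result": "negative", "test_signal": 2.0, "control_signal": 8.0},
    {"visual_result": "positive", "mlab_result": "negative", "test_signal": 3.0, "control_signal": 6.0},
]
ratios = [signal_ratio(r["test_signal"], r["control_signal"]) for r in rows]
print(ratios, agreement_rate(rows))
```

Comparing a recomputed ratio with the stored `signal_ratio` column is one way to confirm the column's units and rounding before relying on it.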
### test_windows.csv / test_windows.parquet
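This file contains information on the testing windows assigned to users.
**Column Name**: **Description**
`user_id`: Unique identifier for the user.
`redcap_repeat_instance`: Instance of the repeat instrument.
`start_date`: Start date of the (hard) testing window.
`end_date`: End date of the (hard) testing window.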
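A common use of the testing windows is to check whether a given test fell inside one of them. The sketch below assumes that `start_date`, `end_date`, and the test time have already been parsed into `datetime` objects and filtered to a single user, and that window bounds are inclusive. None of these points is specified in the descriptions above.

```python
from datetime import datetime

def windows_containing(test_time, windows):
    """Return instances of windows where start_date <= test_time <= end_date.

    Inclusive bounds are an assumption about how a "hard" window is applied.
    """
    return [
        w["redcap_repeat_instance"]
        for w in windows
        if w["start_date"] <= test_time <= w["end_date"]
    ]

# Placeholder windows for one user; dates are illustrative, not study data.
windows = [
    {"redcap_repeat_instance": 1, "start_date": datetime(2021, 1, 1), "end_date": datetime(2021, 1, 31)},
    {"redcap_repeat_instance": 2, "start_date": datetime(2021, 7, 1), "end_date": datetime(2021, 7, 31)},
]
print(windows_containing(datetime(2021, 7, 15), windows))
print(windows_containing(datetime(2021, 3, 1), windows))
```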
## Citation
If you use this dataset, please cite the associated mLab and mLab paradata publications.
Provider:
figshare
Created:
2025-01-15



