Feature-Engineered Mouse Dynamics Dataset For Anomaly Detection
Source: Figshare. Updated: 2025-06-23. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/feature_engineered_mouse_data_csv/29386898
Description:
This repository contains a preprocessed and feature-engineered dataset derived from raw mouse dynamics logs. The preprocessing pipeline was developed in Python and transforms low-level cursor activity into structured, high-dimensional behavioral features. The dataset is suitable for advanced research and practical applications in anomaly detection, behavioral biometrics, and cyber threat analytics.

Preprocessing Workflow

The preprocessing logic transforms the raw data in the following stages:

1. Raw Data Ingestion
Captures the following fields from each mouse event:
- x, y coordinates
- client_timestamp (in milliseconds)
- Mouse button and state (Pressed/Released)
- Active application window
Data is sourced from three subdirectories per user: training, internal_tests, and external_tests.

2. Kinematic Feature Computation
Derives time-dependent physical features:
- velocity, acceleration, jerk, and curvature
- Accounts for timestamp anomalies, division-by-zero, and missing values
- Applies directional smoothing and curvature approximation using angular differences

3. Session-Based Feature Engineering
Computes the following per session:
- session_duration, total_distance
- num_actions, num_clicks, num_strokes
- mean_time_per_action, avg_drag_time

4. Statistical Aggregation
For each derived motion variable, the following descriptors are computed:
- mean, std, min, max, median, 25th percentile (q25), 75th percentile (q75)

5. Label Alignment
Merges session-level features with binary labels from labels.csv:
- risk = 0: Normal session
- risk = 1: Anomalous session (e.g., unauthorized access)
Ensures every row is traceable via session_name.

6. Output Generation
Final output: featurized_mouse_data.csv
Includes:
- All engineered features
- session_name, serial_no., and risk label

Feature Overview

The dataset includes over 38 features for each session, categorized into:
- Behavioral: Session time, distance, clicks, strokes
- Kinematic: Velocity, acceleration, jerk, curvature (with summary stats)
- Interaction Metrics: Average drag time, time per action

Applications

This dataset is designed for academic and industrial use in:
- Insider threat and anomaly detection research
- Behavioral biometric authentication models
- Mouse-based session profiling and user verification
- Unsupervised and semi-supervised machine learning pipelines

Data Integrity and Logging

- A logging system records all processing steps in mouse_data_processing.log
- Invalid or corrupt sessions are automatically skipped with traceable warnings
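
The kinematic feature computation and statistical aggregation stages described above can be sketched roughly as follows. This is a minimal illustration in standard-library Python, not the dataset authors' pipeline. The function names (`kinematic_features`, `summarize`), the output column pattern (for example `velocity_mean`), the choice to skip non-increasing timestamps, the use of the population standard deviation, and the curvature approximation (wrapped heading change divided by segment length) are all assumptions made for the example.

```python
import math
import statistics

# Descriptor names taken from the description; the column naming is hypothetical.
STATS = ("mean", "std", "min", "max", "median", "q25", "q75")


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den != 0 else 0.0


def wrap_angle(a):
    """Wrap an angle difference into [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def diff_rate(series, dts):
    """First difference of a series divided by the time step of the later sample."""
    return [safe_div(b - a, dt) for a, b, dt in zip(series, series[1:], dts[1:])]


def kinematic_features(events):
    """Compute per-segment motion series from (x, y, t_ms) tuples sorted by time."""
    velocity, heading, dist, dts = [], [], [], []
    for (x0, y0, t0), (x1, y1, t1) in zip(events, events[1:]):
        dt = (t1 - t0) / 1000.0  # milliseconds -> seconds
        if dt <= 0:
            continue  # assumed handling: skip duplicate or out-of-order timestamps
        dx, dy = x1 - x0, y1 - y0
        d = math.hypot(dx, dy)
        velocity.append(d / dt)
        heading.append(math.atan2(dy, dx))
        dist.append(d)
        dts.append(dt)

    acceleration = diff_rate(velocity, dts)
    jerk = diff_rate(acceleration, dts[1:])
    # Curvature approximated as heading change per unit of path length.
    curvature = [
        safe_div(wrap_angle(b - a), d)
        for a, b, d in zip(heading, heading[1:], dist[1:])
    ]
    return {
        "velocity": velocity,
        "acceleration": acceleration,
        "jerk": jerk,
        "curvature": curvature,
    }


def summarize(name, values):
    """Return the seven summary descriptors for one motion variable."""
    if not values:
        return {f"{name}_{s}": 0.0 for s in STATS}  # assumed fill value
    if len(values) > 1:
        q25, med, q75 = statistics.quantiles(values, n=4, method="inclusive")
    else:
        q25 = med = q75 = float(values[0])
    stats = (
        statistics.fmean(values),
        statistics.pstdev(values),  # population std; the source does not say which
        min(values),
        max(values),
        med,
        q25,
        q75,
    )
    return {f"{name}_{s}": v for s, v in zip(STATS, stats)}


def session_row(events):
    """Flatten all motion variables into one feature dict for a session."""
    row = {}
    for name, values in kinematic_features(events).items():
        row.update(summarize(name, values))
    return row
```

Calling `session_row` on one session's event list yields 28 kinematic columns (4 motion variables times 7 descriptors). The session-level counts and durations listed in the description would be computed separately and merged with the labels on `session_name`.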
Created:
2025-06-23



