feature_engineered_mouse_data.csv

Source: Figshare. Updated 2025-06-23; indexed 2026-04-08.
Download link:
https://figshare.com/articles/dataset/feature_engineered_mouse_data_csv/29386898/1
Description:
Feature-Engineered Mouse Dynamics Dataset for Anomaly Detection

This repository contains a preprocessed and feature-engineered dataset derived from raw mouse dynamics logs. The preprocessing pipeline was developed in Python and transforms low-level cursor activity into structured, high-dimensional behavioral features. The dataset is suitable for advanced research and practical applications in anomaly detection, behavioral biometrics, and cyber threat analytics.

Preprocessing Workflow

The preprocessing logic performs a comprehensive transformation of the raw data using the following stages:

Raw Data Ingestion
- Captures the following fields from each mouse event:
  - <code>x</code>, <code>y</code> coordinates
  - <code>client_timestamp</code> (in milliseconds)
  - Mouse <code>button</code> and <code>state</code> (Pressed/Released)
  - Active application <code>window</code>
- Data is sourced from three subdirectories per user: <code>training</code>, <code>internal_tests</code>, and <code>external_tests</code>

Kinematic Feature Computation
- Derives time-dependent physical features: <code>velocity</code>, <code>acceleration</code>, <code>jerk</code>, and <code>curvature</code>
- Accounts for timestamp anomalies, division-by-zero, and missing values
- Applies directional smoothing and curvature approximation using angular differences

Session-Based Feature Engineering
- Computes the following per session:
  - <code>session_duration</code>, <code>total_distance</code>
  - <code>num_actions</code>, <code>num_clicks</code>, <code>num_strokes</code>
  - <code>mean_time_per_action</code>, <code>avg_drag_time</code>

Statistical Aggregation
- For each derived motion variable, the following descriptors are computed: <code>mean</code>, <code>std</code>, <code>min</code>, <code>max</code>, <code>median</code>, <code>25th percentile (q25)</code>, <code>75th percentile (q75)</code>

Label Alignment
- Merges session-level features with binary labels from <code>labels.csv</code>
  - <code>risk = 0</code>: Normal session
  - <code>risk = 1</code>: Anomalous session (e.g., unauthorized access)
- Ensures every row is traceable via <code>session_name</code>

Output Generation
- Final output: <code>featurized_mouse_data.csv</code>
- Includes:
  - All engineered features
  - <code>session_name</code>, <code>serial_no.</code>, and <code>risk</code> label

Feature Overview

The dataset includes over 38 features for each session, categorized into:
- Behavioral: Session time, distance, clicks, strokes
- Kinematic: Velocity, acceleration, jerk, curvature (with summary stats)
- Interaction Metrics: Average drag time, time per action

Applications

This dataset is designed for academic and industrial use in:
- Insider threat and anomaly detection research
- Behavioral biometric authentication models
- Mouse-based session profiling and user verification
- Unsupervised and semi-supervised machine learning pipelines

Data Integrity and Logging
- A robust logging system records all processing steps in <code>mouse_data_processing.log</code>
- Invalid or corrupt sessions are automatically skipped with traceable warnings
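
The kinematic and statistical-aggregation stages described above can be illustrated with a minimal sketch. This is not the dataset author's pipeline: the function names `kinematics` and `summarize`, the finite-difference scheme, the choice to skip events with non-increasing timestamps, and the `<variable>_<stat>` key naming are all assumptions made for illustration. Curvature is approximated here as the wrapped change in heading angle divided by step length, which is one plausible reading of "angular differences"; the directional smoothing mentioned in the description is not implemented.

```python
import math
import statistics


def kinematics(xs, ys, ts_ms):
    """Per-step velocity, acceleration, jerk and curvature for one session.

    Hypothetical helper: xs, ys are cursor coordinates, ts_ms are
    client timestamps in milliseconds.
    """
    dists, dts, angles = [], [], []
    prev = 0  # index of the last event that was kept
    for i in range(1, len(xs)):
        dt = (ts_ms[i] - ts_ms[prev]) / 1000.0  # milliseconds to seconds
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp: avoid dividing by zero
        dx, dy = xs[i] - xs[prev], ys[i] - ys[prev]
        dists.append(math.hypot(dx, dy))
        dts.append(dt)
        angles.append(math.atan2(dy, dx))
        prev = i

    velocity = [d / t for d, t in zip(dists, dts)]
    acceleration = [(velocity[i] - velocity[i - 1]) / dts[i]
                    for i in range(1, len(velocity))]
    jerk = [(acceleration[i] - acceleration[i - 1]) / dts[i + 1]
            for i in range(1, len(acceleration))]

    curvature = []
    for i in range(1, len(angles)):
        if dists[i] == 0 or dists[i - 1] == 0:
            continue  # heading is undefined for a zero-length step
        # Wrap the heading change into [-pi, pi) before dividing by step length.
        turn = (angles[i] - angles[i - 1] + math.pi) % (2 * math.pi) - math.pi
        curvature.append(turn / dists[i])

    return {"velocity": velocity, "acceleration": acceleration,
            "jerk": jerk, "curvature": curvature}


def summarize(name, values):
    """Seven descriptors for one motion variable, keyed as '<name>_<stat>'."""
    stats = ("mean", "std", "min", "max", "median", "q25", "q75")
    if not values:
        return {f"{name}_{s}": float("nan") for s in stats}
    if len(values) == 1:
        q25 = q75 = float(values[0])
        std = 0.0
    else:
        # 'inclusive' interpolates linearly between the sorted data points.
        q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
        std = statistics.stdev(values)  # sample standard deviation
    return {
        f"{name}_mean": statistics.fmean(values),
        f"{name}_std": std,
        f"{name}_min": min(values),
        f"{name}_max": max(values),
        f"{name}_median": statistics.median(values),
        f"{name}_q25": q25,
        f"{name}_q75": q75,
    }
```

A session row could then be assembled by calling `summarize` on each list returned by `kinematics` and merging the resulting dictionaries, before joining the <code>risk</code> label on <code>session_name</code>.
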
Provided by:
Reddy, Dheeraj
Created:
2025-06-23