Behaviour Biometrics Dataset
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/fnf8b85kr6
Description:
The dataset provides a collection of behaviour biometrics data, commonly known as Keyboard, Mouse and Touchscreen (KMT) dynamics. The data was collected for a FinTech research project undertaken by academics and researchers in the Computer Science Department at Edge Hill University, United Kingdom. The project, called CyberSIgnature, uses KMT dynamics data to distinguish legitimate card owners from fraudsters. The researchers developed an application whose graphical user interface (GUI) resembles a standard online card payment form, with fields for card type, name, card number, card verification code (CVC) and expiry date. Users' KMT dynamics were then captured while they entered fictitious card information into the GUI application.
The dataset consists of 1,760 KMT dynamics instances collected over 88 user sessions on the GUI application. Each user session involves 20 iterations of data entry. First, the user is assigned one set of fictitious card details (drawn at random from a pool) and enters it 10 times; the user is then presented with 10 additional sets of card details and enters each of them once. These 10 additional sets are drawn from the cards in the pool that have been, or will be, assigned to other users. One KMT data instance is collected per data entry iteration, so each user session yields 20 KMT data instances (10 legitimate and 10 illegitimate), and the 88 sessions together give the 1,760 instances.
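The collection protocol can be sketched as a small simulation. This is a hypothetical illustration, not the authors' application: the card IDs, the one-card-per-session assignment and the `plan_session` helper are assumptions made only to show how the counts above arise (10 legitimate entries of the user's own card plus 10 single entries of other users' cards per session).

```python
import random


def plan_session(user_card: str, pool: list[str], rng: random.Random,
                 n_own: int = 10, n_other: int = 10) -> list[tuple[str, int]]:
    """Return one session's entry plan as (card_id, label) pairs.

    label 1 = legitimate (the user's own assigned card, entered n_own times);
    label 0 = illegitimate (n_other distinct cards of other users, each entered once).
    """
    others = [card for card in pool if card != user_card]
    if len(others) < n_other:
        raise ValueError("pool has too few cards assigned to other users")
    plan = [(user_card, 1)] * n_own
    # sample() draws without replacement, so each other card appears only once
    plan += [(card, 0) for card in rng.sample(others, n_other)]
    return plan


# Hypothetical setup: one made-up card ID per session.
N_SESSIONS = 88
pool = [f"card_{i:03d}" for i in range(N_SESSIONS)]
rng = random.Random(0)  # fixed seed so the plan is reproducible
sessions = [plan_session(pool[i], pool, rng) for i in range(N_SESSIONS)]

total = sum(len(s) for s in sessions)
legit = sum(label for s in sessions for _, label in s)
print(total, legit, total - legit)  # 1760 880 880
```

Sampling without replacement mirrors the statement that each of the 10 additional cards is entered once in a session.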
The raw dataset is stored in .json format across 88 separate files. The root folder, `behaviour_biometrics_dataset`, contains two sub-folders, `raw_kmt_dataset` and `feature_kmt_dataset`, and a Jupyter notebook file (`kmt_feature_classificatio.ipynb`). Their contents are described below:
-- `raw_kmt_dataset`: this folder contains 88 files, each named `raw_kmt_user_n.json`, where n is a number from 0001 to 0088. Each file contains 20 instances of KMT dynamics data for a given fictitious card, split equally between the legitimate and illegitimate classes (10 instances each). The legitimate class contains KMT dynamics captured from the user assigned to that card, while the illegitimate class contains KMT dynamics captured from other users entering the same card details.
-- `feature_kmt_dataset`: this folder contains two sub-folders, `feature_kmt_json` and `feature_kmt_xlsx`. Each sub-folder contains 88 files in the corresponding format (.json or .xlsx), each named `feature_kmt_user_n`, where n is a number from 0001 to 0088. Each file contains the 20 feature instances extracted from the corresponding `raw_kmt_user_n` file, together with their class labels (legitimate = 1, illegitimate = 0).
-- `kmt_feature_classification.ipynb`: this notebook contains the Python code needed to generate features from the raw KMT files and to run a simple machine-learning classification task that produces results. The code is designed to run with minimal effort from the user.
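The naming scheme above can be turned into file paths programmatically. The sketch below assumes the folder layout exactly as described, a four-digit zero-padded n, and that each feature file ends in the extension of its sub-folder's format (e.g. `feature_kmt_user_0001.json`). The helper names (`user_ids`, `raw_path`, `feature_path`, `has_documented_split`) are hypothetical. The internal JSON schema is not documented here, so the sketch does not parse it; `has_documented_split` only checks a list of 0/1 labels once you have extracted them yourself.

```python
from pathlib import Path

ROOT = Path("behaviour_biometrics_dataset")
N_USERS = 88  # files are numbered 0001 to 0088


def user_ids(n_users: int = N_USERS) -> list[str]:
    """Zero-padded four-digit user numbers: '0001' ... '0088'."""
    return [f"{i:04d}" for i in range(1, n_users + 1)]


def raw_path(uid: str, root: Path = ROOT) -> Path:
    """Path of the raw KMT file for one user number."""
    return root / "raw_kmt_dataset" / f"raw_kmt_user_{uid}.json"


def feature_path(uid: str, fmt: str = "json", root: Path = ROOT) -> Path:
    """Path of the feature file in either the JSON or the XLSX sub-folder."""
    if fmt not in ("json", "xlsx"):
        raise ValueError("fmt must be 'json' or 'xlsx'")
    return root / "feature_kmt_dataset" / f"feature_kmt_{fmt}" / f"feature_kmt_user_{uid}.{fmt}"


def has_documented_split(labels) -> bool:
    """True if labels match the documented split: 10 legitimate (1) and 10 illegitimate (0)."""
    labels = list(labels)
    return len(labels) == 20 and labels.count(1) == 10 and labels.count(0) == 10


ids = user_ids()
print(len(ids), ids[0], ids[-1])  # 88 0001 0088
print(raw_path(ids[0]).as_posix())
```

Using `pathlib` keeps the paths portable across operating systems; `as_posix()` is only used here to print them with forward slashes.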
Created:
2022-06-20



