
Kernel Function Time Measurement Data Set for Anomaly-based Rootkit Detection

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14679674
Description:
This repository contains time measurement data collected from kernel functions. The data sets can be used to evaluate detection mechanisms for rootkits that hide files. We refer to the publication by Landauer et al. [1] for a detailed description of the data collection procedure and the detection approach, which we publish open-source on GitHub.

In summary, we inject eBPF probes into the kernel of a Linux virtual machine and attach these probes to the following functions within the getdents system call: iterate_dir, filldir, verify_dirent_name, and touch_atime. We then activate CARAXES, an open-source Linux kernel module rootkit that hides files by manipulating the filldir function. Specifically, the code injected by the rootkit skips files whose names contain the word "caraxes".

To collect the data sets, we trigger the getdents call by repeatedly enumerating the contents of a directory containing a file that is hidden by the rootkit. Time stamps of the enter and return points of the probed functions are collected via BCC and stored to disk. We also collect time measurements for the normal case, where no rootkit is active, to obtain a training data set for semi-supervised detection approaches. We repeat this experiment to collect a total of 1250 batches of time measurements; we consider each batch as one sample that can be used for training or testing.

Over time, we also vary the parameters of the experiments and the conditions of the system. We differentiate the following scenarios:

- Default: No changes.
- File Count: The number of files in the directory is randomly increased to 20-200 files, some of which are hidden by the rootkit.
- Filename Length: The lengths of the file names in the directory are randomly increased to 20-60 characters.
- ls-basic: An alternative command is used for file enumeration.
- System Load: We simulate system activity during collection.

The collected time measurements are stored in the events.zip file.
Each file represents one batch of time measurements and has the following structure:

    {
      "executable": "ls",
      "iterations": 100,
      "dir_content": ".,..,815UBF7H,VA_caraxes_13CPA5,",
      "linux_version": "5.19.0-31-generic",
      "drop_boundary_events": false,
      "load": "",
      "hidden_files": 1,
      "visible_files": 1,
      "description": "default",
      "label": "normal",
      "experiment_begin": "2024-12-11T10:58:24.582898",
      "experiment_end": "2024-12-11T10:58:25.339324",
      "events": [
        {
          "probe_point": "iterate_dir-enter",
          "timestamp": 0,
          "pid": 988740,
          "tgid": 988740
        },
        {
          "probe_point": "filldir64-enter",
          "timestamp": 3716,
          "pid": 988740,
          "tgid": 988740
        },
        ...
      ]
    }

The file contains some information about the experiment, followed by a list of events that comprise the time measurements. The most relevant fields are:

- executable: The command used to trigger system calls, which is varied in the "ls-basic" scenario.
- load: The command used to simulate load, which is used in the "System Load" scenario.
- hidden_files and visible_files: The number of hidden and visible files in the directory, which are varied in the "File Count" scenario.
- description: Specifies the scenario in which this batch was collected.
- label: Specifies whether the rootkit was active during collection of this batch ("rootkit") or not ("normal").
- events: A list of objects with the following fields:
  - probe_point: The name of the function to which the probe was attached, and whether entering ("-enter") or returning ("-return") was measured.
  - timestamp: The absolute time stamp in nanoseconds measured by the probe, starting with 0 for the first measurement.
  - pid and tgid: Identifiers for the process (pid) and thread group (tgid).

Based on these absolute time measurements we compute delta times that represent intervals between pairs of probes.
To this end we group the chronologically sorted time measurements by pid to ensure that delta times are only computed from a single process. We compute delta times using two strategies:

- Function-grouping computes the delta times by subtracting the time stamps of "-enter" probes from the time stamps of "-return" probes for the same functions, effectively measuring the time it took to execute that function.
- Sequence-grouping computes the difference between the time stamps of any two adjacent probes.

We store the delta times computed with function-grouping and sequence-grouping in the files intervals_fun.csv and intervals_seq.csv, respectively. We show a sample of the file in the following:

filename                                                 name                                                id  delta  label   description
events/events_2024-12-11T10:58:25.799107_normal.json.gz  verify_dirent_name-enter:verify_dirent_name-return  0   1880   normal  default
events/events_2024-12-11T10:58:25.799107_normal.json.gz  verify_dirent_name-enter:verify_dirent_name-return  1   1266   normal  default
events/events_2024-12-11T10:58:25.799107_normal.json.gz  verify_dirent_name-enter:verify_dirent_name-return  2   1290   normal  default
events/events_2024-12-11T10:58:25.799107_normal.json.gz  verify_dirent_name-enter:verify_dirent_name-return  3   1286   normal  default
events/events_2024-12-11T10:58:25.799107_normal.json.gz  verify_dirent_name-enter:verify_dirent_name-return  4   1404   normal  default

The features in these data are as follows:

- filename: The name of the file (batch) from which the probe measurements were taken.
- name: The pair of probes between which the delta time is computed, where the probes are separated by a colon (e.g., probe1-enter:probe2-enter).
- id: Running integer as identifier.
- delta: The computed delta time in nanoseconds.
- label: Specifies whether the rootkit was active during collection of this batch ("rootkit") or not ("normal").
- description: Specifies the scenario in which this batch was collected.
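
The two grouping strategies can be illustrated with a short Python sketch. This is not the authors' implementation, which is described in [1]; it only follows the description given here. The helper names (load_batch, group_by_pid, function_deltas, sequence_deltas) are hypothetical, the gzip-compressed JSON layout is assumed from the ".json.gz" file names in the sample, and pairing each "-return" with the most recent unmatched "-enter" of the same function is an assumption about how nested calls might be handled. In the sample events, only the first two entries mirror the excerpt of the batch file; the two "-return" events are made up for illustration.

```python
import gzip
import json
from collections import defaultdict


def load_batch(path):
    """Read one batch file, assuming it is gzip-compressed JSON."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def group_by_pid(events):
    """Sort events chronologically and split them into one list per pid."""
    per_pid = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        per_pid[event["pid"]].append(event)
    return per_pid


def function_deltas(events):
    """Function-grouping: return time stamp minus enter time stamp."""
    rows = []
    for pid_events in group_by_pid(events).values():
        # Assumption: a return closes the latest unmatched enter of that function.
        open_calls = defaultdict(list)
        for event in pid_events:
            function, _, kind = event["probe_point"].rpartition("-")
            if kind == "enter":
                open_calls[function].append(event["timestamp"])
            elif kind == "return" and open_calls[function]:
                started = open_calls[function].pop()
                name = f"{function}-enter:{function}-return"
                rows.append((name, event["timestamp"] - started))
    return rows


def sequence_deltas(events):
    """Sequence-grouping: difference between any two adjacent probes."""
    rows = []
    for pid_events in group_by_pid(events).values():
        for previous, current in zip(pid_events, pid_events[1:]):
            name = f"{previous['probe_point']}:{current['probe_point']}"
            rows.append((name, current["timestamp"] - previous["timestamp"]))
    return rows


# First two events mirror the batch excerpt; the two returns are hypothetical.
sample_events = [
    {"probe_point": "iterate_dir-enter", "timestamp": 0, "pid": 988740, "tgid": 988740},
    {"probe_point": "filldir64-enter", "timestamp": 3716, "pid": 988740, "tgid": 988740},
    {"probe_point": "filldir64-return", "timestamp": 5000, "pid": 988740, "tgid": 988740},
    {"probe_point": "iterate_dir-return", "timestamp": 9000, "pid": 988740, "tgid": 988740},
]

for name, delta in function_deltas(sample_events):
    print("function-grouping", name, delta)
for name, delta in sequence_deltas(sample_events):
    print("sequence-grouping", name, delta)
```

For a real batch, the events list would come from load_batch(path)["events"], where path points to one of the files extracted from events.zip. The published intervals_fun.csv and intervals_seq.csv files already contain the computed delta times, so a sketch like this is mainly useful for checking the procedure or experimenting with other probe pairings.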
If you use any of these data sets, please cite the following publication: [1] Landauer, M., Alton, L., Lindorfer, M., Skopik, F., Wurzenberger, M., & Hotwagner, W. (2025). Trace of the Times: Rootkit Detection through Temporal Anomalies in Kernel Activity. Under Review.
Created:
2025-01-18