syslrn: Learning What to Monitor for Efficient Anomaly Detection [Dataset]

Indexed by NIAID Data Ecosystem on 2026-03-13

Download link:

https://zenodo.org/record/6374397



Resource description:

This repository contains the dataset for the paper:

D. Sanvito, G. Siracusano, S. Santhanam, R. Gonzalez, R. Bifulco. syslrn: Learning What to Monitor for Efficient Anomaly Detection. ACM EuroMLSys 2022.

The dataset contains two directories at the root level:

- raw_dataset: each sub-folder contains the raw monitoring data used to generate the graph of a single experiment, together with additional metadata files.
- processed_dataset: each sub-folder contains the graph of a single experiment as a set of three CSV files: two for the graph edges (pid_childof_pid_df.csv and pid_speakswith_pid_df.csv) and one for the graph nodes (proc_df.csv).

In both directories, sub-folders are named according to the following schema:

[SCENARIO]_[W]wl/test_[TEST_ID]

where:

- [SCENARIO] is the target component of the failure injection (cinder_failure, neutron_failure, nova_failure), or ff for a failure-free execution;
- [W] is the number of concurrent workloads;
- [TEST_ID] is the ID of the injected failure scenario (the same ID selected by the OpenStack failure injection framework [1]).

For example, processed_dataset/ff_1wl/test_1/ holds the graph of failure-free test 1 with one workload.

Each experiment sub-folder in raw_dataset includes the following data:

- audit_raw_logs_[TEST_ID]/: raw audit monitoring data
- bpf_tools_[TEST_ID]/: raw eBPF tools monitoring data
- instance-[INSTANCE_ID]/: workload-specific metadata files, e.g. stdout/stderr (generated by the OpenStack failure injection framework [1])
- logs_workload_[TEST_ID]/: OpenStack application logs
- perf_tools_[TEST_ID]/: raw perf tools monitoring data
- audit_filtered_[TEST_ID].log: audit data pre-processed by ausearch (e.g. numerical entities resolved to symbols)
- failure_[TEST_ID].info: metadata about the specific failure scenario (generated by the OpenStack failure injection framework [1])
- timestamps_[TEST_ID]: timing information

[1] D. Cotroneo, L. De Simone, P. Liguori, R. Natella, N. Bidokhti. How Bad Can a Bug Get? An Empirical Analysis of Software Failures in the OpenStack Cloud Computing Platform. ACM ESEC/FSE 2019.

Example: parsing a graph from the processed_dataset directory

The snippet below loads the three CSV files of one experiment and builds a networkx MultiGraph whose edges are tagged with their type (speaksWith or childOf). Drawing the graph requires matplotlib.

```python
import os

import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd


def parse_csv(path):
    """Load the node table and the concatenated edge table of one experiment."""
    processes_df = pd.read_csv(os.path.join(path, 'proc_df.csv'), index_col=0).reset_index(drop=True)
    speakswith_edges_df = pd.read_csv(os.path.join(path, 'pid_speakswith_pid_df.csv'), index_col=0)
    speakswith_edges_df['type'] = 'speaksWith'
    childof_edges_df = pd.read_csv(os.path.join(path, 'pid_childof_pid_df.csv'), index_col=0)
    childof_edges_df['type'] = 'childOf'
    return processes_df, pd.concat([speakswith_edges_df, childof_edges_df], ignore_index=True)


def make_graph(nodes_df, edges_df):
    """Build a MultiGraph: one node per process (keyed by pid), one edge per edge-table row."""
    G = nx.MultiGraph()
    for _, node in nodes_df.iterrows():
        # Every column of the node table (including pid) becomes a node attribute
        G.add_node(node['pid'], **node.to_dict())
    for _, edge in edges_df.iterrows():
        G.add_edge(edge['pid1'], edge['pid2'], type=edge['type'])
    return G


PATH = 'processed_dataset/ff_1wl/test_1/'
nodes_df, edges_df = parse_csv(PATH)
G = make_graph(nodes_df, edges_df)
nx.draw_networkx(G, node_size=10, with_labels=False)
plt.show()
```

If you use this dataset for your research, please cite the following paper:

```bibtex
@inproceedings{sanvito2022syslrn,
  title     = {syslrn: Learning What to Monitor for Efficient Anomaly Detection},
  author    = {Sanvito, Davide and Siracusano, Giuseppe and Santhanam, Sharan and Gonzalez, Roberto and Bifulco, Roberto},
  booktitle = {2nd European Workshop on Machine Learning and Systems (EuroMLSys '22)},
  year      = {2022},
  address   = {Rennes, France},
  publisher = {ACM},
  month     = apr,
}
```
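The sub-folder naming schema [SCENARIO]_[W]wl/test_[TEST_ID] can be parsed mechanically, for example to separate failure-free (ff) runs from failure-injection runs across the whole dataset. The following is a minimal sketch and is not shipped with the dataset: the names Experiment, parse_experiment and iter_experiments are hypothetical, it assumes [W] is a decimal integer, and it treats [TEST_ID] as an opaque string.

```python
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, Optional

# Schema "[SCENARIO]_[W]wl/test_[TEST_ID]"; scenario names as listed in the description.
# Assumption: [W] is a decimal integer; [TEST_ID] is kept as an opaque string.
SCENARIO_RE = re.compile(
    r"^(?P<scenario>ff|cinder_failure|neutron_failure|nova_failure)_(?P<workloads>\d+)wl$"
)
TEST_RE = re.compile(r"^test_(?P<test_id>.+)$")


@dataclass(frozen=True)
class Experiment:  # hypothetical container, not part of the dataset
    scenario: str   # "ff" or the failure-injection target component
    workloads: int  # number of concurrent workloads
    test_id: str    # failure scenario ID from the injection framework
    path: Path

    @property
    def is_failure(self) -> bool:
        # "ff" marks a failure-free execution
        return self.scenario != "ff"


def parse_experiment(path) -> Optional[Experiment]:
    """Parse '.../[SCENARIO]_[W]wl/test_[TEST_ID]'; return None if it does not match."""
    p = Path(path)
    m_scen = SCENARIO_RE.match(p.parent.name)
    m_test = TEST_RE.match(p.name)
    if m_scen is None or m_test is None:
        return None
    return Experiment(m_scen["scenario"], int(m_scen["workloads"]), m_test["test_id"], p)


def iter_experiments(root) -> Iterator[Experiment]:
    """Yield every well-formed experiment folder under raw_dataset or processed_dataset."""
    for test_dir in sorted(Path(root).glob("*wl/test_*")):
        if test_dir.is_dir():
            exp = parse_experiment(test_dir)
            if exp is not None:
                yield exp
```

For instance, iterating iter_experiments("processed_dataset") and keeping only experiments with is_failure set to False would select the failure-free graphs; folders that do not follow the schema are skipped rather than raising an error.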
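The per-experiment layout of raw_dataset can likewise be turned into a path lookup, which helps check that an experiment folder is complete before processing it. This is an illustrative sketch only: raw_paths, missing_entries and the dictionary keys are hypothetical names, the instance-[INSTANCE_ID]/ folders are found with a glob because their IDs cannot be derived from [TEST_ID], and the existence check does not distinguish files from directories.

```python
from pathlib import Path

# Per-experiment entries of raw_dataset as listed in the description; {t} stands for [TEST_ID].
# Keys are hypothetical labels chosen for this sketch.
RAW_ENTRIES = {
    "audit_raw_logs": "audit_raw_logs_{t}",      # raw audit monitoring data
    "bpf_tools": "bpf_tools_{t}",                # raw eBPF tools monitoring data
    "logs_workload": "logs_workload_{t}",        # OpenStack application logs
    "perf_tools": "perf_tools_{t}",              # raw perf tools monitoring data
    "audit_filtered": "audit_filtered_{t}.log",  # audit data pre-processed by ausearch
    "failure_info": "failure_{t}.info",          # failure-scenario metadata
    "timestamps": "timestamps_{t}",              # timing information
}


def raw_paths(experiment_dir, test_id):
    """Map each entry name to its expected path inside one raw_dataset experiment."""
    base = Path(experiment_dir)
    paths = {key: base / pattern.format(t=test_id) for key, pattern in RAW_ENTRIES.items()}
    # instance-[INSTANCE_ID]/ IDs are not derivable from TEST_ID, so glob for them
    paths["instances"] = sorted(base.glob("instance-*"))
    return paths


def missing_entries(experiment_dir, test_id):
    """Return the sorted names of the TEST_ID-based entries that do not exist on disk."""
    paths = raw_paths(experiment_dir, test_id)
    return sorted(key for key in RAW_ENTRIES if not paths[key].exists())
```

An empty result from missing_entries("raw_dataset/ff_1wl/test_1", "1") would indicate that all TEST_ID-based entries of that experiment are present.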

Created: 2022-03-24
