Artifact for Taxonomist: Application Detection through Rich Monitoring Data

Figshare. Updated 2018-08-28. Indexed 2026-04-08.
Download link:
https://springernature.figshare.com/articles/Artifact_for_Taxonomist_Application_Detection_through_Rich_Monitoring_Data/6384248/1
Description:
Code, documentation, data and a Jupyter Notebook associated with the publication "Taxonomist: Application Detection Through Rich Monitoring Data" for the European Conference on Parallel Processing (Euro-Par) 2018.

The related study develops a technique named 'Taxonomist' that identifies applications running on supercomputers, using machine learning to classify known applications and detect unknown ones. The technique uses monitoring data collected from supercomputers, such as CPU and memory usage metrics and hardware counters. Its aims include providing an alternative to 'naive' application detection methods based on the names of processes and scripts, and helping to prevent fraud, waste and abuse on supercomputers.

Taxonomist uses supervised learning techniques to automatically select the most relevant features that lead to reliable application identification. The process involves the following steps:
1. Monitoring data is collected from every compute node in a time series format.
2. Eleven statistical features (e.g. percentiles, minimum, maximum, mean) are extracted over the time series, which reduces storage and computation overhead.
3. A classifier is trained on a set of labeled applications in a 'one-versus-rest' fashion: for each application in the training set, a separate classifier is trained to distinguish that application from the rest.

The dataset consists of:
README.pdf - user guide for the 'Taxonomist' artifact, covering installation, instructions for using the Jupyter notebook, and the parts of the process described in the Euro-Par 2018 paper that are omitted from the notebook code.
taxonomist.py - Python file containing a basic version of the Taxonomist framework. The module contents can be imported into other projects.
noteboook.html - static HTML version of the notebook that can be viewed in a browser.
notebook.ipynb - interactive Jupyter Notebook file; for operation see README.pdf.
data.zip - compressed .zip file holding monitoring data collected from different applications executed on Volta:
- metadata.csv: A CSV file listing each run, the IDs of the nodes on which each run executed, which application was executed with which inputs, the start and end times, and the duration of the applications.
- timeseries.tar.bz2: A bzip2-compressed file containing the collected data. The uncompressed size is 16 GB; it does not need to be uncompressed for most of the notebook.
- features.hdf: An HDF5 file containing the pre-calculated features. The calculation process is included in the notebook.
requirements.txt - list of Python packages required.
LICENSE - the licence under which this software is released.

Files are in openly accessible formats: Python (.py and .ipynb), .html, .pdf, .csv, .txt, .zip and Hierarchical Data Format (.hdf).

The experimental set-up for the experiments reported in the related publication uses Volta, a Cray XC30m supercomputer located at Sandia National Laboratories, together with the open source monitoring tool Lightweight Distributed Metric System (LDMS).
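Steps 2 and 3 of the process described above can be illustrated with a minimal sketch. It is not code from taxonomist.py or the notebook. The function names (`extract_features`, `run_feature_vector`, `one_vs_rest_labels`), the metric names and the application labels are all hypothetical. The nine statistics are an illustrative subset and not necessarily the eleven features used by the artifact. Classifier training is left out: the sketch only shows how a time series collapses into a fixed-length feature vector and how one binary target per application is built for a one-versus-rest set-up.

```python
import statistics


def extract_features(series):
    """Collapse one metric's time series into a few summary statistics.

    Assumption: these nine statistics are illustrative, not the artifact's
    exact feature set. The series needs at least two samples.
    """
    values = sorted(series)
    # With n=100, quantiles() returns the 99 cut points for percentiles 1..99,
    # so percentile p sits at index p - 1.
    pct = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "min": values[0],
        "max": values[-1],
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "p5": pct[4],
        "p25": pct[24],
        "p50": pct[49],
        "p75": pct[74],
        "p95": pct[94],
    }


def run_feature_vector(metrics):
    """Flatten {metric name: samples} into {"<metric>_<stat>": value}."""
    return {
        f"{name}_{stat}": value
        for name, series in sorted(metrics.items())
        for stat, value in extract_features(series).items()
    }


def one_vs_rest_labels(labels):
    """Build one binary target per application: 1 for it, 0 for the rest.

    A separate binary classifier would then be trained on each target.
    """
    return {
        app: [1 if label == app else 0 for label in labels]
        for app in sorted(set(labels))
    }


if __name__ == "__main__":
    # Hypothetical samples for two metrics of a single run.
    run = {"cpu_user": [1, 2, 3, 4, 5], "mem_free": [10, 10, 10, 10]}
    print(run_feature_vector(run))
    # Hypothetical application labels for four runs.
    print(one_vs_rest_labels(["app_a", "app_b", "app_a", "app_c"]))
```

Because every run is reduced to the same fixed set of statistics per metric, runs of different lengths yield feature vectors of equal size, which is what allows them to be fed to a standard classifier.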
Provided by:
Ayse K. Coskun
Created:
2018-08-28