On the Vulnerability Proneness of Multilingual Code
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://figshare.com/articles/dataset/On_the_Vulnerability_Proneness_of_Multilingual_Code/16528521
Description:
Study Tool and Dataset
[Environment preparation]
1. Python version: 3.6 or later
2. Dependent libraries: progressbar, nltk, textblob, sklearn, matplotlib, plotly, fuzzywuzzy, statsmodels, corpora, etc.
Use "pip install [lib_name]" to install the libraries.
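The environment requirements above can be checked before running the tool. The following is a minimal sketch, not part of the original tool: the helper names (python_ok, missing_modules) are hypothetical, and the module list uses import names, which may differ from pip package names (for example, sklearn is installed as scikit-learn). "corpora" is left out because it is unclear which package provides it.

```python
import importlib.util
import sys

# Import names taken from the dependency list above. "corpora" is omitted on
# purpose: which package provides it is not stated (assumption).
REQUIRED_MODULES = [
    "progressbar", "nltk", "textblob", "sklearn",
    "matplotlib", "plotly", "fuzzywuzzy", "statsmodels",
]


def python_ok(minimum=(3, 6)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= tuple(minimum)


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.6 or later is required")
    for name in missing_modules(REQUIRED_MODULES):
        print("missing module:", name)
```

Any module reported as missing can then be installed with pip under its package name.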
[Running the program]
1. Command line
collect.py -- for data collection, vulnerability categorization, and language interfacing classification.
Type "collect.py -h" for help.
2. Common parameters
<1> collect.py
collect.py -s collect -- grab raw repositories from GitHub.
collect.py -s repostats -- collect basic properties for each repository.
collect.py -s langstats -- empirical analysis of language information: profile size, combinations, etc.
collect.py -s cmmts -- collect commits for each project and classify them with fuzzywuzzy.
collect.py -s nbr -- NBR analysis on the dataset.
collect.py -s clone -- clone all projects to local storage.
collect.py -s apisniffer -- classify the projects by language interface type.
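The "-s" stages listed above can also be driven from a small wrapper. This is a sketch under assumptions: the stage names come from the list above, but the wrapper functions (stage_command, run_stage) are hypothetical, and it assumes collect.py sits in the current working directory and is launched with the same interpreter. The README does not say that the stages must run in any particular order.

```python
import subprocess
import sys

# Stage names as documented for "collect.py -s".
STAGES = ["collect", "repostats", "langstats", "cmmts", "nbr", "clone", "apisniffer"]


def stage_command(stage, script="collect.py"):
    """Build the argument list for one collect.py stage, rejecting unknown names."""
    if stage not in STAGES:
        raise ValueError("unknown stage: %r (expected one of %s)" % (stage, STAGES))
    return [sys.executable, script, "-s", stage]


def run_stage(stage, script="collect.py"):
    """Run one stage; raises subprocess.CalledProcessError on a non-zero exit."""
    return subprocess.run(stage_command(stage, script), check=True)
```

For example, run_stage("repostats") would launch "collect.py -s repostats" and raise if the script exits with an error.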
We also provide shell scripts that run the data collection and analysis in multiple parallel processes to speed them up.
cmmts.sh [repository number]: run the commit collection and classification in multiple processes.
clone.sh [repository number]: clone the repositories to local storage in multiple processes.
sniffer.sh [repository number]: identify and categorize the repositories by language interfacing mechanism in multiple processes.
3. Dataset
<1> Data/OriginData/Repository_List.csv: original repository profiles grabbed from GitHub.
<2> Data/CmmtSet: original commit data by repository; each file is named after the repository ID.
<3> Data/Issues: original issue information by repository.
<4> Data/StatData/CmmtSet: classified commit data by repository; each commit can be retrieved from GitHub through the 'sha' field.
<5> Data/StatData/ApiSniffer.csv: classified repositories by language interfacing mechanism.
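Retrieving a classified commit through its 'sha' field, as item <4> describes, might look like the sketch below. It only builds the public GitHub commit URL. The function name commit_url is hypothetical, and it assumes the repository's "owner/name" is already known; the dataset names files by repository ID, and how an ID maps to a repository name is not described here. The values in the usage note are placeholders, not records from the dataset.

```python
HEX_DIGITS = set("0123456789abcdef")


def commit_url(repo_full_name, sha):
    """Build the GitHub web URL for a commit from "owner/name" and a commit SHA."""
    owner, _, name = repo_full_name.strip().partition("/")
    if not owner or not name or "/" in name:
        raise ValueError("expected 'owner/name', got %r" % repo_full_name)
    sha = sha.strip().lower()
    # Accept abbreviated (7+) or full 40-character hexadecimal SHAs.
    if not 7 <= len(sha) <= 40 or not set(sha) <= HEX_DIGITS:
        raise ValueError("not a plausible commit SHA: %r" % sha)
    return "https://github.com/%s/%s/commit/%s" % (owner, name, sha)
```

Calling commit_url("some-owner/some-repo", "0123abc") with these placeholder values yields "https://github.com/some-owner/some-repo/commit/0123abc".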
Created:
2022-09-03



