On the Vulnerability Proneness of Multilingual Code

Source: Figshare. Updated: 2022-09-03. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/On_the_Vulnerability_Proneness_of_Multilingual_Code/16528521
Description:
Study Tool and Dataset

[Environment preparation]
1. Python version: 3.6 or later.
2. Dependent libraries: progressbar, nltk, textblob, sklearn, matplotlib, plotly, fuzzywuzzy, statsmodels, corpora, etc. Use "pip install [lib_name]" to install each library.

[Running the program]
1. Command line
collect.py -- data collection, vulnerability categorization, and language interfacing classification. Type "collect.py -h" for help.

2. Common parameters of collect.py
collect.py -s collect -- grab raw repositories from GitHub.
collect.py -s repostats -- collect basic properties for each repository.
collect.py -s langstats -- empirical analysis of language information: profile size, combinations, etc.
collect.py -s cmmts -- collect the commits for each project and classify them with fuzzywuzzy.
collect.py -s nbr -- NBR analysis on the dataset.
collect.py -s clone -- clone all projects to local storage.
collect.py -s apisniffer -- classify the projects by language interface type.

We also provide shell scripts that run the data collection and analysis in parallel across multiple processes:
cmmts.sh [repository number] -- run commit collection and classification in multiple processes.
clone.sh [repository number] -- clone the repositories to local storage in multiple processes.
sniffer.sh [repository number] -- identify and categorize the repositories by language interfacing mechanism in multiple processes.

3. Dataset
Data/OriginData/Repository_List.csv: original repository profiles grabbed from GitHub.
Data/CmmtSet: original commit data by repository; each file is named after the repository ID.
Data/Issues: original issue information by repository.
Data/StatData/CmmtSet: classified commit data by repository; each commit can be retrieved from GitHub through its 'sha' field.
Data/StatData/ApiSniffer.csv: repositories classified by language interfacing mechanism.
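
The collect.py steps listed above can be chained from a small driver script. The following is a minimal sketch, not part of the published tool: the helper names build_command and run_steps are hypothetical, the step names are copied from the "-s" options in the description, and the order simply follows the order in which the description lists them (the description does not state a required order). It assumes collect.py is in the current working directory.

```python
import subprocess
import sys

# Step names come from the "-s" options in the description. The order is the
# listing order there; it is an assumption, not a documented requirement.
STEPS = ["collect", "repostats", "langstats", "cmmts", "nbr", "clone", "apisniffer"]


def build_command(step, script="collect.py"):
    """Return the argv list for one `collect.py -s <step>` invocation."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step}")
    return [sys.executable, script, "-s", step]


def run_steps(steps, dry_run=True):
    """Build the commands for the given steps, in order.

    With dry_run=True (the default) nothing is executed and the commands
    are only returned, so they can be inspected first.
    """
    commands = [build_command(s) for s in steps]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first step that exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

Calling run_steps(["collect", "repostats"]) returns the two command lines without running anything; passing dry_run=False executes them one after another and stops at the first failure.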
Created:
2022-09-03