
Replication Package: Vulnerably (Mis)Configured? Exploring 10 Years of Developers' Q&As on Stack Overflow

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10245331
Description:
Welcome to the public repository for the additional content of the paper "Vulnerably (Mis)Configured? Exploring 10 Years of Developers' Q&As on Stack Overflow", accepted at the International Working Conference on Variability Modelling of Software-Intensive Systems (VAMOS) 2024. This repository provides additional information on the exploratory study of configuration-related vulnerabilities and includes the following files:
- README.txt
- LICENSE.txt
- DATASET_CONFIG_VULN_SO.csv: sheet containing data on 651 Stack Overflow posts, including additional classifications based on manual analyses and automatic topic modeling

Instructions for using the dataset
Download and open the dataset (platform-independent CSV file). The dataset includes 16 columns (A – P):
- Columns A – J: Original data fetched from the BigQuery Stack Overflow dataset (Question_ID, Year_Asked, Question_Title, Question_Body, Question_Tags, View_Count, Question_Rating, Favorite_Count, Status, Answer_Count)
- Columns K – N: Manually extracted data from the Stack Overflow posts (System, Configuration Context, Security Context, Topic)
- Column O: Data based on the automated topic modeling (Configuration Topic)
- Column P: Additional data extracted from the Stack Overflow posts without further classifications (Additional Comments)

Requirements
No requirements

Further information
The dataset is based on a search string (SQL query; August 1, 2023) applied to the Google BigQuery Stack Overflow dataset:
("secur*") AND ("vulnerabilit*" OR "weakness*" OR "breach*" OR "exposure*" OR "CVE*" OR "CWE*") AND ("config*")
Originally, the dataset included 1,235 posts. The first and second authors reduced it to 651 posts (excluding 34 deleted posts and 550 out-of-scope posts) using the following selection criteria:
- The post was created in the last decade (2013-2022).
- The post is still available on the Stack Overflow website.
- The post is directly connected to a vulnerability-related issue in the context of configuring.
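The column layout and the search string described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' tooling: the exact header spelling inside the CSV is not given here, so rows are mapped by position; the names `read_posts` and `matches_search_string` are hypothetical; and the original SQL query is not reproduced in this description, so the sketch treats each `*` as a trailing wildcard on a word start and matches case-insensitively.

```python
import csv
import re
import string

# Column names as listed in the description, in spreadsheet order A..P.
# The header spelling in the actual file is an assumption, so rows are
# mapped by position rather than by header text.
COLUMNS = [
    "Question_ID", "Year_Asked", "Question_Title", "Question_Body",
    "Question_Tags", "View_Count", "Question_Rating", "Favorite_Count",
    "Status", "Answer_Count",                 # A-J: BigQuery data
    "System", "Configuration Context",
    "Security Context", "Topic",              # K-N: manual extraction
    "Configuration Topic",                    # O: automated topic modeling
    "Additional Comments",                    # P: unclassified notes
]
COLUMN_LETTERS = dict(zip(string.ascii_uppercase, COLUMNS))  # "A" -> "Question_ID"

# Prefix groups from the search string: groups are ANDed, terms in a group ORed.
SEARCH_GROUPS = [
    ["secur"],
    ["vulnerabilit", "weakness", "breach", "exposure", "CVE", "CWE"],
    ["config"],
]


def read_posts(handle):
    """Yield one dict per post from an open CSV file, skipping the header row."""
    reader = csv.reader(handle)
    next(reader, None)  # assumption: the first row is a header
    for row in reader:
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}")
        yield dict(zip(COLUMNS, row))


def matches_search_string(text):
    """Approximate the boolean search string with word-prefix matching."""
    def group_hit(prefixes):
        pattern = r"\b(?:" + "|".join(map(re.escape, prefixes)) + r")\w*"
        return re.search(pattern, text, flags=re.IGNORECASE) is not None
    return all(group_hit(group) for group in SEARCH_GROUPS)


if __name__ == "__main__":
    # The reported counts are consistent: 1,235 - 34 deleted - 550 out of scope.
    assert 1235 - 34 - 550 == 651
    with open("DATASET_CONFIG_VULN_SO.csv", newline="", encoding="utf-8") as f:
        posts = list(read_posts(f))
    print(len(posts), "posts loaded")
```

Mapping by position keeps the sketch independent of how the header row is spelled; if the file's encoding or delimiter differs from the assumed UTF-8 and comma, the `open` and `csv.reader` arguments would need adjusting.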

Topic modeling algorithm used: Latent Dirichlet Allocation (LDA)
- Settings: 200 iterations (coherence value = 0.6 for k = 7 to 11), α = k, β = 0.01
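To make the roles of the settings concrete, here is a toy collapsed Gibbs sampler for LDA in plain Python. It is a sketch under assumptions, not the authors' implementation: the description does not say which LDA tool or inference method was used, the function name `lda_gibbs` and the tiny corpus are invented for illustration, and α is passed literally as k because that is how the settings line states it.

```python
import random


def lda_gibbs(docs, k, alpha, beta, iterations=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA with symmetric priors.

    docs is a list of token lists. Returns (doc_topic_counts,
    topic_word_counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)

    n_dk = [[0] * k for _ in docs]       # topic counts per document
    n_kw = [[0] * v for _ in range(k)]   # word counts per topic
    n_k = [0] * k                        # total tokens per topic
    z = []                               # topic assignment of every token

    # Random initialisation of the topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            t = rng.randrange(k)
            z_d.append(t)
            n_dk[d][t] += 1
            n_kw[t][word_id[w]] += 1
            n_k[t] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, t = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][t] -= 1
                n_kw[t][wid] -= 1
                n_k[t] -= 1
                # p(topic | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][wid] + beta) / (n_k[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                n_dk[d][t] += 1
                n_kw[t][wid] += 1
                n_k[t] += 1
    return n_dk, n_kw, vocab


if __name__ == "__main__":
    # Made-up token lists standing in for preprocessed post text.
    toy_docs = [
        ["ssl", "certificate", "nginx", "config"],
        ["spring", "security", "csrf", "config"],
        ["ssl", "nginx", "cipher", "certificate"],
    ]
    k = 7  # lower end of the reported range k = 7 to 11
    doc_topics, topic_words, vocab = lda_gibbs(
        toy_docs, k=k, alpha=float(k), beta=0.01, iterations=200
    )
    print(doc_topics)
```

Here `iterations` corresponds to the 200 iterations, `alpha` smooths the per-document topic counts, and `beta` smooths the per-topic word counts. Coherence scoring, which the settings line reports for k = 7 to 11, is not part of this sketch.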
Created: 2023-12-08