Replication package for the paper "What Do Infrastructure-as-Code Practitioners Discuss: An Empirical Study on Stack Overflow"
Source: Figshare. Updated: 2023-07-08. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/replication_package_for_the_paper_What_Do_Infrastructure-as-Code_Practitioners_Discuss_An_empirical_Study_on_Stack_Overflow_/22734890
Resource description:
Replication Package of the Empirical Study

Title: "What Do Infrastructure-as-Code Practitioners Discuss: An Empirical Study on Stack Overflow"
Authors: Mahi BEGOUG, Narjes Bessghaier, Ali Ouni, Eman Abdullah AlOmar, Mohamed Wiem Mkaouer.

This replication package includes the following folders:

study_methodology: This folder covers the extraction of IaC tags, the extraction and cleaning of posts, and the application of LDA topic modeling. It consists of three subfolders:
a. Extract_IaC_Tags: This folder extracts IaC tags using relevance and significance metrics. The file "iac_tag_filtering.xlsx" in the data folder records the agreement on the selected tags.
b. Extract_Clean_IaC_Posts: This folder extracts IaC-related posts from the "iac_dataset.csv" file and cleans them to remove irrelevant information.
c. Apply_Topic_Modeling: This folder applies LDA topic modeling. The trained model is stored in the "saved_model" folder. Additionally, the "Adapt_Genetic_Algorithm_GA" folder contains the implementation of the Genetic Algorithm (GA) with LDA (see ga_bootstrap.ipynb). For LDA, we used the Mallet framework. For topic coherence, we used the Gensim framework, whose coherence model measures the quality of topics; we set its coherence parameter to 'c_v'. For the GA, we adapted the implementation used by CISO.

RQ1 folder: This folder contains the script that measures the evolution of IaC questions and of the users involved in IaC discussions from 2011 to 2012, presenting the results for Research Question 1 (RQ1).

RQ2 folder: This folder includes a data file named "RQ2_manual_analysis_30_random_samples.xlsx", which provides details about our labeling of IaC topics. The script "rq2.ipynb" measures the number of questions for each topic, presenting the results for Research Question 2 (RQ2).

RQ3 folder: This folder contains the script that computes the difficulty and popularity metrics, presenting the results for Research Question 3 (RQ3).

For any suggestions or improvements, please contact us at: mahi.begoug.1[at]ens.etsmtl.ca
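
The description does not define the relevance and significance metrics used in Extract_IaC_Tags. The sketch below is an illustration only and is not taken from the package. It assumes the definitions commonly used in Stack Overflow topic studies: significance is the share of all questions carrying a tag that fall in the seed set, and relevance is the share of seed-set questions that carry the tag. The function names, the toy data, the seed tag, and the thresholds are all hypothetical.

```python
from collections import Counter


def tag_metrics(all_posts, seed_tag):
    """Return {tag: (significance, relevance)} for tags co-occurring with seed_tag.

    all_posts is a list of tag lists, one per question.
    Assumed definitions (not confirmed by the package):
      significance = questions with the tag in the seed set / questions with the tag overall
      relevance    = questions with the tag in the seed set / questions in the seed set
    """
    seed_posts = [set(tags) for tags in all_posts if seed_tag in tags]
    if not seed_posts:
        return {}
    in_all = Counter(tag for tags in all_posts for tag in set(tags))
    in_seed = Counter(tag for tags in seed_posts for tag in tags)
    return {
        tag: (n_seed / in_all[tag], n_seed / len(seed_posts))
        for tag, n_seed in in_seed.items()
    }


def select_tags(metrics, min_significance, min_relevance):
    """Keep the tags that meet both thresholds (threshold values are illustrative)."""
    return sorted(
        tag
        for tag, (significance, relevance) in metrics.items()
        if significance >= min_significance and relevance >= min_relevance
    )


# Toy data, purely hypothetical.
posts = [
    ["terraform", "aws"],
    ["terraform", "ansible"],
    ["ansible", "yaml"],
    ["python", "aws"],
    ["terraform", "terraform-provider-aws", "aws"],
]
metrics = tag_metrics(posts, seed_tag="terraform")
selected = select_tags(metrics, min_significance=0.6, min_relevance=0.3)
print(selected)
```

In a setup like this, candidate tags that pass the thresholds would still be reviewed manually, which matches the agreement recorded in "iac_tag_filtering.xlsx".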
Created:
2023-07-08



