Replication package for the paper "What Do Infrastructure-as-Code Practitioners Discuss: An Empirical Study on Stack Overflow"
Source: DataCite Commons. Updated: 2023-07-08. Indexed: 2024-08-18.
Download link:
https://figshare.com/articles/dataset/replication_package_for_the_paper_What_Do_Infrastructure-as-Code_Practitioners_Discuss_An_empirical_Study_on_Stack_Overflow_/22734890/1
Description:
<strong>Replication Package of the Empirical Study</strong> <br> Title: "<em>What Do Infrastructure-as-Code Practitioners Discuss: An Empirical Study on Stack Overflow</em>" <br> Authors: Mahi BEGOUG, Narjes Bessghaier, Ali Ouni, Eman Abdullah AlOmar, Mohamed Wiem Mkaouer. <br> This replication package includes the following folders: <br>
<strong>study_methodology</strong>: This folder covers the extraction of IaC tags, the extraction and cleaning of posts, and the application of LDA topic modeling. It consists of three subfolders:<br> a. <em>Extract_IaC_Tags</em>: This subfolder extracts IaC tags using relevance and significance metrics. The file "iac_tag_filtering.xlsx" in the data folder contains the agreement on the selected tags.<br> b. <em>Extract_Clean_IaC_Posts</em>: This subfolder extracts IaC-related posts from the "iac_dataset.csv" file and cleans them to remove irrelevant information.<br> c. <em>Apply_Topic_Modeling</em>: This subfolder applies LDA topic modeling. The trained model is stored in the "saved_model" folder. Additionally, the "Adapt_Genetic_Algorithm_GA" folder contains the implementation of the Genetic Algorithm (GA) with LDA (see <em>ga_bootstrap.ipynb</em>). For LDA, we used the Mallet framework. For topic coherence, we used the Gensim framework, which provides a coherence model to measure the quality of topics. We set the <em>coherence parameter</em> of the coherence model to 'c_v'. For the GA, we adapted the implementation used by CISO. <br>
<strong>RQ1 folder:</strong> This folder contains the script that measures the evolution of IaC questions and the users involved in IaC discussions from 2011 to 2012, presenting the results for Research Question 1 (RQ1). <br>
<strong>RQ2 folder:</strong> This folder includes a data file named "RQ2_manual_analysis_30_random_samples.xlsx", which provides details about our labeling of IaC topics. The script "rq2.ipynb" measures the number of questions for each topic, presenting the results for Research Question 2 (RQ2). <br>
<strong>RQ3 folder:</strong> This folder contains the script that computes the difficulty and popularity metrics, presenting the results for Research Question 3 (RQ3). <br> For any suggestions or improvements, please contact us at: mahi.begoug.1[at]ens.etsmtl.ca
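As a rough illustration of the kind of aggregation the RQ1 folder describes (questions and participating users over time), the following sketch counts questions and distinct users per year. It is not the package's script: the function name `yearly_activity`, the field names `creation_date` and `owner_user_id`, and the sample records are all hypothetical, and the real column names in "iac_dataset.csv" may differ.

```python
from collections import defaultdict
from datetime import datetime


def yearly_activity(posts):
    """Return {year: (question_count, distinct_user_count)}.

    `posts` is an iterable of dicts with the hypothetical keys
    "creation_date" (ISO 8601 string) and "owner_user_id" (may be None).
    """
    questions = defaultdict(int)
    users = defaultdict(set)
    for post in posts:
        year = datetime.fromisoformat(post["creation_date"]).year
        questions[year] += 1
        # Posts without a known owner count as questions but not as users.
        if post.get("owner_user_id") is not None:
            users[year].add(post["owner_user_id"])
    return {year: (questions[year], len(users[year])) for year in sorted(questions)}


# Made-up sample records, for illustration only.
sample_posts = [
    {"creation_date": "2011-03-14T09:26:53", "owner_user_id": 101},
    {"creation_date": "2011-07-02T18:00:00", "owner_user_id": 101},
    {"creation_date": "2012-01-20T11:45:10", "owner_user_id": 202},
    {"creation_date": "2012-05-05T08:30:00", "owner_user_id": None},
]

activity = yearly_activity(sample_posts)
for year, (n_questions, n_users) in activity.items():
    print(year, n_questions, n_users)
```

Keeping users in a per-year set means a user who asks several questions in the same year is counted once, which is one plausible reading of "users involved"; the package's own notebook may define this differently.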
Provider:
figshare
Created:
2023-07-08



