ICPC 2022 Replication Package Companion
Source: DataCite Commons. Updated: 2022-10-21. Indexed: 2024-07-29.
Download link:
https://figshare.com/articles/dataset/ICPC_2022_Replication_Package_Companion/19245915
Description:
<b>Dataset Information:</b> This online replication package accompanies our qualitative study of the modeling issues and tool-related issues (for popular modeling tools) that modelers face. The <b>icpc2022_db.sql</b> file contains the dataset of discussions from Stack Overflow, Eclipse forums, and Matlab forums associated with MDSE. Each table in the database holds the information extracted from one of the data sources. The other files contain links to the posts that were randomly sampled and analyzed as part of the qualitative analysis. We also provide visualizations of the taxonomy for the research questions.<br><b>Important Replication Details of Qualitative Study: </b>To analyze the data, we performed an open-coding process with multi-coding to address the two research questions (i.e., the modeling-related issues and the tool-related issues). Because the process was open coding, the codes were not predefined prior to the analysis. Open coding is an inductive process; in our case, each code represents the underlying issue or difficulty that the modeler experienced in the post. Because the process was multi-coding, if a post was not related to one of the research questions, the judge (i.e., one of the authors of this work) was instructed to use “N/A” as the code for that question.<br> To reduce the bias of a single judge determining the code of a post, each post was examined by at least two judges. We used a custom online tool to randomly assign the posts in our dataset (the SQL dump in this replication package) so that each post was assigned to two judges. The tool displayed the original title of the post, the body of the post, and a link to the entire discussion thread. It also provided free-form text fields in which the authors entered a code for each aspect investigated (i.e., each research question), and it validated the input to ensure that no code was left unfilled. 
<br> We performed this analysis iteratively and met to discuss the codes to ensure consistency between coding sessions. Additionally, existing codes were merged when applicable to avoid redundant codes (i.e., semantically equivalent codes). In each iteration, a sample of 200 posts was coded by each judge, resulting in 4 iterations. In the last iteration, we observed that no new codes were introduced. A final round of coding was performed to address conflicts in the coding between two judges. For each post with a conflict, a third judge was assigned to the post to resolve the disagreement. We computed Fleiss' kappa to assess inter-rater agreement (for our first research question, we observed kappa = 0.83; for our second research question, we observed kappa = 0.46). <br> To generate a taxonomy from this qualitative open-coding process, we performed card sorting to organize the codes into higher-level categories. The codes were iteratively clustered into groups, each of which represents a higher-level abstraction of its codes. We continued the card sorting until all the codes were assigned and we had converged on a common set of clusters. The number of clusters was not predefined but was determined systematically through the iterative card-sorting process. Subsequently, we organized these groups hierarchically when there was a relationship between two or more groups. It is important to note that this process was done independently for each research question. This replication package includes the two taxonomies as well as the combined taxonomy displayed in our paper.<br> To replicate the study, the data should be analyzed independently of our codes and taxonomy to determine whether the same taxonomy is generated for the research questions. Our paper provides more details on the taxonomy and the implications of our work.
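For anyone repeating the agreement analysis, the following is a minimal sketch of how Fleiss' kappa can be computed from per-post codes. It is not the authors' script: the function name `fleiss_kappa`, the input layout (one list of codes per post, with the same number of judges for every post), and the sample codes are all assumptions made for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of posts, each a list of codes (one per judge).

    Assumption: every post was coded by the same number of judges (>= 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every post needs the same number of judges (>= 2)")

    # Count how many judges chose each code, per post.
    counts = [Counter(r) for r in ratings]

    # Observed agreement: mean over posts of the share of agreeing judge pairs.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement: sum of squared overall proportions of each code.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one code is ever used")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical codes from two judges for four posts (not taken from the dataset).
sample = [["a", "a"], ["b", "b"], ["a", "b"], ["a", "a"]]
kappa = fleiss_kappa(sample)
```

With codes exported from the database as one list per post, the same function can be applied separately to each research question, since the package reports a separate kappa for each.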
Provider:
figshare
Created:
2022-02-27



