Characterizing Design Discussions With Semi-Supervised Topic Modeling
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/4718568
Resource description:
Note: Please refer to the README.md file for instructions.
Abstract: Stack Overflow is a rich source of questions and answers—discussions—about software development. One topic of discussion is software design, such as the correct use of design patterns, or best practices in data access. Since design is a more abstract topic in software engineering, researchers have long sought to characterize and model design knowledge. However, these approaches typically require significant expert input in order to contextualize the abstract design information. In this study, we explore how combining expert input with Stack Overflow might serve as an effective way to identify design topics. We first perform a qualitative analysis of design-tagged Stack Overflow questions and answers to identify the design concepts developers discuss. We report on areas where agreement was a challenge, including abstraction levels. Since inductive coding is expensive, we apply a semi-supervised (Anchored CorEx) approach. We find it performs as well as LDA but offers superior interpretability and the ability to guide the topic model. We leverage CorEx to characterize how design is discussed in Stack Overflow and on GitHub. We conclude by describing how our experience using the semi-supervised CorEx approach leads us to believe that approaches like CorEx that combine domain knowledge and scalability are key for analyzing large SE text repositories.
Created:
2022-01-21



