
Characterizing Design Discussions With Semi-Supervised Topic Modeling

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/4718568
Description:
Note: Please refer to the README.md file for instructions.

Abstract: Stack Overflow is a rich source of questions and answers (discussions) about software development. One topic of discussion is software design, such as the correct use of design patterns, or best practices in data access. Since design is a more abstract topic in software engineering, researchers have long sought to characterize and model design knowledge. However, these approaches typically require significant expert input in order to contextualize the abstract design information. In this study, we explore how combining expert input with Stack Overflow might serve as an effective way to identify design topics. We first perform a qualitative analysis of design-tagged Stack Overflow questions and answers to identify the design concepts developers discuss. We report on areas where agreement was a challenge, including abstraction levels. Since inductive coding is expensive, we apply a semi-supervised (Anchored CorEx) approach. We find it performs as well as LDA but offers superior interpretability and the ability to guide the topic model. We leverage CorEx to characterize how design is discussed in Stack Overflow and on GitHub. We conclude by describing how our experience using the semi-supervised CorEx approach leads us to believe that approaches like CorEx that combine domain knowledge and scalability are key for analyzing large SE text repositories.
Created:
2022-01-21