Comprehensive Predictive Analytics for Collaborators' Answers, Code Quality, and Dropout: Stack Overflow Case Study – Replication Package

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/14512050


Description:

Previous studies that used data from Stack Overflow to develop predictive models often employed limited benchmarks of 3 to 5 models or adopted arbitrary selection methods. Despite being insightful, such approaches may not provide optimal results given their limited scope, suggesting the need to benchmark more models to avoid overlooking untested algorithms. Our study evaluates 21 algorithms across three tasks: predicting the number of questions a user is likely to answer, their code quality violations, and their dropout status. We employed normalisation, standardisation, and logarithmic and power transformations, paired with Bayesian hyperparameter optimisation and genetic algorithms. CodeBERT, a pre-trained language model for both natural and programming languages, was fine-tuned to classify user dropout given their posts (questions and answers) and code snippets. This replication package is provided for those interested in further examining our research methodology.
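The four feature transformations named in the description can be illustrated with a minimal sketch. This is not the replication package's code: the function names, the choice of Box-Cox as the power transformation, the fixed lambda of 0.5, and the sample values are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
from statistics import mean, pstdev


def min_max_normalise(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardise(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def log_transform(values):
    """Apply log(1 + v), which stays defined for zero counts."""
    return [math.log1p(v) for v in values]


def box_cox(values, lam):
    """Box-Cox power transform for strictly positive values and a fixed lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


# Hypothetical answer counts for five users (illustrative, not taken from the dataset).
answer_counts = [1, 2, 4, 8, 100]

scaled = min_max_normalise(answer_counts)
z_scores = standardise(answer_counts)
logged = log_transform(answer_counts)
powered = box_cox(answer_counts, 0.5)
```

In a benchmark such as the one described, each transformed feature set would typically be passed to every candidate model so that the effect of preprocessing can be compared alongside the choice of algorithm. The skewed sample above (one user with 100 answers) shows why the logarithmic and power transformations are useful for count data.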

Created:

2024-12-18
