Data: What about Haskell Bugs?
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/12801145
Description:
# Data Package for: "What about Haskell Bugs?"
## General Introduction
The goal of the paper is to identify the extent to which existing bug taxonomies can classify bugs in Haskell, and how they could be adapted to better accommodate the unique features of Haskell. The specific research questions that were studied are:
RQ1 What are the most common types of Haskell bugs?
RQ2 What are the limitations of existing bug taxonomies in capturing the unique features of Haskell bugs?
RQ3 How do Haskell developers classify bugs and how is it different from the previously discussed taxonomies?
The data collected for the creation of the paper includes:
• A bug dataset of 142 bugs from 10 Haskell FOSS (free open-source software) projects;
• A qualitative codebook created after 4 interviews performed with Haskell developers about their perspectives on bugs.
## BugDB
The bug dataset contains 142 bugs collected from 10 Haskell FOSS repositories. The repositories were chosen from GitHub to be as diverse as possible, in order to avoid bias towards any defect type. From each repository, 20 bugs were collected, or as many as were available if fewer.
The bugs were taken from the most recent closed issues whose tags contain the term **bug**. Each issue was checked to be linked to commits or pull requests that had been merged into the main branch, and the code in those commits and pull requests was checked to contain a fix for the reported bug. Each issue was also verified to describe a true bug in the code, and all duplicates were removed. In addition, the fix had to change at least one *.hs* file, to ensure that the bug is related to Haskell. Note that a fix may also modify other types of files alongside the Haskell ones.
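The mechanical part of these selection criteria can be sketched as a simple predicate. This is an illustrative sketch only, not the authors' tooling: the `Issue` record and its field names are hypothetical, and the case-insensitive label match is an assumption. Checking that a fix really addresses the reported bug, and that the report describes a true bug, remains a manual step.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """Hypothetical record for one issue; the field names are illustrative."""
    title: str
    labels: list[str]
    closed: bool
    # Paths changed by the merged commits or pull requests linked to the issue.
    merged_fix_files: list[str] = field(default_factory=list)
    is_duplicate: bool = False


def is_candidate_bug(issue: Issue) -> bool:
    """Return True if the issue passes the automatable selection checks."""
    # Assumption: label matching is case-insensitive ("Bug", "type: bug", ...).
    has_bug_label = any("bug" in label.lower() for label in issue.labels)
    # The merged fix must touch at least one Haskell source file.
    touches_haskell = any(path.endswith(".hs") for path in issue.merged_fix_files)
    return (
        issue.closed
        and has_bug_label
        and touches_haskell
        and not issue.is_duplicate
    )
```

An issue whose merged fix only changes documentation, for example, would be rejected because no *.hs* file is touched.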
The bug dataset has been fully classified manually according to the two chosen taxonomies, which are described in the *Bug_taxonomies* PDF file. For each bug, the single most relevant category from each taxonomy was chosen. Complex bugs, for which a clear distinction was hard to make, have been documented as such.
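As a rough illustration of how the documentation of complex bugs could be read back programmatically, the sketch below parses one cell of the Notes column. It relies on the two marker phrases named in the file description (*COMPLEX* and *Might also be considered*). The exact layout of the text after the phrase is an assumption: the alternative types are taken to be comma-separated and to end at the first period.

```python
# Marker phrases taken from the description of the Notes column.
COMPLEX_MARKER = "COMPLEX"
ALTERNATIVES_PHRASE = "Might also be considered"


def parse_notes(notes: str) -> tuple[bool, list[str]]:
    """Return (is_complex, alternative_types) for one Notes cell."""
    if COMPLEX_MARKER not in notes:
        return False, []
    _, found, tail = notes.partition(ALTERNATIVES_PHRASE)
    if not found:
        # Marked as complex, but no alternative types are listed.
        return True, []
    # Assumption: the alternatives are comma-separated and end at the
    # first period after the phrase.
    listing = tail.lstrip(" :").split(".")[0]
    return True, [part.strip() for part in listing.split(",") if part.strip()]
```

If the real notes use a different separator, only the last two lines of the function need to change.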
## Codebook
Four open-source Haskell developers each took part in a semi-structured interview. The interviews took approximately 30 minutes and were conducted online. They were audio-recorded and transcribed. The participants were asked how they identify, understand, classify, and fix bugs in practice, as well as for their opinion on the proposed taxonomies. The participants agreed to the storage, usage, and publication of the data by signing an informed consent form. The data gathered during the interviews was used to construct a qualitative codebook, using a bottom-up approach.
## Description of the data in this data set
| File Name | File Format | Description |
| --------- | ----------- | ----------- |
| Bug_taxonomies | PDF | This file includes the 2 taxonomies used in the study. It provides the citation and an explicit description of each of the types of bugs as proposed by each respective taxonomy. |
| Codebook | PDF | This file is the qualitative codebook produced from the 4 interviews with Haskell developers. It is split into 5 columns: Theme, Code, Explanation, Examples, and Partial-fit Examples. The last two columns include examples, relating to the specific code and theme, taken from the interview transcripts. |
| bug_db | CSV | This file contains the bugs collected from GitHub issues and classified according to the 2 taxonomies. The columns are **1.Repository**: name of the repository from which the bug was taken; **2.Bug Name**: name of the issue as it appears on the GitHub page; **3.Link to bug**: link to the GitHub issue of the respective bug; **4.Taxonomy Catolino**: categorization according to the taxonomy proposed by Catolino et al. (see *Bug_taxonomies* for more information). This can only include one of the types of bugs from this taxonomy; **5.Taxonomy Seaman**: categorization according to the taxonomy proposed by Seaman et al. (see *Bug_taxonomies* for more information). This can only include one of the types of bugs from this taxonomy; **6.Notes**: details explaining the decisions made when choosing the types for each of the bugs. When the text includes the term *COMPLEX*, the bug is considered a complex bug, meaning it might also fit other types of bugs within the same taxonomy. In this case, the types are specifically mentioned after the phrase *Might also be considered*. |
| bug_db_result_counts | CSV | This file contains 3 tables. The first one gives the count and percentage per type of bug for the taxonomy proposed by Catolino et al. The second one gives the count and percentage per type of bug for the taxonomy proposed by Seaman et al. The third one is a representation of the complex bugs. It has 4 columns: **Chosen type**: the type as chosen in the taxonomy column of the complex bugs; **Other type**: the types as mentioned in the Notes column of the complex bugs; **Count**: the number of bugs that have the respective chosen type in the taxonomy column and the other type in the notes; **Reverse count**: the number of bugs that have the chosen type in the notes column and the other type in the taxonomy column. The one bug that has 2 other types has been split into separate instances with Logic as the chosen type, one with Algorithm/Method as the other type, and one with Checking as the other type. |
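
The tables in *bug_db_result_counts* could be recomputed from *bug_db* along the following lines. This is a minimal sketch under stated assumptions, not the authors' script: the column header `Taxonomy Seaman` may be spelled differently in the actual CSV (for example with the numeric prefix shown above), the inline `SAMPLE` rows are made up for illustration, and the (chosen, other) pairs passed to `pair_counts` are assumed to have been extracted from the Notes column beforehand.

```python
import csv
import io
from collections import Counter


def type_counts(rows: list[dict[str, str]], column: str) -> dict[str, tuple[int, float]]:
    """Count bugs per type in `column` and give each type's share in percent."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {t: (n, round(100 * n / total, 1)) for t, n in counts.most_common()}


def pair_counts(pairs: list[tuple[str, str]]) -> dict[tuple[str, str], tuple[int, int]]:
    """Map each (chosen, other) pair to (count, reverse count).

    The reverse count is the number of bugs where the two roles are swapped,
    i.e. `other` is the chosen type and `chosen` appears in the notes.
    """
    forward = Counter(pairs)
    return {(c, o): (n, forward[(o, c)]) for (c, o), n in forward.items()}


# Made-up rows standing in for bug_db.csv; header names are assumptions.
SAMPLE = """Repository,Taxonomy Seaman
repo-a,Logic
repo-a,Logic
repo-b,Checking
repo-b,Algorithm/Method
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(type_counts(rows, "Taxonomy Seaman"))
```

A bug with two alternative types contributes two pairs to `pair_counts`, mirroring how the one such bug is split into separate instances in the third table.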
## Interview Transcripts
The interview transcripts are not shared due to data protection.
We have further explicitly assured participants that their transcripts would not be shared, only the anonymized codebook.
Created:
2024-07-23



