Dataset for paper "Automatically Identifying Archival-worthy, Software-related Slack Conversations"
Source: Mendeley Data. Updated: 2024-01-31. Indexed: 2024-06-30.
Download link:
https://zenodo.org/record/3468559
Description:
This dataset consists of 2000 conversations from five programming-related Q&A channels hosted on Slack, and accompanies the paper "Automatically Identifying Archival-worthy, Software-related Slack Conversations". In addition to the text of the conversations, each conversation has been annotated as either archival-worthy or not. Our definition of archival-worthiness is: "If a conversation contains information that could be useful to other users, whether in the Slack channel or elsewhere, then it should be archived. These conversations have no determinate length and no need for objectivity. A conversation should be archived based on the availability and ease of identifying information that could help a person to gain useful software-related knowledge."

Data Origin: Numerous public Slack chat channels (https://slack.com/) focused on specific software engineering-related discussion topics have recently become available, e.g., Python Development (https://pyslackers.com/web/slack). The data reflects a portion of the conversations on public channels related to Python, Clojure, Elm and Racket programming.

Data Pre-Processing: To protect privacy, we replace usernames with fake names and absolute times with relative times (in seconds). The conversations are disentangled from the overall chat stream, with each unique thread in the dataset corresponding to one conversation in the channel. Archival-worthy conversations are marked with 1 and non-archival-worthy conversations with 0.
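The pre-processing described above can be illustrated with a minimal Python sketch. It is not the authors' actual pipeline. The message fields (`user`, `ts`, `text`), the `user_N` alias scheme, and the choice to measure time from the first message of the conversation are all assumptions made for illustration; the dataset's real file format and reference point for relative times are not stated here.

```python
from datetime import datetime, timezone


def anonymize(messages):
    """Replace usernames with placeholder aliases and absolute timestamps
    with seconds elapsed since the first message (an assumed reference point)."""
    if not messages:
        return []
    ordered = sorted(messages, key=lambda m: m["ts"])
    start = ordered[0]["ts"]
    aliases = {}  # real username -> hypothetical alias such as "user_1"
    result = []
    for msg in ordered:
        # Reuse the alias if this user was seen before, otherwise create one.
        alias = aliases.setdefault(msg["user"], f"user_{len(aliases) + 1}")
        result.append({
            "user": alias,
            "seconds": int((msg["ts"] - start).total_seconds()),
            "text": msg["text"],
        })
    return result


# Hypothetical conversation used only to exercise the sketch.
conversation = [
    {"user": "alice", "ts": datetime(2019, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
     "text": "How do I reverse a list?"},
    {"user": "bob", "ts": datetime(2019, 1, 1, 12, 0, 42, tzinfo=timezone.utc),
     "text": "Use reversed() or list.reverse()."},
    {"user": "alice", "ts": datetime(2019, 1, 1, 12, 1, 30, tzinfo=timezone.utc),
     "text": "Thanks!"},
]

anonymized = anonymize(conversation)
for row in anonymized:
    print(row["seconds"], row["user"], row["text"])
```

This sketch only rewrites the author field. Usernames mentioned inside the message text would need separate handling, which is not covered here.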
Created:
2024-01-31



