
Dataset for paper "Automatically Identifying Archival-worthy, Software-related Slack Conversations"

Source: Mendeley Data. Updated 2024-01-31; indexed 2024-06-30.
Download link:
https://zenodo.org/record/3468559
Description:
This dataset consists of 2,000 conversations from five programming-related Q&A channels hosted on Slack, and accompanies the paper "Automatically Identifying Archival-worthy, Software-related Slack Conversations". In addition to the text of the conversations, each conversation has been annotated as either archival-worthy or not. Our definition of archival-worthiness is: "If a conversation contains information that could be useful to other users, whether in the Slack channel or elsewhere, then it should be archived. These conversations have no determinate length and no need for objectivity. A conversation should be archived based on the availability and ease of identifying information that could help a person to gain useful software-related knowledge."

Data Origin: Numerous public Slack chat channels (https://slack.com/) focused on specific software engineering-related discussion topics have recently become available, e.g., Python Development (https://pyslackers.com/web/slack). The data reflects a portion of the conversations on public channels related to Python, Clojure, Elm and Racket programming.

Data Pre-Processing: To protect privacy, we replace usernames with fake names and replace absolute times with relative times (in seconds). The conversations are disentangled from the overall chat stream; each unique thread in the dataset corresponds to one conversation in the channel. Archival-worthy conversations are labeled 1, and non-archival-worthy conversations are labeled 0.
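
The pre-processing described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The message fields (`user`, `time`, `text`), the `anonymize_conversation` helper, and the `PersonN` naming scheme are all hypothetical, and measuring relative time from the first message of the conversation is an assumption; the description only states that times are relative and in seconds.

```python
from datetime import datetime


def anonymize_conversation(messages):
    """Return a copy of one conversation with pseudonymous users and relative times.

    Hypothetical input format: a list of dicts with keys "user", "time"
    (a datetime), and "text". This layout is an assumption for illustration,
    not the dataset's documented schema.
    """
    if not messages:
        return []

    # Assumption: relative time is measured from the earliest message.
    start = min(m["time"] for m in messages)

    pseudonyms = {}  # real username -> fake name, stable within the conversation
    result = []
    for m in messages:
        user = m["user"]
        if user not in pseudonyms:
            pseudonyms[user] = f"Person{len(pseudonyms) + 1}"
        result.append({
            "user": pseudonyms[user],
            "seconds": int((m["time"] - start).total_seconds()),
            # Note: usernames mentioned inside the text are not rewritten here.
            "text": m["text"],
        })
    return result


example = [
    {"user": "alice", "time": datetime(2019, 1, 1, 12, 0, 0), "text": "How do I pin a dependency?"},
    {"user": "bob", "time": datetime(2019, 1, 1, 12, 0, 30), "text": "Use == in requirements.txt."},
    {"user": "alice", "time": datetime(2019, 1, 1, 12, 2, 0), "text": "Thanks!"},
]

# A conversation record pairs the anonymized messages with its annotation:
# 1 for archival-worthy, 0 otherwise. The label value here is made up.
record = {"messages": anonymize_conversation(example), "label": 1}
```

The same real user always maps to the same fake name within a conversation, so the turn-taking structure stays readable after anonymization.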
Created: 2024-01-31