
Dataset: Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection

Source: Figshare. Updated: 2025-07-02. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Dataset_Towards_Trustworthy_Sentiment_Analysis_in_Software_Engineering_Dataset_Characteristics_and_Tool_Selection/29250935
Summary:
Authors
Martin Obaidi, Marc Herrmann, Jil Klünder, Kurt Schneider

Description
This dataset accompanies the publication "Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection".

The dataset contains all coded data and annotation results from a comprehensive analysis of sentiment and linguistic characteristics in software engineering communication. The study benchmarks 14 sentiment analysis tools across 10 datasets from five major SE platforms and investigates how dataset characteristics impact tool performance and selection. The coded data underpins the development of a practical questionnaire-based recommendation approach for trustworthy and context-sensitive sentiment analysis in SE.

Contents
The dataset includes the following file: All_Sample_Sets_Coded-v04.xlsx

The file contains manually coded sample sets from five platforms (App Reviews, Code Reviews, GitHub, Jira, Stack Overflow). Each worksheet corresponds to one platform and provides:

The raw text of the communication sample ("Text").
Gold-standard sentiment labels ("oracle"): -1 = Negative, 0 = Neutral, 1 = Positive.
Annotations for 13 linguistic characteristics. For each characteristic, x = present, n = not present, and an empty cell = not applicable for this item (e.g., if a characteristic is only relevant for positive statements).

This structure enables detailed cross-platform analysis of both sentiment polarity and linguistic features in developer communication.

Column details:
Text: Communication/document text.
oracle: Gold-standard sentiment label.
Characteristic 1 – 13: See accompanying paper for definitions. Annotation can be x, n, or empty (not applicable).

If you use this dataset, please cite:
Obaidi, M., Herrmann, M., Klünder, J., Schneider, K. (2025). Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection. In: 2025 IEEE 33rd International Requirements Engineering Conference Workshops (REW).

License
This dataset is provided under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

Contact
For questions regarding the dataset, please contact the corresponding author as listed in the publication.
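
The label encoding described for All_Sample_Sets_Coded-v04.xlsx (oracle values of -1, 0, 1 and characteristic annotations of x, n, or empty) could be decoded along the following lines. This is only a minimal sketch, not the authors' tooling: the helper names `decode_characteristic` and `decode_row` are hypothetical, the example row is invented for illustration, and the exact characteristic column headers (written here as "Characteristic 1" and so on) are an assumption that should be checked against the actual worksheets.

```python
from typing import Optional

# Sentiment encoding as stated in the dataset description.
ORACLE_LABELS = {-1: "negative", 0: "neutral", 1: "positive"}


def decode_characteristic(cell: object) -> Optional[bool]:
    """Map 'x' -> True, 'n' -> False, empty or missing -> None (not applicable)."""
    if cell is None:
        return None
    # Spreadsheet readers often return NaN (a float) for empty cells.
    if isinstance(cell, float) and cell != cell:
        return None
    value = str(cell).strip().lower()
    if value == "":
        return None
    if value == "x":
        return True
    if value == "n":
        return False
    raise ValueError(f"unexpected annotation value: {cell!r}")


def decode_row(row: dict) -> dict:
    """Decode one coded sample.

    Assumption: every column other than 'Text' and 'oracle' holds a
    characteristic annotation.
    """
    decoded = {
        "text": row["Text"],
        "sentiment": ORACLE_LABELS[int(row["oracle"])],
    }
    for column, cell in row.items():
        if column not in ("Text", "oracle"):
            decoded[column] = decode_characteristic(cell)
    return decoded


# Hypothetical example row, not taken from the dataset.
example = {
    "Text": "Thanks, this fix works great!",
    "oracle": 1,
    "Characteristic 1": "x",
    "Characteristic 2": "n",
    "Characteristic 3": "",
}
print(decode_row(example))
```

Keeping "not applicable" as `None` rather than folding it into `False` preserves the three-way distinction the description makes. To apply this to the real file, one option is `pandas.read_excel(path, sheet_name=None)`, which returns one DataFrame per worksheet (and needs an Excel engine such as openpyxl installed); each row can then be passed to `decode_row` as a dict.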
Created: 2025-07-02