Dataset: Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection
Source: Figshare. Updated: 2025-07-02. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Dataset_Towards_Trustworthy_Sentiment_Analysis_in_Software_Engineering_Dataset_Characteristics_and_Tool_Selection/29250935
Description:
Dataset: Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection

Authors:
Martin Obaidi, Marc Herrmann, Jil Klünder, Kurt Schneider

This dataset accompanies the publication "Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection".

The dataset contains all coded data and annotation results from a comprehensive analysis of sentiment and linguistic characteristics in software engineering (SE) communication. The study benchmarks 14 sentiment analysis tools across 10 datasets from five major SE platforms and investigates how dataset characteristics impact tool performance and selection. The coded data underpins the development of a practical questionnaire-based recommendation approach for trustworthy and context-sensitive sentiment analysis in SE.

Contents:
The dataset includes the following file:

All_Sample_Sets_Coded-v04.xlsx
Contains manually coded sample sets from five platforms (App Reviews, Code Reviews, GitHub, Jira, Stack Overflow). Each worksheet corresponds to one platform and provides:
- The raw text of the communication sample ("Text").
- Gold-standard sentiment labels ("oracle"): -1 = Negative, 0 = Neutral, 1 = Positive.
- Annotations for 13 linguistic characteristics. For each characteristic, x = present, n = not present, and an empty cell = not applicable for this item (e.g., if a characteristic is only relevant for positive statements).
The file enables detailed cross-platform analysis of both sentiment polarity and linguistic features in developer communication.

Column details:
- Text: Communication/document text.
- oracle: Gold-standard sentiment label.
- Characteristic 1 – 13: See accompanying paper for definitions. Annotation can be x, n, or empty (not applicable).

If you use this dataset, please cite:
Obaidi, M., Herrmann, M., Klünder, J., Schneider, K. (2025). Towards Trustworthy Sentiment Analysis in Software Engineering: Dataset Characteristics and Tool Selection. In: 2025 IEEE 33rd International Requirements Engineering Conference Workshops (REW).

License:
This dataset is provided under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

Contact:
For questions regarding the dataset, please contact the corresponding author as listed in the publication.
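
The coding scheme described above (oracle values -1/0/1; characteristic cells x, n, or empty) could be decoded along the following lines. This is only a minimal sketch, not part of the dataset or the authors' tooling. It assumes each worksheet has already been read into a list of row dictionaries by a spreadsheet library of your choice, and that the characteristic columns are literally headed "Characteristic 1", "Characteristic 2", and so on, which should be checked against the actual file. The function names and the sample rows are invented for illustration and do not come from the dataset.

```python
from collections import Counter

# Sentiment coding as stated in the dataset description.
SENTIMENT_NAMES = {-1: "negative", 0: "neutral", 1: "positive"}


def decode_annotation(cell):
    """Map 'x' -> True, 'n' -> False, empty or None -> None (not applicable)."""
    if cell is None:
        return None
    value = str(cell).strip().lower()
    if value == "":
        return None
    if value == "x":
        return True
    if value == "n":
        return False
    raise ValueError(f"unexpected annotation value: {cell!r}")


def summarize(rows, characteristic_columns):
    """Count sentiment labels and, per characteristic, present vs. applicable items."""
    sentiment_counts = Counter(
        SENTIMENT_NAMES[int(row["oracle"])] for row in rows
    )
    presence = {}
    for column in characteristic_columns:
        decoded = [decode_annotation(row.get(column)) for row in rows]
        # Empty cells mean "not applicable", so they are excluded from the base.
        applicable = [d for d in decoded if d is not None]
        presence[column] = {
            "present": sum(applicable),
            "applicable": len(applicable),
        }
    return sentiment_counts, presence


# Invented placeholder rows; column headers are an assumption about the workbook.
sample_rows = [
    {"Text": "placeholder text 1", "oracle": 1, "Characteristic 1": "x", "Characteristic 2": "n"},
    {"Text": "placeholder text 2", "oracle": -1, "Characteristic 1": "n", "Characteristic 2": ""},
    {"Text": "placeholder text 3", "oracle": 0, "Characteristic 1": "x", "Characteristic 2": None},
]

counts, presence = summarize(sample_rows, ["Characteristic 1", "Characteristic 2"])
print(dict(counts))
print(presence)
```

Keeping "not applicable" separate from "not present" matters here: treating empty cells as n would understate how often a characteristic occurs among the items where it can occur at all.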
Created:
2025-07-02



