What About Emotions? Guiding Fine-Grained Emotion Extraction from Mobile App Reviews - Replication Package

Source: DataCite Commons | Updated: 2025-06-01 | Indexed: 2025-09-08
Download link:
https://figshare.com/articles/dataset/What_About_Emotions_Guiding_Fine-Grained_Emotion_Extraction_from_Mobile_App_Reviews_-_Replication_Package/28548638/2
Description:
<i>Full paper accepted at the 33rd IEEE International Requirements Engineering Conference 2025 (Research Track).</i>

📚 Summary
This repository contains the code and data of the replication package for the paper "What About Emotions? Guiding Fine-Grained Emotion Extraction from Mobile App Reviews".

📂 Contents
The Figshare replication package contains the following files:
<b>replication-package.zip:</b> contains all materials, datasets, source code, and resources linked to our research. The structure of this package is detailed below.
<b>ground-truth.xlsx:</b> contains the dataset of app reviews annotated with emotions by human annotators.
<b>Annotation Guidelines.pdf:</b> contains the guidelines for tagging sentences from app reviews with emotions using Plutchik's taxonomy.
<b>README.md:</b> contains detailed information on the replication package (.zip file). The contents of this README file are also included in this description.
<b>LICENSE:</b> license information.

Furthermore, the replication package contains the following folders:
<b>Literature review</b>: results from the literature review on opinion mining and emotion analysis in the context of software reviews.
<b>Data</b>: data used in the study, including user reviews (input), human annotations (ground truth), and LLM-based annotations (generated by the assistants).
<b>Code</b>: code used in the study, covering generative annotation, data processing, and evaluation.

📖 Literature review
Study selection and results are available in the <code>literature_review/study-selection.xlsx</code> file.
This file contains the following sheets:
<code>iteration_1_IC_analysis</code>: results from the first iteration of the inclusion criteria analysis.
<code>iteration_1_feature_extraction</code>: results from the first iteration of the feature extraction analysis.
<code>iteration_2_IC_analysis</code>: results from the second iteration of the inclusion criteria analysis.
<code>iteration_2_feature_extraction</code>: results from the second iteration of the feature extraction analysis.
<code>iteration_3_IC_analysis</code>: results from the third iteration of the inclusion criteria analysis.
<code>iteration_3_feature_extraction</code>: results from the third iteration of the feature extraction analysis.
<code>emotions</code>: statistical analysis of the emotions covered by the emotion taxonomies in the selected studies.

🗃️ Data
The <code>data</code> root folder contains the following files:
<code>reviews.json</code> contains the reviews used in the study.
<code>guidelines.txt</code> contains a .txt version of the annotation guidelines.
<code>ground-truth.xlsx</code> contains the ground truth (human agreement) annotations for the reviews.
In addition, the <code>data</code> root folder contains the following subfolders:
<code>assistants</code> contains the IDs of the assistants used for the generative annotation (see LLM-based annotation).
<code>annotations</code> contains the results of the human and LLM-based annotation:
-- <code>iterations</code> contains both human and LLM-based annotations for each iteration.
-- <code>llm-annotations</code> contains the LLM-based annotations for each assistant, including results for three temperature values: low (0), medium (0.5), and high (1) (see LLM-based annotation).
<code>agreements</code> contains the results of the agreement analysis between the human and LLM-based annotations (see Data processing).
<code>evaluation</code> contains the results of the evaluation of the LLM-based annotations (see Evaluation), including statistics, Cohen's Kappa, correctness, and a cost-efficiency analysis covering token usage and the annotation times reported by human annotators.

💻 Code
We structure the code in this replication package according to the stages of the LLM-based annotation process.

🤖 LLM-based annotation
The <code>llm_annotation</code> folder contains the code used to generate the LLM-based annotations. There are two main scripts:
<code>create_assistant.py</code> creates a new assistant for a given provider and model. It defines a system prompt shared by all assistants, built from the <code>data/guidelines.txt</code> file.
<code>annotate_emotions.py</code> annotates a set of reviews with emotions using a previously created assistant. It also validates the output format, computes common cost-efficiency metrics, and generates the output file.
Our research includes LLM-based annotation experiments with three LLMs: GPT-4o, Mistral Large 2, and Gemini 2.0 Flash. To illustrate the usage of the code, this README describes how to generate annotations with GPT-4o; however, the full code is provided for all LLMs.

<b>🔑 Step 1: Add your API key</b>
Add your API key to the <code>code/.env</code> file. For instance, for OpenAI, add the following line:
<pre>OPENAI_API_KEY=sk-proj-...</pre>

<b>🛠️ Step 2: Create an assistant</b>
Create an assistant using the <code>create_assistant.py</code> script (for OpenAI, <code>create_assistant_openai.py</code>).
For instance, for GPT-4o, you can run the following command:
<code>python .\code\llm_annotation\create_assistant_openai.py --guidelines .\data\guidelines.txt --model gpt-4o</code>
This creates an assistant that loads the <code>data/guidelines.txt</code> file and uses the GPT-4o model.

<b>📝 Step 3: Annotate emotions</b>
Annotate emotions using the <code>annotate_emotions.py</code> script (for OpenAI, <code>annotate_emotions_openai.py</code>). For instance, for GPT-4o, you can run the following command:
<code>python .\code\llm_annotation\annotate_emotions_openai.py --input .\data\ground-truth.xlsx --output .\data\annotations\llm\temperature-00\ --batch_size 10 --model gpt-4o --temperature 0 --sleep_time 10</code>
Parameters include:
<code>--input</code>: path to the input file containing the reviews to annotate (e.g., <code>data/ground-truth.xlsx</code>).
<code>--output</code>: path to the output folder where annotations will be saved (e.g., <code>data/annotations/llm/temperature-00/</code>).
<code>--batch_size</code>: number of reviews to annotate per request (e.g., 10).
<code>--model</code>: model to use for the annotation (e.g., <code>gpt-4o</code>).
<code>--temperature</code>: temperature for the model responses (e.g., 0).
<code>--sleep_time</code>: time to wait between batches, in seconds (e.g., 10).
This annotates the reviews with emotions using the assistant created in the previous step and creates a new file with the same format as <code>data/ground-truth.xlsx</code>.

🔄 Data processing
In this stage, we split all annotation files into iterations and consolidate the agreement between multiple annotators or LLM runs. This logic applies to both human and LLM annotations. Parameters can be updated to include more annotators or LLM runs.

<b>✂️ Step 4: Split annotations into iterations</b>
We split the annotations into iterations based on the number of annotators or LLM runs.
For instance, for GPT-4o (run 1), we can run the following command:
<code>python code\data_processing\split_annotations.py --input_file data\annotations\llm\temperature-00\gpt-4o-1-annotations.xlsx --output_dir data\annotations\iterations\</code>
This aligns the annotations with each human annotation iteration, which facilitates the Kappa and agreement analyses.

<b>🤝 Step 5: Analyse agreement</b>
We consolidate the agreement between multiple annotators or LLM runs. For instance, for GPT-4o (runs 1, 2, and 3), we can run the following command:
<code>python code\evaluation\agreement.py --input-folder data\annotations\iterations\ --output-folder data\agreements\ --annotators gpt-4o-1 gpt-4o-2 gpt-4o-3</code>

📊 Evaluation
After consolidating agreements, we can evaluate both the Cohen's Kappa agreement and the correctness of the human and LLM-based annotations. Our code supports any combination of annotators and LLM runs.

<b>📈 Step 6: Emotion statistics</b>
We compute statistics on the emotions in the annotations, including emotion frequency, distribution, and correlation between emotions. For instance, for GPT-4o, we can run the following command:
<code>python code\evaluation\emotion_statistics.py --input-file data\agreements\agreement_gpt-4o-1-gpt-4o-2-gpt-4o-3.xlsx --output-dir data\evaluation\statistics\gpt-4o</code>

<b>⚖️ Step 7: Cohen's Kappa pairwise agreement</b>
We measure the average pairwise Cohen's Kappa agreement between annotators or LLM runs. For instance, for GPT-4o, we can run the following command:
<code>python code\evaluation\kappa.py --input_folder data\annotations\iterations\ --output_folder data\evaluation\kappa\ --annotators gpt-4o-1,gpt-4o-2,gpt-4o-3 --exclude 0,1,2</code>
In our analysis, we exclude iterations 0, 1, and 2, as they were used to refine the guidelines.

<b>✅ Step 8: LLM-based annotation correctness</b>
We measure the correctness (accuracy, precision, recall, and F1 score) of a set of annotated reviews against a given ground truth.
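To illustrate the four correctness metrics, the minimal sketch below treats a single emotion as a binary label per sentence (1 = present, 0 = absent) and compares LLM predictions with the ground truth. The function name <code>correctness</code> and the toy labels are hypothetical; <code>correctness.py</code> may aggregate over emotions and handle edge cases differently.

```python
"""Hypothetical sketch: correctness metrics for one emotion with 0/1 labels.

Illustration only; not the implementation in correctness.py.
"""


def correctness(truth: list[int], pred: list[int]) -> dict[str, float]:
    """Return accuracy, precision, recall, and F1 for one emotion column."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Undefined ratios are reported as 0.0 in this sketch.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for one emotion (e.g., joy) over six sentences.
    human = [1, 0, 1, 1, 0, 0]
    llm = [1, 0, 0, 1, 1, 0]
    print(correctness(human, llm))
```

For these toy labels there are 2 true positives, 2 true negatives, 1 false positive, and 1 false negative, so all four metrics equal 2/3.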
For instance, for the GPT-4o agreement, we can run the following command:
<code>python code\evaluation\correctness.py --ground_truth data\ground-truth.xlsx --predictions data\agreements\agreement_gpt-4o-1-gpt-4o-2-gpt-4o-3.xlsx --output_dir data\evaluation\correctness\gpt-4o</code>

📜 License
This repository is licensed under the GPL-3.0 License. See the LICENSE file for details.
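The interplay between <code>--batch_size</code> and <code>--sleep_time</code> in Step 3 (annotating emotions) can be pictured with the hypothetical sketch below: reviews are sent in consecutive batches, with a pause before every request except the first. <code>annotate_all</code> and <code>placeholder_annotator</code> are illustrative assumptions; the actual request logic of <code>annotate_emotions_openai.py</code> is not reproduced here.

```python
"""Hypothetical sketch of how --batch_size and --sleep_time could interact.

annotate_all() and placeholder_annotator() are illustrative assumptions,
not code from the replication package.
"""
import time
from typing import Callable, Iterator, Sequence


def batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def annotate_all(
    reviews: Sequence[str],
    annotate_batch: Callable[[Sequence[str]], list[str]],
    batch_size: int = 10,
    sleep_time: float = 10.0,
) -> list[str]:
    """Annotate reviews one batch per request, pausing between requests."""
    results: list[str] = []
    for index, batch in enumerate(batches(reviews, batch_size)):
        if index > 0:
            time.sleep(sleep_time)  # wait before every request except the first
        results.extend(annotate_batch(batch))
    return results


def placeholder_annotator(batch: Sequence[str]) -> list[str]:
    """Stand-in for an LLM request; simply tags each review."""
    return [f"annotated: {review}" for review in batch]


if __name__ == "__main__":
    reviews = [f"review {n}" for n in range(25)]
    print([len(b) for b in batches(reviews, 10)])  # [10, 10, 5]
    out = annotate_all(reviews, placeholder_annotator, batch_size=10, sleep_time=0)
    print(len(out), out[0])  # 25 annotated: review 0
```

With 25 reviews and a batch size of 10, three requests are made (10, 10, and 5 reviews), separated by two pauses of <code>sleep_time</code> seconds.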
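Step 5 (analysing agreement) does not state how the agreement between annotators or LLM runs is consolidated. One common rule, assumed here purely for illustration, is a per-sentence majority vote over binary emotion labels. The function <code>majority_vote</code> is hypothetical, and <code>agreement.py</code> may apply a different rule.

```python
"""Hypothetical sketch: consolidating annotators (or LLM runs) by majority vote.

Majority voting is an assumed rule for illustration; agreement.py may
consolidate annotations differently.
"""


def majority_vote(labels_per_annotator: list[list[int]]) -> list[int]:
    """Return 1 for each sentence marked 1 by more than half of the annotators."""
    if not labels_per_annotator:
        raise ValueError("need at least one annotator")
    n_annotators = len(labels_per_annotator)
    n_items = len(labels_per_annotator[0])
    if any(len(labels) != n_items for labels in labels_per_annotator):
        raise ValueError("all annotators must label the same sentences")
    # zip(*...) regroups the labels sentence by sentence across annotators.
    return [1 if 2 * sum(votes) > n_annotators else 0 for votes in zip(*labels_per_annotator)]


if __name__ == "__main__":
    # Binary labels for one emotion (e.g., anger) on four sentences, three runs.
    runs = [
        [1, 0, 1, 0],  # e.g., gpt-4o-1
        [1, 1, 0, 0],  # e.g., gpt-4o-2
        [0, 1, 1, 0],  # e.g., gpt-4o-3
    ]
    print(majority_vote(runs))  # [1, 1, 1, 0]
```

With three runs, an emotion is kept for a sentence when at least two runs assign it; with an even number of annotators, ties are resolved as "absent" under this rule.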
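For Step 6 (emotion statistics), the sketch below computes two of the mentioned statistics, assuming each emotion is stored as a 0/1 column per sentence: the frequency of an emotion and the correlation between two emotions. Using the phi coefficient (Pearson correlation on binary data) as the correlation measure is an assumption, and the function names and toy data are hypothetical; <code>emotion_statistics.py</code> may compute these differently.

```python
"""Hypothetical sketch of two emotion statistics on 0/1 labels per sentence.

Assumes each emotion is a binary column (1 = present); using the phi
coefficient as the correlation measure is an assumption for illustration.
"""
import math


def frequency(labels: list[int]) -> float:
    """Share of sentences annotated with the emotion."""
    return sum(labels) / len(labels) if labels else 0.0


def phi(x: list[int], y: list[int]) -> float:
    """Phi coefficient (Pearson correlation of two binary vectors).

    Returns 0.0 when undefined, i.e. when either vector is constant.
    """
    if len(x) != len(y):
        raise ValueError("label vectors must have the same length")
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    # Product of the four marginal totals of the 2x2 contingency table.
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        return 0.0
    return (n11 * n00 - n10 * n01) / denominator


if __name__ == "__main__":
    joy = [1, 1, 0, 0, 1, 0]
    trust = [1, 1, 0, 0, 0, 0]
    print(frequency(joy))  # 0.5
    print(round(phi(joy, trust), 3))  # 0.707
```

In the toy data, every sentence tagged with trust is also tagged with joy, which yields a strong positive correlation (phi = 1/sqrt(2)).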
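Step 7 reports the average pairwise Cohen's Kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two annotators and p_e is the agreement expected by chance from each annotator's label distribution. The hypothetical sketch below assumes one categorical label per sentence and averages kappa over all annotator pairs. It does not reproduce <code>kappa.py</code>, for example the <code>--exclude</code> filtering of iterations; the function names and toy labels are illustrative.

```python
"""Hypothetical sketch of average pairwise Cohen's kappa.

Assumes one categorical label per sentence for each annotator (or LLM run);
this is an illustration, not the implementation in kappa.py.
"""
from collections import Counter
from itertools import combinations


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two annotators."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label distribution.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # convention: both used the same single label throughout
    return (p_o - p_e) / (1 - p_e)


def average_pairwise_kappa(annotations: dict[str, list[str]]) -> float:
    """Mean Cohen's kappa over all unordered pairs of annotators."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        raise ValueError("need at least two annotators")
    return sum(cohen_kappa(annotations[x], annotations[y]) for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    run_1 = ["joy", "joy", "anger", "none"]
    run_2 = ["joy", "anger", "anger", "none"]
    runs = {"gpt-4o-1": run_1, "gpt-4o-2": run_2, "gpt-4o-3": list(run_1)}
    print(round(cohen_kappa(run_1, run_2), 3))  # 0.636
    print(round(average_pairwise_kappa(runs), 3))  # 0.758
```

Here runs 1 and 2 reach kappa = 7/11 (p_o = 3/4, p_e = 5/16), runs 1 and 3 agree perfectly (kappa = 1), and runs 2 and 3 again reach 7/11, so the average is 25/33.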
Provided by: figshare
Created: 2025-05-29