Online Repository of the Study "I want to RIDE my e-bicycle!": Supporting Developers Categorizing User Issues of a Mobility-as-a-Service Platform
Zenodo | Updated: 2022-05-19 | Indexed: 2026-05-25
Download link:
https://zenodo.org/record/6562882
Resource description:
<strong>Online Repository of the Study </strong><em>“I want to RIDE my e-bicycle!": Supporting Developers Categorizing User Issues of a Mobility-as-a-Service Platform</em>

<strong>Introduction</strong>

In the Mobility-as-a-Service (MaaS) context, e-bikes are important, environmentally friendly transportation resources that provide flexibility, save time and cost, and reduce traffic congestion. Beyond its benefits for user satisfaction and marketing, the resolution of user-reported issues is regulated in many cities. To resolve issues efficiently, it is essential to quickly identify their type (e.g., software- or hardware-related) so that they can be assigned to the responsible team. For popular e-mobility services, however, manual analysis of the reports is inefficient: it is tedious, time-consuming, and error-prone.

Our empirical study, carried out in the context of a <em>Mobility as a Service </em>start-up company, proposes an approach for the automated identification of relevant concerns reported by users of e-bike services. The company has more than 20,000 private customers across seven different countries and dedicates considerable effort to analyzing user behavior. However, the current manual process of analyzing and triaging user-reported issues hinders the MaaS company’s ability to grow and expand its services.

To help MaaS providers identify relevant user-reported issues, in the study we (i) manually inspect about 3,000 user-reported issues received by the MaaS company; (ii) design a taxonomy modeling the types of relevant issues reported by users; and (iii) propose MaaS-RIDE, an approach to automatically classify the user-reported issues according to the categories of the devised taxonomy. Our results demonstrate that MaaS-RIDE is able to accurately (F-measure ≥ 93%) identify software and hardware user-reported issues.
This result is critical for e-bike sharing companies to address such issues in an agile way and achieve the required user satisfaction.

<strong>Dataset Overview</strong>

The dataset is composed of the following kinds of data:

• “<em>Data_and_preprocessing</em>” folder
  o the user-reported issues data
  o the user-reported issues data processed as Bag of Words for Machine Learning training. For this, see the sub-folder “<em>input_data_for_ML</em>” and the following matrices:
    <em>tf-idf-matrix-of-comment_finals_with_oracle_info_low_level.csv</em>
    <em>tf-idf-matrix-of-comment_finals_with_oracle_info.csv</em>
  Moreover, a sample of selected issues is included in the replication package: see the file “<em>randomSamples.csv</em>” (due to a non-disclosure agreement with our industrial partner, we are not authorized to share the complete raw user reports used in our experiments).
• “RQ1” folder: Types of E-bikes User-reported Issues
  o the resulting taxonomy after the analysis of the issues
• “RQ2” folder: Classifying E-bikes Issue types
  o the trained models
  o the results of the models

The following sections describe in more detail what each of those folders and files contains.

<strong>“Data_and_preprocessing” folder</strong>

<strong>User-reported issues subset.</strong> Because of privacy constraints in this industrial setting, we disclose only an example subset of the user-reported issues, in the file <em>randomSamples.csv</em>. The <em>randomSamples.csv </em>file is a randomly generated subset: 20 examples drawn by stratified sampling from the High-level categories and 20 from the Low-level categories. This subset is not exhaustive but serves the purpose of showing the reviewers the kind of issues that this particular industrial setting is confronted with.
The file contains:
• the Id of the user report;
• the column "comment_final", which contains the issue text after the replacement of information that needed anonymization (e.g., vehicle plates, personal names, addresses, and timestamps);
• the column "High_level_category", which contains the selected category from the 5 first-level categories of the presented <em>Three-level taxonomy of e-bike user reported issues</em>;
• the columns "Low_level_category" and "Fine_grained_topic", which contain the respective assigned category, if one exists.

<strong>Bag of Words Term by Document matrix.</strong> An important input for training the ML models is the Bag of Words representation generated after processing the 2,989 manually labeled user issues. The result of this process is a Term-by-Document matrix. We share this matrix in the files in the sub-folder <em>input_data_for_ML</em>, where they are labeled for High- and Low-level categories. In the <em>tf-idf-matrix-of-comment_finals_with_oracle_info.csv</em> and <em>tf-idf-matrix-of-comment_finals_with_oracle_info_low_level.csv</em> files, the first column is the issue “Id”, the last column, “oracle”, is the labeled category, and the remaining columns represent the terms contained in the 2,989 user-reported issues. Each row holds, for every term column, the weight of the i-th term in the j-th user issue, computed with the tf-idf score.

<strong>“RQ1” folder</strong>

<strong>“Three-level taxonomy of e-bike user-reported issues.pdf<em>” file</em></strong>

The taxonomy derives from the manual analysis of the 2,989 user issues. We found that a three-level taxonomy provides significant granularity to the MaaS company. The taxonomy encompasses 5 High-level categories, 16 Low-level categories, and 15 Low-level subcategories of e-bike user-reported issues.
The file <em>Three-level taxonomy of e-bike user-reported issues.pdf</em> presents the taxonomy categories; its columns “Nr.” and “%” show the number of occurrences within the analyzed dataset and the corresponding percentages.

<strong>“RQ2” folder</strong>

<strong>“Trained Models” folder</strong>

We provide the trained machine learning and deep learning models in the sub-folder <em>ML_DL_models</em>. We experimented with classic machine learning models based on the Bag-of-Words approach using SVM, with Word Embeddings using fastText, and with language models leveraging BERT. The SVM and BERT models were trained using the open source low-code data analytics platform KNIME and were used to classify issues corresponding to the first and second levels of the taxonomy from the “RQ1” folder. A 10-fold cross-validation strategy was used to assess the classification performance. The fastText model was trained using the default parameter values (https://fasttext.cc/docs/en/options.html) and a 10-fold cross-validation strategy. With fastText, we classified issues corresponding only to the first level of the taxonomy from the “RQ1” folder, since fastText is more effective when more data points are available in the training set (i.e., lower levels in the taxonomy have fewer well-represented issue types).

<strong>“Model results” folder</strong>

In the sub-folder model_results we provide the tables summarizing the results of the proposed MaaS-RIDE approach, which automatically identifies and categorizes user-reported issues according to the High-level and Low-level categories of the taxonomy devised in RQ1 that are relevant for the MaaS company.
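For readers who want to reuse the shared tf-idf matrices, the sketch below shows one way to split such a file into ids, term weights, and labels, and to prepare stratified folds for a 10-fold cross-validation. It is only an illustration and not the KNIME or fastText workflow used in the study: it assumes a comma-separated file with a header row, and the term columns (battery, app, lock), the labels (Hardware, Software), and the helper names load_matrix and stratified_folds are hypothetical.

```python
import csv
import io
import random
from collections import defaultdict

# Hypothetical three-row excerpt. The real files hold 2,989 rows and many
# more term columns; the term names and labels here are invented.
SAMPLE = """Id,battery,app,lock,oracle
1,0.71,0.00,0.12,Hardware
2,0.00,0.83,0.00,Software
3,0.40,0.00,0.65,Hardware
"""


def load_matrix(handle):
    """Split a term-by-document CSV into ids, term names, weights, and labels."""
    reader = csv.reader(handle)
    header = next(reader)
    terms = header[1:-1]  # every column between "Id" and "oracle"
    ids, rows, labels = [], [], []
    for record in reader:
        if not record:  # skip blank lines
            continue
        ids.append(record[0])
        rows.append([float(value) for value in record[1:-1]])
        labels.append(record[-1])
    return ids, terms, rows, labels


def stratified_folds(labels, k=10, seed=0):
    """Assign each row index to one of k folds, spreading each class evenly."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin, continuing where the
        # previous class stopped so fold sizes stay balanced.
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds


ids, terms, rows, labels = load_matrix(io.StringIO(SAMPLE))
folds = stratified_folds(labels, k=3)  # k=3 only because the excerpt has 3 rows
```

With the real files, the handle would come from open(path, newline="") and k would stay at 10. Each fold then serves once as the test set while the remaining nine are used for training.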
Provider:
Zenodo
Created:
2022-05-19



