sealuzh/app_reviews

Name: sealuzh/app_reviews
Creator: sealuzh
Published: 2024-01-09 12:30:17
License: No description available

Source: Hugging Face. Updated: 2024-01-09. Indexed: 2024-06-15.

Download link:

https://hf-mirror.com/datasets/sealuzh/app_reviews


Resource description:

---
annotations_creators:
- crowdsourced
language_creators:
- crowdsourced
language:
- en
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- text-scoring
- sentiment-scoring
pretty_name: AppReviews
dataset_info:
  features:
  - name: package_name
    dtype: string
  - name: review
    dtype: string
  - name: date
    dtype: string
  - name: star
    dtype: int8
  splits:
  - name: train
    num_bytes: 32768731
    num_examples: 288065
  download_size: 13207727
  dataset_size: 32768731
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Dataset Card for AppReviews

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [Home Page](https://github.com/sealuzh/user_quality)
- **Repository:** [Repo Link](https://github.com/sealuzh/user_quality)
- **Paper:** [Link](https://giograno.me/assets/pdf/workshop/wama17.pdf)
- **Leaderboard:**
- **Point of Contact:** [Darshan Gandhi](mailto:darshangandhi1151@gmail.com)

### Dataset Summary

This is a large dataset of Android applications belonging to 23 different app categories. It provides an overview of the types of feedback users report on the apps and documents the evolution of the related code metrics. The dataset covers about 395 applications from the F-Droid repository, including around 600 versions and 280,000 user reviews (extracted with specific text mining approaches).

### Supported Tasks and Leaderboards

The dataset comprises 395 different apps from the F-Droid repository, including code quality indicators for 629 versions of these apps. It also includes the app reviews related to each of these versions, which have been automatically classified by type of user feedback from a software maintenance and evolution perspective.

### Languages

The dataset is monolingual; all messages are in English.

## Dataset Structure

### Data Instances

Each instance is a single user review written in English:

    {'package_name': 'com.mantz_it.rfanalyzer',
     'review': "Great app! The new version now works on my Bravia Android TV which is great as it's right by my rooftop aerial cable. The scan feature would be useful...any ETA on when this will be available? Also the option to import a list of bookmarks e.g. from a simple properties file would be useful.",
     'date': 'October 12 2016',
     'star': 4}

### Data Fields

* `package_name`: name of the software application package
* `review`: message written by the user
* `date`: date when the user posted the review
* `star`: rating provided by the user for the application

### Data Splits

There is only a training split, with a total of 288065 examples.

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

This dataset can help researchers understand software applications and the views and opinions users hold about them. It shows which types of applications users prefer and how these applications help users solve their problems.

### Discussion of Biases

The reviews cover only open-source software applications; other sectors are not considered.

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

Giovanni Grano (University of Zurich), Sebastiano Panichella (University of Zurich), Andrea di Sorbo (University of Sannio)

### Licensing Information

[More Information Needed]

### Citation Information

    @InProceedings{Zurich Open Repository and Archive:dataset,
      title = {Software Applications User Reviews},
      authors = {Grano, Giovanni; Di Sorbo, Andrea; Mercaldo, Francesco; Visaggio, Corrado A; Canfora, Gerardo; Panichella, Sebastiano},
      year = {2017}
    }

### Contributions

Thanks to [@darshan-gandhi](https://github.com/darshan-gandhi) for adding this dataset.


Provider:

sealuzh

Original information summary

Dataset overview

Dataset description

Dataset name: AppReviews
Language: English
License: unknown
Multilinguality: monolingual
Size category: 100K<n<1M
Source datasets: original
Task category: text classification
Task IDs: text scoring, sentiment scoring

Dataset structure

Data instance

{
  "package_name": "com.mantz_it.rfanalyzer",
  "review": "Great app! The new version now works on my Bravia Android TV which is great as it's right by my rooftop aerial cable. The scan feature would be useful...any ETA on when this will be available? Also the option to import a list of bookmarks e.g. from a simple properties file would be useful.",
  "date": "October 12 2016",
  "star": 4
}
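As a minimal sketch of how such a record might be handled in Python, the snippet below converts one raw instance into a typed object and parses the `date` string. The `AppReview` class and `parse_review` function are hypothetical names introduced here for illustration, and the code assumes every `date` value follows the "October 12 2016" pattern of the example (full English month name, day, year), which the dataset card does not guarantee.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class AppReview:
    """Typed view of one record (hypothetical helper, not part of the dataset)."""
    package_name: str
    review: str
    posted_on: date  # parsed from the raw `date` string
    star: int


def parse_review(record: dict) -> AppReview:
    """Convert one raw record into an AppReview.

    Assumption: `date` looks like 'October 12 2016'. The %B directive
    matches English month names under the default C/English locale.
    """
    posted_on = datetime.strptime(record["date"], "%B %d %Y").date()
    return AppReview(
        package_name=record["package_name"],
        review=record["review"],
        posted_on=posted_on,
        star=int(record["star"]),
    )


# Example instance from the dataset card; the review text is shortened here.
example = {
    "package_name": "com.mantz_it.rfanalyzer",
    "review": "Great app! The new version now works on my Bravia Android TV",
    "date": "October 12 2016",
    "star": 4,
}

parsed = parse_review(example)
print(parsed.posted_on.isoformat())  # prints 2016-10-12
```

Storing the parsed date separately from the raw string keeps the original value recoverable if some records turn out to use a different format.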

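Data fields

package_name: name of the software application package
review: message written by the user
date: date when the user posted the review
star: rating the user gave the application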
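Because the listed task IDs include sentiment scoring, the `star` field is a natural source of labels. The sketch below maps a rating to a coarse sentiment class. It rests on two assumptions not stated in the dataset card: that ratings lie on a 1 to 5 scale, and that the thresholds chosen here (1-2 negative, 3 neutral, 4-5 positive) suit the task. The function name `star_to_sentiment` is hypothetical.

```python
def star_to_sentiment(star: int) -> str:
    """Map a star rating to a coarse sentiment label.

    Assumptions: ratings are integers from 1 to 5, and the cut-offs
    below are an illustrative choice, not defined by the dataset.
    """
    if not 1 <= star <= 5:
        raise ValueError(f"expected a rating between 1 and 5, got {star}")
    if star <= 2:
        return "negative"
    if star == 3:
        return "neutral"
    return "positive"


# The example instance in the dataset card has star = 4.
print(star_to_sentiment(4))  # prints positive
```

Raising on out-of-range values makes unexpected ratings visible early instead of silently assigning them a label.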

Data splits

Training set: 288065 examples
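Since only a training split is provided, an evaluation set has to be carved out by the user. One possible approach, sketched below under stated assumptions, assigns each app to a split by hashing its `package_name`, so that all reviews of one app land on the same side. The function name `assign_split` and the default 20% test fraction are hypothetical choices, not part of the dataset.

```python
import hashlib


def assign_split(package_name: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign an app to 'train' or 'test'.

    The first 8 bytes of the SHA-256 digest are read as an integer in
    [0, 2**64) and compared with test_fraction * 2**64, so roughly
    test_fraction of all package names end up in 'test'.
    """
    digest = hashlib.sha256(package_name.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return "test" if bucket < test_fraction * 2**64 else "train"


# The same package name always maps to the same split.
split = assign_split("com.mantz_it.rfanalyzer")
print(split)
```

Splitting by app rather than by row avoids having reviews of the same app in both sets. If the data is loaded with the Hugging Face `datasets` library, its `Dataset.train_test_split` method offers a random row-level split instead.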

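Dataset creation

Dataset summary

The dataset comprises 395 different apps from the F-Droid repository, including code quality indicators for 629 versions of these apps and the reviews related to each version. The reviews have been automatically classified by type of user feedback from a software maintenance and evolution perspective.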
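Each review carries its `package_name`, so per-app statistics can be derived by grouping on that field. The sketch below counts reviews and averages ratings per app. The function name `summarize_by_app` is hypothetical, and the sample rows are made up for illustration (only the first rating comes from the example instance); they are not real dataset statistics.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_app(records):
    """Return {package_name: {"reviews": count, "mean_star": average}}."""
    stars_by_app = defaultdict(list)
    for record in records:
        stars_by_app[record["package_name"]].append(record["star"])
    return {
        package: {"reviews": len(stars), "mean_star": mean(stars)}
        for package, stars in stars_by_app.items()
    }


# Illustrative rows only: the second and third rows are invented.
sample = [
    {"package_name": "com.mantz_it.rfanalyzer", "star": 4},
    {"package_name": "com.mantz_it.rfanalyzer", "star": 5},
    {"package_name": "org.example.hypothetical", "star": 2},
]

summary = summarize_by_app(sample)
print(summary["com.mantz_it.rfanalyzer"])  # prints {'reviews': 2, 'mean_star': 4.5}
```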

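Considerations for using the data

Social impact of the dataset

This dataset supports a better understanding of software applications and of users' views and opinions about them. It helps show which types of applications users prefer and how these applications help users solve their problems.

Discussion of biases

The reviews cover only open-source software applications; other sectors are not considered.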

Additional information

Dataset curators

Giovanni Grano (University of Zurich)
Sebastiano Panichella (University of Zurich)
Andrea di Sorbo (University of Sannio)

Citation information

@InProceedings{Zurich Open Repository and Archive:dataset,
  title = {Software Applications User Reviews},
  authors = {Grano, Giovanni; Di Sorbo, Andrea; Mercaldo, Francesco; Visaggio, Corrado A; Canfora, Gerardo; Panichella, Sebastiano},
  year = {2017}
}

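Collected summary

Dataset introduction

Background and challenges

Background overview

This dataset is a large collection of user reviews of Android apps, covering about 288,000 reviews of 395 different apps. Each review includes the app package name, review text, date, and star rating, which makes the dataset suitable for text classification and sentiment analysis research.

The content above was collected and summarized by 遇见数据集.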
