Direct Arabic products' opinions data set for opinion mining and sentiment analysis

DataONE, updated 2019-08-05, indexed 2024-06-08
Download link:
https://search.dataone.org/view/sha256:66065be2e75ca4b2fe12e910b59d3ddc3490e1f541f7509f6e820cf4baeb3baa
Resource description:
The product opinions in the Arabsentiment dataset were collected manually from various product-related social sources for opinion mining, feature extraction, and sentiment analysis tasks. The collection covers several types of direct opinions, each of which mentions at least one product feature, stated either explicitly or implicitly. The dataset spans twenty product categories, such as home, baby, several types of software products, and other product types. The product features were identified manually from the customer opinions and the product descriptions. Products are grouped by product type, and each type has its own initial search query. For each product, the product name and a brief description of its capabilities are recorded in a products information file. Opinions were selected based on text length and the number of features that appear in the opinionated text. For each opinion, the dataset records the opinionated text and the sentiment rating score entered by the customer. The rating score represents the reviewer's overall polarity toward the product as one of two categories: positive or negative. The main dataset attributes include the total number of direct opinions, each of which should include at least one explicit product feature; of these, 1459 have a positive sentiment score and 516 have a negative one.
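
The two counts reported above imply a noticeably imbalanced label distribution, which matters when evaluating a sentiment classifier on this data. The following is a minimal sketch in Python that derives the class shares and a majority-class baseline from those counts. It assumes the 1459 positive and 516 negative opinions together make up the whole dataset, which the description suggests but does not state outright; the function name `class_balance` and the dictionary keys are illustrative and not part of the dataset.

```python
# Counts taken from the dataset description.
POSITIVE = 1459
NEGATIVE = 516


def class_balance(positive: int, negative: int) -> dict:
    """Summarise a two-class label distribution.

    Assumes the two counts cover every opinion in the dataset.
    """
    total = positive + negative
    return {
        "total": total,
        "positive_share": positive / total,
        "negative_share": negative / total,
        # Accuracy of a classifier that always predicts the larger class.
        "majority_baseline": max(positive, negative) / total,
        # How many positive opinions there are per negative opinion.
        "imbalance_ratio": positive / negative,
    }


stats = class_balance(POSITIVE, NEGATIVE)
print(f"total opinions:    {stats['total']}")
print(f"positive share:    {stats['positive_share']:.1%}")
print(f"negative share:    {stats['negative_share']:.1%}")
print(f"majority baseline: {stats['majority_baseline']:.1%}")
print(f"imbalance ratio:   {stats['imbalance_ratio']:.2f}")
```

Under that assumption, a model that always predicts "positive" is already right on roughly three quarters of the opinions, so per-class precision and recall are likely more informative than plain accuracy.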
Created:
2023-11-22