Fake Review Detection

Name: Fake Review Detection
Creator: maas
Published: 2026-05-19 19:26:09
License: No description available

ModelScope community. Updated 2026-05-19. Indexed 2024-05-15.

Download link:

https://modelscope.cn/datasets/jimmyliu12345/yelpzip

Resource overview:

###### This dataset currently uses the default introduction template. Please complete the dataset card promptly, following the [dataset file specification](https://www.modelscope.cn/docs/%E6%95%B0%E6%8D%AE%E9%9B%86%E6%96%87%E4%BB%B6%E8%A7%84%E8%8C%83). Thank you for your understanding.

### Clone with HTTP

```bash
git clone https://www.modelscope.cn/datasets/jimmyliu12345/yelpzip.git
```

---
license: Apache License 2.0
# Fake review detection
tags:
- Alibaba
- arxiv:1810.99999
- my free-style tag
text:
  # A second-level entry can belong to only one task_categories
  fill_mask:
    # Third-level entries allow multiple selections
    languages:
    - en
---

<!--- The YAML section above provides the attribute/tag descriptions --->
<!--- The dataset description below is in markdown format --->

## Dataset Description

Yelp business review data.

### Dataset Overview

An introduction to the dataset and its supported use cases (including supported languages).

### Supported Tasks

The training tasks this dataset supports, and related benchmark results.

## Dataset Format and Structure

### Data Format

- text: review text
- userid: user ID
- prod_id
- rating
- label
- date

### Dataset Loading Method

Detailed instructions, with code examples, for loading and using the dataset through the MaaS/Dataset SDK.

### Data Splitting

Whether the dataset is pre-split (for example, into preset train/test/validation splits). If so, how the splits were produced. If not, any recommendations for splitting the data during use (such as ratios).

## Dataset Generation Information

### Raw Data Description

The source of the raw data, how it was initially collected, and whether it went through processing such as normalization.

### Dataset Annotation

Whether the dataset contains annotations and, if so, a description of them.

#### Annotation Process

How the annotation was carried out and what the workflow was.

#### Annotators

Information about the annotators, especially where they differ from the original data providers.

## Dataset Copyright Information

Copyright information for the dataset, the authorized usage scenarios and users, whether it is open source, and which open-source license applies.

## Citation

Whether the dataset has an associated paper, and whether there is a recommended format for citing it in research papers.

## Other Relevant Information

Personal and sensitive information the dataset may contain, background to consider when using it, its potential social significance, and any bias and limitations it may carry.
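The card lists six fields under Data Format and leaves Data Splitting unanswered. As a minimal sketch only, the Python below parses rows carrying those six field names and makes a label-stratified train/test split. Several things here are assumptions and not facts from the card: that the data can be read as CSV with a header row, that `rating` is numeric, the placeholder label values `0` and `1`, the 80/20 ratio, and the helper names `Review`, `parse_reviews`, and `stratified_split`. The sample rows are invented for illustration.

```python
import csv
import io
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """One record with the six fields named in the card's Data Format section."""
    text: str
    userid: str
    prod_id: str
    rating: float  # assumption: ratings are numeric
    label: str     # kept as an opaque string; the card does not define its values
    date: str


def parse_reviews(csv_text: str) -> list[Review]:
    """Parse CSV text whose header row names the six card fields (assumed layout)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Review(
            text=row["text"],
            userid=row["userid"],
            prod_id=row["prod_id"],
            rating=float(row["rating"]),
            label=row["label"],
            date=row["date"],
        )
        for row in reader
    ]


def stratified_split(reviews, test_ratio=0.2, seed=0):
    """Split so that each label keeps roughly the same share in train and test."""
    by_label = defaultdict(list)
    for review in reviews:
        by_label[review.label].append(review)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical sample rows, invented for illustration; not taken from the dataset.
SAMPLE_CSV = """text,userid,prod_id,rating,label,date
Great food and friendly staff,u1,p1,5,0,2014-01-02
Slow service but decent pizza,u2,p1,3,0,2014-01-05
Best brunch in town,u3,p2,5,0,2014-02-11
Overpriced for what you get,u4,p2,2,0,2014-03-01
Cozy place with good coffee,u5,p3,4,0,2014-03-09
Amazing amazing amazing,u6,p1,5,1,2014-01-03
Worst restaurant ever avoid,u7,p2,1,1,2014-02-12
Five stars go here now,u8,p3,5,1,2014-03-10
Terrible do not visit,u9,p1,1,1,2014-01-06
Perfect in every way,u10,p2,5,1,2014-02-13
"""

reviews = parse_reviews(SAMPLE_CSV)
train, test = stratified_split(reviews, test_ratio=0.2, seed=0)
print(len(train), len(test))
```

Stratifying by `label` is one reasonable choice when the classes may be imbalanced, because it keeps the class proportions similar across splits. Splitting by `userid` or by `date` would be an alternative if leakage between a user's reviews, or across time, is a concern.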

Provider:

maas

Created:

2024-01-18

Collected Summary

Dataset Introduction

Background and Challenges

Background Overview

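This is an English-language dataset for fake review detection, built on Yelp data and suited to the mask-filling task in natural language processing. It was published by an Alibaba-affiliated team under the Apache License 2.0, was updated in September 2024, and is 563.84 MB in size.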

The content above was collected and summarized by 遇见数据集.