Online Public Opinion Analysis Test Dataset

Name: Online Public Opinion Analysis Test Dataset
Creator: Beijing Institute of Technology
License: No description available

Indexed by the National Basic Science Data Center on 2026-01-30

Download link:

https://nbsdc.cn/general/dataDetail?id=683dea35195d26123318982a&type=1

Resource description:

The Online Public Opinion Analysis Test Dataset supports research and applications in public opinion propagation mechanisms, hot-event analysis, and sentiment orientation recognition. The data were collected with a crawler developed in-house, in compliance with each platform's terms of service and with applicable laws and regulations. The dataset covers public opinion topics on major social platforms, including Weibo, Douyin, Baidu News, Zhihu, Xiaohongshu, and Jinri Toutiao. Collected fields include hot-event names, event timelines, topic posts, post comments, comment counts, like counts, and view counts, intended to support research on public opinion evolution, sentiment analysis model construction, and topic trend prediction. The comments come mainly from highly active discussion threads on each platform, with emphasis on topics that show large emotional swings, strong propagation, and high discussion volume. All crawled data went through text cleaning: removal of non-text content such as HTML tags, special characters, and emojis, followed by standardization through word segmentation and stop-word removal. The data were also deduplicated so that redundant content does not skew analysis results. The dataset is 20 MB in size and is stored in CSV format.

With this dataset, researchers can study the characteristics of online public opinion, including public sentiment, opinion trend prediction, and potential effects on different sectors of society. Intended application scenarios include academic research, public opinion monitoring, and crisis management.
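The cleaning and deduplication steps described above can be illustrated with a minimal sketch. This is not the providers' actual tool, which the record does not publish; the function names `clean_text` and `deduplicate` are hypothetical, and the rule "keep only letters, digits, and whitespace" is an assumption about what counts as non-text information. Word segmentation and stop-word removal are left out because the record does not name the tokenizer or the stop-word list that were used.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")      # naive HTML tag matcher (assumption)
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Strip HTML tags, emojis, punctuation, and symbols from one text."""
    text = TAG_RE.sub(" ", raw)      # drop tags such as <p> or <br/>
    text = html.unescape(text)       # turn entities like &amp; into characters
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        # Keep letters (L*, which includes CJK ideographs), digits (N*),
        # and whitespace. Everything else (punctuation, symbols, emojis,
        # control and format characters) becomes a space.
        if category[0] in ("L", "N") or ch.isspace():
            kept.append(ch)
        else:
            kept.append(" ")
    return WHITESPACE_RE.sub(" ", "".join(kept)).strip()


def deduplicate(texts):
    """Clean each text and keep the first occurrence of each result."""
    seen = set()
    unique = []
    for raw in texts:
        cleaned = clean_text(raw)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique.append(cleaned)
    return unique


if __name__ == "__main__":
    samples = [
        "<p>这个产品太好了！😀</p>",
        "这个产品太好了",
        "<br/>Nice &amp; useful!!!",
        "",
    ]
    for line in deduplicate(samples):
        print(line)
```

Deduplicating on the cleaned text rather than the raw text means that two comments differing only in markup or emojis collapse into one entry. Whether the published dataset was deduplicated this way is not stated in the record.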

Provider:

Beijing Institute of Technology

Collected summary

Dataset introduction

Background and challenges

Background overview

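This is a multi-platform test dataset for online public opinion analysis. It is 20 MB in size, contains multi-dimensional information such as hot events, posts, and comments from social platforms including Weibo, Douyin, and Baidu News, and has undergone text cleaning and deduplication. It is intended to support research on public opinion propagation mechanisms, sentiment orientation recognition, and topic trend prediction, and is suited to academic and applied analysis in fields such as social governance and computer science.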

The above content was collected and summarized by 遇见数据集.