Hongmai Public Opinion Monitoring System (红麦舆情监测系统)

Indexed by Beijing International Big Data Exchange (北京国际大数据交易所) on 2024-06-05
Download link:
https://webs.bjidex.com/sys-bsc-home/#/bscConsole/tradingMarket/detail?id=2013
Resource description:
The Hongmai Public Opinion Monitoring System is technically advanced by domestic standards. Its main technical features are a distributed architecture, intelligent crawling, article content extraction, article similarity analysis, content orientation analysis, article weight analysis, overseas data collection, and IP anti-blocking.

Distributed architecture: Collection capacity grows by adding servers, which lets the system keep pace with the increasing volume of information on the Internet.

Article content extraction: When crawling a site, the system extracts content using the rules configured for that site in site management, such as the rules for title, body, author, and publication time. For sites with no configured content rules, it attempts to extract the main text with automatic HTML content extraction.

Article similarity analysis: Similarity between articles is determined with Apache Solr. When a relevance query is submitted to Solr, a list of similar articles drawn from those already in the full-text index is returned. Two articles are treated as similar when their similarity score reaches a preset threshold.

Content orientation analysis: The orientation (tendency) of an article is assessed through semantic analysis, industry keywords, special keywords, manual review, and machine learning.

Article weight analysis: An article's importance is calculated from attributes such as its view count, its reply count, and the tier and position of the sites where it appears (key sites and ordinary sites carry different weights).
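
The similarity threshold and the weight calculation described above can be illustrated with a minimal Python sketch. It is not the vendor's implementation. The description gives no formulas, so the function names, the tier multipliers, the placement factor, and the logarithmic damping of view and reply counts are all hypothetical. The first function assumes a response in Solr's standard JSON layout (`response.docs`) with the `score` field requested. Raw Solr scores are not normalized to a fixed range, so any real threshold would need tuning per index.

```python
from dataclasses import dataclass
from math import log1p

# Hypothetical multipliers: the description says only that key sites and
# ordinary sites carry different weights, not what the weights are.
SITE_TIER_WEIGHT = {"key": 2.0, "ordinary": 1.0}


def similar_articles(solr_response: dict, threshold: float) -> list[str]:
    """Return ids of returned documents whose score meets the threshold."""
    docs = solr_response.get("response", {}).get("docs", [])
    return [d["id"] for d in docs if d.get("score", 0.0) >= threshold]


@dataclass
class Article:
    views: int
    replies: int
    site_tier: str       # "key" or "ordinary" (assumed labels)
    on_front_page: bool  # assumed stand-in for "position" on the site


def article_weight(a: Article) -> float:
    """Combine engagement, site tier, and placement into one score."""
    # log1p damps very large counts; replies count double (an assumption).
    engagement = log1p(a.views) + 2.0 * log1p(a.replies)
    placement = 1.5 if a.on_front_page else 1.0
    return engagement * SITE_TIER_WEIGHT[a.site_tier] * placement
```

Under these assumed factors, an article on the front page of a key site scores three times as high as one with the same view and reply counts on an inner page of an ordinary site.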
Provider:
红麦聚信(北京)软件技术有限公司
The content above was collected and summarized by 遇见数据集.