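Media Hotspots: News Lead Discovery and In-depth Event Mining Data
Zhejiang Province Data Intellectual Property Registration Platform (浙江省数据知识产权登记平台) | Updated 2024-12-05 | Indexed 2024-12-06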
Download link:
https://www.zjip.org.cn/home/announce/trends/96481
Resource description:
1. News Lead Discovery Before Drafting
This module applies big-data mining to monitor the newest and hottest stories across the web in real time, weighing the reposting habits of different editors, media authority, and time. It covers article monitoring and hotspot discovery for domestic mainstream media, including their newspapers, websites, WeChat accounts, and mobile apps. Articles whose heat is rising quickly give editors and reporters source material and reporting angles before they start drafting.
2. In-depth Mining of Hotspot Events
This module identifies the stories the media itself is paying attention to, based on reposting-rate statistics across all online news media. Stories listed in Media Hotspots already have high and sustained heat, which makes them well suited to follow-up reporting or special-topic production. Editors and reporters can pick the listed high-heat articles for analysis and further writing.
To keep hotspot news timely, the platform updates the heat value of every discovered hotspot story every 5 minutes. The heat value depends on the number of reposting outlets, their media authority, and elapsed time. The heat trend chart shows that while the reposting rate per unit time outweighs the reduction in heat caused by the passage of time, the heat value keeps rising; once the reposting rate falls below that level, the heat value keeps declining. Rather than reflecting heat at a single point in time, the platform tracks each hotspot story over its whole lifecycle and derives the story's heat value from that trajectory.
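The rise-or-fall condition can be made concrete with the formula given under "Calculation Formula" in this description. The following is a minimal sketch, not the platform's code: the function name `breakeven_growth` and its parameter names are hypothetical, and it assumes heat is compared between two consecutive 5-minute updates. Heat stays flat only if the summed media authority grows by the same factor as the time-decay denominator, so any growth above that factor means rising heat.

```python
def breakeven_growth(hours_since_publication, step_hours=5 / 60,
                     increment=4.0, gravity=1.4):
    """Factor by which the authority sum must grow over one step
    for the heat value to stay exactly flat.

    Derived from heat = authority_sum / (hours + increment) ** gravity:
    heat rises between two updates only if the authority sum grows by
    more than ((hours + step + increment) / (hours + increment)) ** gravity.
    """
    now = hours_since_publication + increment
    later = now + step_hours
    return (later / now) ** gravity


# Hypothetical usage: required growth per 5-minute update at several ages.
for hours in (0, 6, 20):
    needed = breakeven_growth(hours) - 1.0
    print(f"{hours:>2} h after publication: authority sum must grow "
          f"by more than {needed:.2%} per 5 minutes")
```

Under these assumptions a new story needs its authority sum to grow by roughly 3% per update to keep rising, while a 20-hour-old story needs only about 0.5%, because the time penalty flattens as a story ages.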
Specific calculation formulas are as follows:
1. Detailed Explanation
Articles from across the web are grouped and counted by similarity tag and ranked with the heat formula; those with a heat value greater than 1.8 are treated as media-focused hotspot articles. The main techniques are search plus cosine similarity: a similarity tag is computed for each article before it is indexed, and the statistics service then applies the heat formula to calculate the heat value of every article.
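As an illustration of how cosine similarity can yield similarity tags, here is a minimal sketch. It is not the provider's implementation: the whitespace tokenizer, the greedy grouping, the 0.8 threshold, and the names `term_vector`, `cosine_similarity`, and `assign_similarity_tags` are all assumptions, and the search step that would normally retrieve candidate articles from the index is omitted.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (assumed tokenizer: lowercase + whitespace split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_similarity_tags(texts, threshold=0.8):
    """Greedy grouping: reuse the tag of the first group whose representative
    is at least `threshold` similar, otherwise open a new group.
    The threshold value is hypothetical."""
    representatives = []  # one term vector per tag
    tags = []
    for text in texts:
        vector = term_vector(text)
        for tag, rep in enumerate(representatives):
            if cosine_similarity(vector, rep) >= threshold:
                tags.append(tag)
                break
        else:
            representatives.append(vector)
            tags.append(len(representatives) - 1)
    return tags
```

Articles that share a tag would then be counted together when the heat formula sums media authority over "all similar articles".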
2. Calculation Formula
(Sum of the media authority values of all similar articles) / (hours since publication + increment)^G
The hourly increment (increment) is 4, and the weighting factor (G) is 1.4.
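The formula translates directly into code. The sketch below uses the stated constants (increment 4, G 1.4); the function name `heat_score`, its parameter names, and the sample authority values are hypothetical, and the way the platform assigns authority values to individual outlets is not described in the source.

```python
HOT_THRESHOLD = 1.8  # articles above this heat value count as hotspots


def heat_score(authority_weights, hours_since_publication,
               increment=4.0, gravity=1.4):
    """Sum of the media authority values of all similar articles,
    divided by (hours since publication + increment) ** G."""
    if hours_since_publication < 0:
        raise ValueError("hours_since_publication must be non-negative")
    return sum(authority_weights) / (hours_since_publication + increment) ** gravity


# Hypothetical example: the same reposts, scored at publication and 2 hours later.
weights = [10.0, 10.0]
for hours in (0, 2):
    score = heat_score(weights, hours)
    status = "hotspot" if score > HOT_THRESHOLD else "below threshold"
    print(f"{hours} h: heat = {score:.3f} ({status})")
```

With no new reposts, the same authority sum clears the 1.8 threshold at publication but falls below it two hours later, which is the time decay the description refers to.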
3. Generation and Removal Rules
The statistics cover all articles crawled within the past 24 hours and are recomputed every 5 minutes. When an article's heat score stays below 1.8 for a continuous hour, the article is delisted and can no longer be found through front-end search. If the heat of the corresponding event later exceeds 1.8 again, the article reappears on the front end.
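The delisting rule can be sketched as a small state tracker fed one heat value per 5-minute update. This is an illustration under stated assumptions, not the platform's logic: the class name `ListingState` is hypothetical, "one continuous hour" is taken to mean 12 consecutive updates, and a heat value of exactly 1.8 is treated as below the threshold because the source does not say how that case is handled.

```python
from dataclasses import dataclass

THRESHOLD = 1.8       # heat value from the description
SAMPLE_MINUTES = 5    # update interval from the description
GRACE_MINUTES = 60    # continuous time below threshold before delisting


@dataclass
class ListingState:
    visible: bool = False
    minutes_below: int = 0

    def update(self, heat):
        """Feed one 5-minute heat sample; return whether the article is listed."""
        if heat > THRESHOLD:
            # Above the threshold: list (or relist) and reset the countdown.
            self.visible = True
            self.minutes_below = 0
        else:
            self.minutes_below += SAMPLE_MINUTES
            if self.minutes_below >= GRACE_MINUTES:
                self.visible = False
        return self.visible


# Hypothetical usage: one hot sample, an hour of cold samples, then a rebound.
state = ListingState()
samples = [2.4] + [1.2] * 12 + [1.9]
history = [state.update(h) for h in samples]
print(history)
```

Keeping the countdown separate from the visibility flag means a single sample above 1.8 resets the hour, so a story that hovers around the threshold is not delisted prematurely.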
Provider:
Hangzhou Fanwen Technology Co., Ltd. (杭州凡闻科技有限公司)
Created:
2024-09-23
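Collected Summary
Dataset Introduction

Features
Media Hotspots: News Lead Discovery and In-depth Event Mining Data is a dataset for monitoring and mining news hotspots across the web in real time. It contains 1,051 records, with heat values updated every 5 minutes, and is intended to help news editors and reporters discover leads before drafting and dig deeper into hotspot events.
The above content was collected and summarized by 遇见数据集.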



