
Design and Construction of Distributed JS Parsing System

Indexed from NIAID Data Ecosystem on 2026-03-08
Download link:
https://figshare.com/articles/dataset/Design_and_Construction_of_Distributed_JS_Parsing_System/988756
Resource description:
This paper pursues two research directions: first, designing an algorithm that extracts JavaScript (JS) scripts from HTML pages and parses them; second, analysing task scheduling algorithms and designing one for this system on top of Hadoop distributed computing technology. By studying the rules of JavaScript grammar and the forms in which JavaScript appears in HTML pages, the paper designs a JavaScript extraction process and algorithm based on a JavaScript parsing engine; this is the first module. By studying Map/Reduce task scheduling algorithms in light of the characteristics of the JavaScript parsing task and the distributed environment, the paper also identifies the Map/Reduce task scheduling algorithm best suited to the system, so that JavaScript parsing tasks are scheduled sensibly. A distributed JS parsing system was then constructed. To check its accuracy and performance, the paper tests the system, summarizes its deficiencies, and suggests improvements. The distributed JS parsing system can parse a large number of JS scripts in HTML pages efficiently. The experimental results show that the system can efficiently extract the text messages and URLs contained in JS scripts. The research can therefore provide reliable technical support for search engines, public opinion analysis, and data acquisition.
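The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the paper relies on a JavaScript parsing engine, whereas this sketch uses only Python's standard `html.parser` to locate script content and a plain regular expression to pull URLs out of the script text. The names `ScriptExtractor`, `extract_js`, and `find_urls` are hypothetical and introduced here for illustration only.

```python
import re
from html.parser import HTMLParser


class ScriptExtractor(HTMLParser):
    """Hypothetical helper: collect the JavaScript embedded in an HTML page."""

    def __init__(self):
        super().__init__()
        self.inline_scripts = []  # bodies of <script>...</script> elements
        self.external_srcs = []   # values of <script src="...">
        self.handlers = []        # (attribute, code), e.g. onclick or javascript: href
        self._in_script = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if tag == "script" and name == "src":
                self.external_srcs.append(value)
            elif name.startswith("on"):
                # Event-handler attributes such as onclick="..."
                self.handlers.append((name, value))
            elif name == "href" and value.strip().lower().startswith("javascript:"):
                # javascript: pseudo-URLs in links
                self.handlers.append((name, value.strip()[len("javascript:"):]))
        if tag == "script":
            self._in_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so buffer it.
        if self._in_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            body = "".join(self._buffer).strip()
            if body:
                self.inline_scripts.append(body)
            self._in_script = False


def extract_js(html):
    """Parse one HTML document and return the populated extractor."""
    parser = ScriptExtractor()
    parser.feed(html)
    parser.close()
    return parser


# Assumption: a simple pattern for absolute http(s) URLs inside script text.
URL_PATTERN = re.compile(r"https?://[^\s'\"<>)]+")


def find_urls(script_text):
    """Return URLs that appear literally in a piece of JavaScript source."""
    return URL_PATTERN.findall(script_text)
```

A function like `extract_js` could serve as the per-document work of a map task, with one HTML page per input record. A regular expression only finds URLs written out literally in the source; URLs assembled at run time would need an actual JavaScript engine, which is the approach the abstract describes.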
Created:
2014-04-10