Design and Construction of Distributed JS Parsing System
Source: Figshare. Updated: 2014-04-10. Indexed: 2026-04-29.
Download link:
https://figshare.com/articles/dataset/Design_and_Construction_of_Distributed_JS_Parsing_System/988756
Description:
This paper pursues two research directions. The first is to design an algorithm that extracts JavaScript (JS) scripts from HTML pages and parses them. The second is to analyze task scheduling algorithms and design one for this system on top of Hadoop distributed computing technology. By studying the rules of JavaScript grammar and the ways scripts occur in HTML pages, the paper designs a process and algorithm for JavaScript extraction based on a JavaScript parsing engine; this is the first module. By studying Map/Reduce task scheduling algorithms in light of the characteristics of the JavaScript parsing task and of the distributed environment, the paper also identifies the Map/Reduce task scheduling algorithm best suited to the system, so that JavaScript parsing tasks run reasonably. A distributed JS parsing system is then constructed. To check the accuracy and performance of the system, the paper tests it, summarizes its deficiencies, and suggests improvements. The distributed JS parsing system can parse a large number of JS scripts in HTML pages efficiently. The experimental results show that the system can efficiently extract the text messages and URLs contained in JS scripts. The research can therefore provide reliable technical support for search engines, public opinion analysis, and data acquisition.
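As a rough illustration of the extraction step described above, the following minimal sketch pulls inline scripts out of an HTML page and collects the URLs and text strings they contain. It is not the paper's algorithm: the paper relies on a JavaScript parsing engine, whereas this sketch substitutes regular expressions over string literals, and the names `ScriptExtractor`, `extract_from_html`, `URL_RE`, and `STRING_RE` are all hypothetical. In a Map/Reduce setting, a function like `extract_from_html` could plausibly serve as the body of a map task applied to one page at a time.

```python
import re
from html.parser import HTMLParser

# Assumption: URLs are plain http(s) links; a real JS engine would also
# resolve URLs built by string concatenation at run time.
URL_RE = re.compile(r"https?://[^\s'\"<>)]+")
# Matches single- or double-quoted JS string literals (no template literals).
STRING_RE = re.compile(r"""(["'])((?:\\.|(?!\1).)*)\1""")


class ScriptExtractor(HTMLParser):
    """Collects inline <script> bodies and external script src attributes."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self._buf = []
        self.scripts = []   # inline script bodies
        self.external = []  # values of <script src="...">

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external.append(src)
            self.in_script = True
            self._buf = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self.in_script:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_script:
            body = "".join(self._buf).strip()
            if body:
                self.scripts.append(body)
            self.in_script = False


def extract_from_html(html):
    """Return URLs and non-URL text strings found in a page's inline scripts."""
    parser = ScriptExtractor()
    parser.feed(html)
    parser.close()
    urls, texts = [], []
    for body in parser.scripts:
        urls.extend(URL_RE.findall(body))
        for _quote, literal in STRING_RE.findall(body):
            if literal and not URL_RE.fullmatch(literal):
                texts.append(literal)
    return {"external": parser.external, "urls": urls, "texts": texts}


if __name__ == "__main__":
    page = (
        '<html><script src="a.js"></script>'
        '<script>var u = "http://example.com/x"; document.write("hello");</script>'
        "</html>"
    )
    print(extract_from_html(page))
```

A regex pass like this misses anything the script computes dynamically, which is presumably why the paper bases extraction on an actual parsing engine.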
Created:
2014-04-10



