High performance classification engines on parallel architectures
Source: Mendeley Data. Updated: 2024-01-31. Indexed: 2024-06-28.
Download link:
https://digitallibrary.usc.edu/asset-management/2A3BF1679I_X
Description:
The Internet backbone, including both core and edge routers, is becoming more flexible, scalable, and programmable to enable future innovations in the next-generation Internet. While the functionality of Internet routers evolves, performance remains a major concern for real-life deployment. In this thesis, we propose novel algorithms, constructions, and optimization techniques on two prominent classes of parallel architectures: Field-Programmable Gate Arrays (FPGAs) and multi-core General Purpose Processors (GPPs). We focus on high-performance algorithmic solutions for two Internet application kernels: multi-field packet classification and Internet traffic classification.

For packet classification, we focus on algorithmic solutions that support high throughput and dynamic updates. We extend decomposition-based packet classification approaches to FPGAs and multi-core processors. On FPGA, we present a 2-dimensional pipelined architecture composed of fine-grained Processing Elements (PEs). We also propose efficient power optimization techniques for this architecture. On multi-core processors, we use range trees and hashing to search each field of the input packet header individually and in parallel. The partial results from all the fields are merged to produce the final packet header match. Our implementations support very large rule sets consisting of many fields.

For traffic classification, we present high-throughput and virtualized architectures for online traffic classification on FPGA. We provide a conversion from a decision tree into a compact rule set table, and we map the table to a 2-dimensional pipelined architecture. We develop a novel dynamic update mechanism that requires little resource overhead and has little impact on the overall throughput. We also present a high-throughput and low-latency traffic classification engine on multi-core platforms. We convert the decision tree used in the C4.5 algorithm into multiple hash tables. We search all the hash tables in parallel and merge the outcomes into the final classification result. High throughput can be sustained even if we scale up (1) the number of concurrent traffic classifiers, (2) the number of decision-tree leaves, and (3) the number of features examined during the classification process.

For both applications, we compare the performance on various platforms with respect to throughput and latency. We vary the problem size to compare the scalability of our designs on FPGA and multi-core platforms. We also provide a detailed comparison between our approaches and existing solutions on both platforms.
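
The decomposition-based approach described for multi-core processors (search each header field on its own, then merge the partial results) can be illustrated with a small sketch. This is not the thesis implementation: the class names `RangeField` and `ExactField`, the function `classify`, the bitmask merge, and the three example rules are all assumptions made for illustration. A sorted array of interval boundaries searched with `bisect` stands in for a range tree, and the per-field lookups run sequentially here, whereas a parallel engine would presumably run them concurrently.

```python
from bisect import bisect_right


class RangeField:
    """Hypothetical per-field index for range rules (stand-in for a range tree)."""

    def __init__(self, ranges):
        # ranges[i] = (lo, hi), the inclusive range of rule i on this field.
        # Boundaries split the value space into elementary intervals in which
        # the set of matching rules is constant.
        self.starts = sorted({lo for lo, _ in ranges} | {hi + 1 for _, hi in ranges})
        self.masks = []
        for p in self.starts:
            mask = 0
            for i, (lo, hi) in enumerate(ranges):
                if lo <= p <= hi:
                    mask |= 1 << i  # bit i set: rule i matches this interval
            self.masks.append(mask)

    def lookup(self, value):
        idx = bisect_right(self.starts, value) - 1
        return self.masks[idx] if idx >= 0 else 0


class ExactField:
    """Hypothetical per-field index for exact-match rules, backed by a hash table."""

    def __init__(self, values):
        # values[i] is the exact value required by rule i, or None for a wildcard.
        self.wild = 0
        for i, v in enumerate(values):
            if v is None:
                self.wild |= 1 << i
        self.table = {}
        for i, v in enumerate(values):
            if v is not None:
                self.table[v] = self.table.get(v, self.wild) | (1 << i)

    def lookup(self, value):
        return self.table.get(value, self.wild)


def classify(fields, header):
    """Merge per-field partial results; return the lowest matching rule index."""
    mask = -1  # all bits set
    for field, value in zip(fields, header):
        mask &= field.lookup(value)  # independent per field, so parallelizable
    if mask == 0:
        return None
    return (mask & -mask).bit_length() - 1  # lowest set bit = highest priority


# Illustrative rules over (source port, destination port, protocol).
fields = [
    RangeField([(0, 1023), (1024, 65535), (0, 65535)]),
    RangeField([(80, 80), (53, 53), (0, 65535)]),
    ExactField([6, 17, None]),
]
```

With these rules, `classify(fields, (500, 80, 6))` selects rule 0, `classify(fields, (2000, 53, 17))` selects rule 1, and `classify(fields, (2000, 80, 6))` falls through to the catch-all rule 2. Treating a lower rule index as higher priority is a convention chosen for this sketch.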
Created:
2024-01-31



