
On-the-Fly Syntax Highlighting: Generalisation and Speed-ups - Replication Package

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14162904
Description:
On-the-fly syntax highlighting involves the rapid association of visual secondary notation with each character of a language derivation. This task has grown in importance due to the widespread use of online software development tools, which frequently display source code and rely heavily on efficient syntax highlighting mechanisms. In this context, resolvers must address three key demands: speed, accuracy, and development costs. Speed constraints are crucial for usability, as they ensure responsive feedback for end users and minimise system overhead. At the same time, precise syntax highlighting is essential for improving code comprehension. Achieving such accuracy, however, requires the ability to perform grammatical analysis, even on code of varying correctness. Additionally, the development costs associated with supporting multiple programming languages pose a significant challenge. The technical difficulty of balancing these three aspects explains why developers today experience significantly worse syntax highlighting online than they do locally. The current state of the art leverages the programming languages' original lexers and parsers to generate syntax highlighting oracles, which are then used to train base Recurrent Neural Network (RNN) models. However, questions of generalisation remain. This paper addresses this gap by extending the validation dataset of previous work to six mainstream programming languages, thus providing a more thorough evaluation. In response to limitations related to evaluation performance and training costs, this work introduces a novel Convolutional Neural Network (CNN)-based model specifically designed to mitigate these issues. Furthermore, this work explores a previously unexamined area: the performance gains obtained when deploying such models on GPUs.
The evaluation demonstrates that the new CNN-based implementation is significantly faster than existing state-of-the-art methods, while still delivering the same near-perfect accuracy.
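To make the idea of a lexer-derived highlighting oracle concrete, the sketch below uses Python's standard `tokenize` module (a real lexer for Python source) to assign one label to every character. This is an illustrative assumption, not the oracle shipped in the replication package: the function name `highlight_oracle` and the seven-class label set are hypothetical, only the lexer is consulted (no parser), and inputs are assumed to be lexically valid Python without f-strings.

```python
import io
import keyword
import token
import tokenize

# Hypothetical label set, chosen for illustration only.
DEFAULT, KEYWORD, NAME, NUMBER, STRING, COMMENT, OPERATOR = range(7)


def highlight_oracle(source: str) -> list[int]:
    """Return one highlighting label per character of `source`."""
    labels = [DEFAULT] * len(source)

    # tokenize reports (row, col) positions; precompute the flat offset
    # at which each (1-based) row starts so they can be converted.
    line_starts = [0]
    for line in source.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))

    def offset(row: int, col: int) -> int:
        return line_starts[row - 1] + col

    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == token.NAME:
            label = KEYWORD if keyword.iskeyword(tok.string) else NAME
        elif tok.type == token.NUMBER:
            label = NUMBER
        elif tok.type == token.STRING:
            label = STRING
        elif tok.type == token.COMMENT:
            label = COMMENT
        elif tok.type == token.OP:
            label = OPERATOR
        else:
            # NEWLINE, NL, INDENT, DEDENT, ENDMARKER: leave as DEFAULT.
            continue
        # Paint every character covered by the token (spans may cross lines).
        for i in range(offset(*tok.start), offset(*tok.end)):
            labels[i] = label
    return labels


if __name__ == "__main__":
    print(highlight_oracle("x = 1  # one\n"))
```

The resulting per-character label sequence is the kind of target a character-level model (RNN or CNN) could be trained to predict directly from raw text, without running the lexer at highlighting time.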
Created: 2024-11-14