Verilog dataset from "A Deep Learning Framework for Verilog Autocompletion Towards Design and Verification Automation"

Source: DataCite Commons | Updated: 2025-08-21 | Indexed: 2026-05-03
Download link:
https://rdr.kuleuven.be/citation?persistentId=doi:10.48804/FELKOH
Description:
Dataset from "A Deep Learning Framework for Verilog Autocompletion Towards Design and Verification Automation", which was first presented as a work-in-progress (WiP) paper at DAC 2023 and has since been accepted to the IEEE SOCC 2025 special session "AI-Enhanced Semiconductor Manufacturing: Intelligent Solutions for Next-Generation Fabrication". To address the scarcity of publicly available Verilog code for training machine learning models, the study introduces a new dataset curated specifically for Verilog autocompletion.

The dataset comprises over 100k Verilog files and 140k code snippets drawn from open-source repositories with permissive licenses (listed in permissive_all_deduplicated_repos.csv). It includes three subsets: file-level data, snippet-level data, and labeled definition-body pairs. Each subset is split into training, validation, and test sets. The files were filtered to remove autogenerated content, files with non-compliant licenses, and near-duplicate files, in order to improve the quality and diversity of the training material. Snippets were extracted using regular expressions. As an additional quality-control step, files for the evaluation splits were selected only from repositories with at least one GitHub star. The dataset serves as the foundation for fine-tuning pretrained language models for Verilog code generation, supporting automation in electronic design and verification workflows. Further details on the dataset creation process are given in the accompanying research paper. A zipped copy of the GitHub repository (https://github.com/99EnriqueD/verilog_autocompletion), which contains code to replicate the dataset creation process, is also included in this dataset.
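The description states that snippets were extracted with regular expressions and that one subset consists of labeled definition-body pairs. As an illustration only, the sketch below shows how a Verilog module header (the "definition") might be separated from its body using Python's standard `re` module. The pattern `MODULE_RE` and the function `extract_definition_body_pairs` are hypothetical and are not taken from the authors' code; the actual extraction logic is in the linked GitHub repository.

```python
import re

# Hypothetical pattern, NOT the authors' regex: a module header runs from the
# `module` keyword to the first semicolon; the body runs up to `endmodule`.
MODULE_RE = re.compile(
    r"(?P<definition>\bmodule\b[^;]*;)(?P<body>.*?)\bendmodule\b",
    re.DOTALL,
)

# Line comments and block comments, matched left to right in a single pass so
# that a `//` inside a block comment (or vice versa) is handled correctly.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def extract_definition_body_pairs(source: str) -> list[tuple[str, str]]:
    """Split Verilog source into (module header, module body) pairs."""
    # Remove comments first so that keywords inside them cannot create matches.
    source = COMMENT_RE.sub("", source)
    return [
        (m.group("definition").strip(), m.group("body").strip())
        for m in MODULE_RE.finditer(source)
    ]


example = """
// module fake; this comment should be ignored
module adder #(parameter W = 8) (
    input  [W-1:0] a, b,
    output [W:0]   sum
);
    assign sum = a + b;
endmodule
"""

pairs = extract_definition_body_pairs(example)
for definition, body in pairs:
    print("DEFINITION:", definition)
    print("BODY:", body)
```

This simple pattern does not cover every Verilog construct (for example, `//` inside string literals, or headers produced by preprocessor macros), so it should be read as a sketch of the idea rather than a replacement for the replication code shipped with the dataset.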
Provider:
KU Leuven RDR
Created:
2025-07-17