five

wyt2000/CodeV-SVA-datasets

收藏
Hugging Face2026-03-24 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/wyt2000/CodeV-SVA-datasets
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: apache-2.0 task_categories: - text-generation language: - en tags: - SystemVerilog Assertion - Formal Verification size_categories: - 100K<n<1M --- # CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis <div align="center"> <a href="https://huggingface.co/wyt2000/CodeV-SVA-14B"><img src="https://img.shields.io/static/v1?label=Model&message=HuggingFace&color=yellow"></a> &ensp; <a href="https://huggingface.co/datasets/wyt2000/CodeV-SVA-datasets"><img src="https://img.shields.io/static/v1?label=Dataset&message=HuggingFace&color=red"></a> &ensp; <a href="https://github.com/wyt2000/CodeV-SVA"><img src="https://img.shields.io/static/v1?label=Code&message=Github&color=blue"></a> &ensp; </div> ## Introduction We introduce CodeV-SVA, a family of large language models designed to translate natural-language verification properties into SystemVerilog Assertions (SVAs). Open-Source Plan: - Model ✓ - Evaluation code ✓ - Paper ✓ - Dataset ✓ - Data synthesis and training code This repository includes the original RTL code of CodeV-SVA, the LLM-generated SVAs, and the NL-SVA training data selected by bidirectional translation and equivalence verification. - **CodeV-RTL-42K.jsonl**: Verilog codes with sequential signals (clock and reset) selected from the [CodeV dataset](https://huggingface.co/datasets/yang-z/CodeV-All-dataset), which can pass the syntax checking by Yosys and JasperGold. - **CodeV-SVA-dataset-without-selection-324K.jsonl**: Natural-language specifications and SVAs generated by DeepSeek-V3.1 (Section 3.1). - `reset_polarity`: True means positive reset, False means negative reset. - **CodeV-SVA-dataset-83K.jsonl**: NL-SVA pairs refined by bidirectional translation and other techniques (Section 3.2 & 3.3). - `old_specification`: the specification in 324K dataset; - `old_sva`: the sva in 324K dataset; - `specification`: SVA2NL for `old_sva`; - `sva`: `NL2SVA` for `specification`; - `equiv_result`: the result of equivalent checking between `old_sva` and `sva`. "String Match Passed" means the two svas are the same in string comparison, otherwise we show the detailed report of the equivalent checking by JasperGold. - **CodeV-SVA-Bidirectional-Translated-143K.jsonl**: Equivalent checking results of the bidirectional translated SVAs. - **CodeV-SVA-dataset-training-83K.jsonl**: Training dataset in `sharegpt` format. Please refer to our [paper](https://arxiv.org/abs/2603.14239) for more details. ## Citation ```latex @misc{wu2026qimengcodevsvatrainingspecializedllms, title={QiMeng-CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis}, author={Yutong Wu and Chenrui Cao and Pengwei Jin and Di Huang and Rui Zhang and Xishan Zhang and Zidong Du and Qi Guo and Xing Hu}, year={2026}, eprint={2603.14239}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2603.14239}, } ```
提供机构:
wyt2000
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作