zasdwad/Verilog_GitHub

Name: zasdwad/Verilog_GitHub
Creator: zasdwad
Published: 2026-04-12 11:11:53
License: No description available

Source: Hugging Face. Updated 2026-04-12; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/zasdwad/Verilog_GitHub


Description:

---
license: mit
---
---
pipeline_tag: text-generation
tags:
- code
model-index:
- name: VeriGen
  results:
  - task:
      type: text-generation
    dataset:
      type:
      name:
extra_gated_prompt: >-
  ## Model License Agreement

  Please read the BigCode [OpenRAIL-M license](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement) agreement before accepting it.
extra_gated_fields:
  I accept the above license agreement, and will use the Model complying with the set of use restrictions and sharing requirements: checkbox
---

# VeriGen

## Table of Contents

1. [Dataset Summary](#dataset-summary)
2. [Use](#use)
3. [Intended Use](#intended-use)
4. [License](#license)
5. [Citation](#citation)

## Dataset Summary

- The dataset comprises Verilog modules as entries. The entries were retrieved from the GitHub dataset on BigQuery.
- For training [models](https://huggingface.co/shailja/fine-tuned-codegen-2B-Verilog), we filtered out entries longer than 20,000 characters and exact duplicates (compared with whitespace ignored).
- **Paper:** [Benchmarking Large Language Models for Automated Verilog RTL Code Generation](https://arxiv.org/abs/2212.11140)
- **Point of Contact:** [contact@shailja](mailto:shailja.thakur90@gmail.com)
- **Languages:** Verilog (Hardware Description Language)

### Data Splits

The dataset contains only a train split.

### Use

```python
# pip install datasets
from datasets import load_dataset

# Stream the train split so the full dataset is not downloaded up front.
ds = load_dataset("shailja/Verilog_GitHub", streaming=True, split="train")

# Print the first entry.
print(next(iter(ds)))
#OUTPUT:
```

### Intended Use

The dataset consists of source code from a range of GitHub repositories. As such, it may include non-compilable, low-quality, and vulnerable code.

### Attribution & Other Requirements

The pretraining dataset of the model was not filtered for permissive licenses only. Moreover, the model can generate source code verbatim from the dataset. The code's license might require attribution and/or other specific requirements that must be respected.

# License

The dataset is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement).

# Citation

```
@misc{https://doi.org/10.48550/arxiv.2212.11140,
  doi = {10.48550/ARXIV.2212.11140},
  url = {https://arxiv.org/abs/2212.11140},
  author = {Thakur, Shailja and Ahmad, Baleegh and Fan, Zhenxing and Pearce, Hammond and Tan, Benjamin and Karri, Ramesh and Dolan-Gavitt, Brendan and Garg, Siddharth},
  title = {Benchmarking Large Language Models for Automated Verilog RTL Code Generation},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```
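
The filtering mentioned in the Dataset Summary (dropping entries over 20,000 characters and exact duplicates with whitespace ignored) could look roughly like the sketch below. This is not the authors' script: the helper names `whitespace_key` and `filter_entries` are hypothetical, and hashing the whitespace-stripped text is just one assumed way to detect duplicates.

```python
import hashlib
import re
from typing import Iterable, Iterator

MAX_CHARS = 20_000  # length threshold stated in the Dataset Summary


def whitespace_key(code: str) -> str:
    """Return a hash of the code with all whitespace removed (hypothetical helper)."""
    stripped = re.sub(r"\s+", "", code)
    return hashlib.sha256(stripped.encode("utf-8")).hexdigest()


def filter_entries(entries: Iterable[str], max_chars: int = MAX_CHARS) -> Iterator[str]:
    """Yield entries that are short enough and not whitespace-insensitive duplicates."""
    seen = set()
    for code in entries:
        if len(code) > max_chars:
            continue  # too long: dropped
        key = whitespace_key(code)
        if key in seen:
            continue  # same text as an earlier entry once whitespace is ignored
        seen.add(key)
        yield code


samples = [
    "module a; endmodule",
    "module  a;\nendmodule",   # duplicate of the first, differing only in whitespace
    "x" * (MAX_CHARS + 1),      # exceeds the length threshold
    "module b; endmodule",
]
kept = list(filter_entries(samples))
print(len(kept))
```

Only the first occurrence of each whitespace-normalized entry is kept, so the order of the input decides which formatting variant survives.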

Provider:

zasdwad
