five

zasdwad/Verilog_GitHub

收藏
Hugging Face2026-04-12 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/zasdwad/Verilog_GitHub
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: mit --- --- pipeline_tag: text-generation tags: - code model-index: - name: VeriGen results: - task: type: text-generation dataset: type: name: extra_gated_prompt: >- ## Model License Agreement Please read the BigCode [OpenRAIL-M license](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement) agreement before accepting it. extra_gated_fields: I accept the above license agreement, and will use the Model complying with the set of use restrictions and sharing requirements: checkbox --- # VeriGen ## Table of Contents 1. [Dataset Summary](##model-summary) 2. [Use](##use) 3. [Limitations](##limitations) 4. [License](##license) 5. [Citation](##citation) ## Dataset Summary - The dataset comprises Verilog modules as entries. The entries were retrieved from the GitHub dataset on BigQuery. - For training [models (https://huggingface.co/shailja/fine-tuned-codegen-2B-Verilog)], we filtered entries with no of characters exceeding 20000 and duplicates (exact duplicates ignoring whitespaces). - **Paper:** [ Benchmarking Large Language Models for Automated Verilog RTL Code Generation](https://arxiv.org/abs/2212.11140) - **Point of Contact:** [contact@shailja](mailto:shailja.thakur90@gmail.com) - **Languages:** Verilog (Hardware Description Language) ### Data Splits The dataset only contains a train split. ### Use ```python # pip install datasets from datasets import load_dataset ds = load_dataset("shailja/Verilog_GitHub", streaming=True, split="train") print(next(iter(ds))) #OUTPUT: ``` ### Intended Use The dataset consists of source code from a range of GitHub repositories. As such, they can potentially include non-compilable, low-quality, and vulnerable code. ### Attribution & Other Requirements The pretraining dataset of the model was not filtered for permissive licenses only. Nevertheless, the model can generate source code verbatim from the dataset. The code's license might require attribution and/or other specific requirements that must be respected. # License The dataset is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement). # Citation ``` @misc{https://doi.org/10.48550/arxiv.2212.11140, doi = {10.48550/ARXIV.2212.11140}, url = {https://arxiv.org/abs/2212.11140}, author = {Thakur, Shailja and Ahmad, Baleegh and Fan, Zhenxing and Pearce, Hammond and Tan, Benjamin and Karri, Ramesh and Dolan-Gavitt, Brendan and Garg, Siddharth}, title = {Benchmarking Large Language Models for Automated Verilog RTL Code Generation}, publisher = {arXiv}, year = {2022}, copyright = {arXiv.org perpetual, non-exclusive license} } ```
提供机构:
zasdwad
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作