Adel9st/Verilog-Turkish-Dataset
Source: Hugging Face. Updated: 2026-04-22. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/Adel9st/Verilog-Turkish-Dataset
Description:
This dataset pairs hardware design instructions in Turkish and English with Verilog RTL implementations. It was created to support fine-tuning large language models for Verilog generation, especially from Turkish hardware design prompts. The dataset contains approximately 41,000 samples: 3,737 with Turkish instructions, about 25,000 with English instructions, and the remainder plain Verilog RTL samples without an instruction. The data comes from three sources: open-source Verilog RTL code in GitHub repositories, public HDL datasets on Hugging Face, and additional instructions generated with LLM assistance. The data went through multi-stage cleaning and filtering so that only valid Verilog modules are included. Each instruction entry contains a hardware design request (in Turkish or English) and the corresponding Verilog RTL implementation.
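The entry layout and the "valid Verilog modules only" filter described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the field names `instruction` and `output`, the sample records, and the textual check itself are hypothetical, since this listing does not state the dataset's schema or the actual cleaning pipeline. The check only looks for a `module <name> ... endmodule` span and is far weaker than running a real Verilog parser.

```python
import re

# Crude textual check (an assumption, not the dataset's real pipeline):
# a "module <identifier>" header followed later by "endmodule".
# It does not strip comments or validate syntax.
MODULE_RE = re.compile(
    r"\bmodule\s+[A-Za-z_][A-Za-z0-9_$]*\b.*?\bendmodule\b",
    re.DOTALL,
)


def has_complete_module(rtl: str) -> bool:
    """Return True if the text contains at least one module ... endmodule span."""
    return MODULE_RE.search(rtl) is not None


def filter_records(records):
    """Keep only records whose RTL field passes the textual module check."""
    return [r for r in records if has_complete_module(r.get("output", ""))]


# Hypothetical records; the field names are illustrative, not the real schema.
samples = [
    {
        # Turkish prompt, roughly "Design a 4-bit counter."
        "instruction": "4 bitlik bir sayıcı tasarla.",
        "output": (
            "module counter(input clk, output reg [3:0] q);\n"
            "  always @(posedge clk) q <= q + 1;\n"
            "endmodule\n"
        ),
    },
    {
        "instruction": "Design a 1-bit half adder.",
        # Truncated RTL: no endmodule, so the filter drops it.
        "output": "module half_adder(input a, input b, output s);\n  assign s = a ^ b;\n",
    },
    {
        # Plain RTL sample with no instruction attached.
        "instruction": None,
        "output": "module buf1(input a, output y);\n  assign y = a;\nendmodule\n",
    },
]

kept = filter_records(samples)
```

In this sketch the truncated half-adder record is dropped and the other two are kept. A real pipeline would more likely lean on a Verilog parser or linter, because a regular expression cannot tell a commented-out `endmodule` from a real one.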
Provider:
Adel9st



