
Inframind

Source: IEEE. Indexed: 2026-04-17
Download link:
https://ieee-dataport.org/documents/inframind
Description:
Recent advances in reinforcement learning (RL) have significantly improved reasoning capabilities in Large Language Models (LLMs), with methods like GRPO and DAPO achieving breakthrough results. However, these techniques have been applied exclusively to models with 7B+ parameters, leaving open whether Small Language Models (SLMs) can benefit from similar approaches.

We present the first systematic study of applying Group Relative Policy Optimization (GRPO) to sub-billion parameter models for domain-specific reasoning tasks. Our key insight is that domain-specific reward decomposition, rather than scale alone, is critical for eliciting reasoning in small models.

We validate our approach on InfraMind, a new benchmark for Infrastructure-as-Code (IaC) generation comprising 2,000+ real GitHub samples across Terraform, Kubernetes, Docker, Ansible, and CI/CD pipelines. Our 0.5B model achieves 97.3% accuracy with GRPO training, approaching GPT-4o's 100% while being ~400× smaller. We further extend GRPO with DAPO, achieving 96.4% accuracy while adding structured multi-step reasoning (Analysis → Plan → Code → Verify) to outputs.

We release our training framework, benchmark, and model weights to facilitate research in efficient reasoning for resource-constrained deployment.
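
For readers unfamiliar with the terms in the description, the following is a minimal sketch of how a decomposed reward could feed GRPO's group-relative advantage. It is not the authors' implementation: the component names (`syntax`, `semantics`, `format`), their weights, and both function names are hypothetical, since the description does not list the actual reward terms. The only part taken from GRPO itself is the normalization of each sampled completion's reward by the mean and standard deviation of its group.

```python
from statistics import mean, pstdev

# Hypothetical reward components and weights (assumption: the source
# does not specify how the reward is decomposed).
WEIGHTS = {"syntax": 0.4, "semantics": 0.4, "format": 0.2}


def decomposed_reward(scores):
    """Weighted sum of per-component scores, each expected in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: (reward - group mean) / group std.

    Uses the population standard deviation; real implementations may
    use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group: three completions sampled for the same prompt,
# scored by the hypothetical component checkers.
group = [
    {"syntax": 1.0, "semantics": 1.0, "format": 1.0},
    {"syntax": 1.0, "semantics": 0.0, "format": 1.0},
    {"syntax": 0.0, "semantics": 0.0, "format": 0.0},
]
rewards = [decomposed_reward(s) for s in group]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's average receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline.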
Provided by:
SAIKIRAN RALLABANDI