Inframind

Name: Inframind
Creator: SAIKIRAN RALLABANDI
License: No description available

Indexed from IEEE on 2026-04-17

Download link:

https://ieee-dataport.org/documents/inframind

Description:

Recent advances in reinforcement learning (RL) have significantly improved reasoning capabilities in Large Language Models (LLMs), with methods like GRPO and DAPO achieving breakthrough results. However, these techniques have been applied exclusively to models with 7B+ parameters, leaving open whether Small Language Models (SLMs) can benefit from similar approaches.

We present the first systematic study of applying Group Relative Policy Optimization (GRPO) to sub-billion parameter models for domain-specific reasoning tasks. Our key insight is that domain-specific reward decomposition, rather than scale alone, is critical for eliciting reasoning in small models.

We validate our approach on InfraMind, a new benchmark for Infrastructure-as-Code (IaC) generation comprising 2,000+ real GitHub samples across Terraform, Kubernetes, Docker, Ansible, and CI/CD pipelines. Our 0.5B model achieves 97.3% accuracy with GRPO training, approaching GPT-4o's 100% while being ~400× smaller. We further extend GRPO with DAPO, achieving 96.4% accuracy while adding structured multi-step reasoning (Analysis → Plan → Code → Verify) to outputs.

We release our training framework, benchmark, and model weights to facilitate research in efficient reasoning for resource-constrained deployment.
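The description names GRPO and reward decomposition without showing the mechanics. The following is a minimal illustrative sketch, not the authors' released framework: it combines several reward components into one scalar and then standardizes rewards within a group of completions sampled for the same prompt, which is the group-relative advantage idea behind GRPO. The component names (`syntax`, `format`, `semantics`), their weights, and the scores are hypothetical, and the use of the population standard deviation is an assumption made for simplicity.

```python
from statistics import mean, pstdev


def decomposed_reward(components, weights):
    """Weighted sum of named reward components (names here are hypothetical)."""
    return sum(weights[name] * value for name, value in components.items())


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical weights and scores for three completions of one IaC prompt.
weights = {"syntax": 0.5, "format": 0.2, "semantics": 0.3}
group = [
    {"syntax": 1.0, "format": 1.0, "semantics": 1.0},
    {"syntax": 1.0, "format": 0.0, "semantics": 0.5},
    {"syntax": 0.0, "format": 1.0, "semantics": 0.0},
]

rewards = [decomposed_reward(c, weights) for c in group]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Because advantages are computed relative to the group, a completion is reinforced only to the extent that it beats its siblings for the same prompt, so no separate value network is needed in this scheme.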

Provided by:

SAIKIRAN RALLABANDI
