mehmetdavut/RubyCraft-3.4-Eval-Logs
Source: Hugging Face | Updated: 2026-04-28 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/mehmetdavut/RubyCraft-3.4-Eval-Logs
Description:
This dataset contains the complete evaluation logs, with both raw and processed model outputs, from our research on adapting Small Language Model (SLM) architectures to Ruby 3.4 syntax. It covers more than 26,000 evaluation rows generated across 96 LoRA configurations, 4 base models, and multiple teacher models.

The data is organized in two tiers: detailed logs (JSONL) and aggregated metrics (CSV). The detailed logs cover 161 tasks from the HumanEval-rb benchmark and 40 custom-designed tasks that test specific Ruby 3.4 features. Each JSONL entry includes metadata such as the original prompt, the model response, the sanitized response, the evaluation logs, and the sanitization rules that were triggered. The aggregated metrics show the effect of the Diagnostic Sanitization Procedure (DSP) on pass rates and style scores.

These logs are the empirical basis for our findings on Formatting Hallucinations in SLMs, and they show how DSP recovers model performance by closing the compliance gap.
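The two tiers can be connected by reducing the per-row JSONL logs to per-model pass rates before and after sanitization, which is the kind of figure the CSV tier reports. The Python sketch below does this. Its field names are assumptions made for illustration and have not been checked against the real schema: "model", "task_id", "passed_raw", "passed_sanitized" and "triggered_rules". The description above confirms only that entries carry the prompt, the raw and sanitized responses, evaluation logs and triggered rules, so rename the keys to match the actual files. The sample records are toy inputs, not data from the dataset.

```python
import json
from collections import defaultdict
from typing import Iterable


def load_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines: one JSON object per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def pass_rates(records: list[dict]) -> dict[str, tuple[float, float]]:
    """Per-model (raw, sanitized) pass rates.

    Assumes hypothetical boolean fields 'passed_raw' and 'passed_sanitized'
    plus a 'model' key; adjust to the real schema before use.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # [rows, raw passes, sanitized passes]
    for rec in records:
        t = totals[rec["model"]]
        t[0] += 1
        t[1] += int(rec["passed_raw"])
        t[2] += int(rec["passed_sanitized"])
    return {m: (raw / n, san / n) for m, (n, raw, san) in totals.items()}


def rule_counts(records: list[dict]) -> dict[str, int]:
    """Count how often each sanitization rule fired.

    Assumes a hypothetical 'triggered_rules' list per record; missing means none.
    """
    counts: dict[str, int] = defaultdict(int)
    for rec in records:
        for rule in rec.get("triggered_rules", []):
            counts[rule] += 1
    return dict(counts)


if __name__ == "__main__":
    # Toy records for illustration only; they are not taken from the dataset.
    sample = [
        '{"model": "m1", "task_id": "HumanEval_0", "passed_raw": false, '
        '"passed_sanitized": true, "triggered_rules": ["strip_markdown_fence"]}',
        '{"model": "m1", "task_id": "HumanEval_1", "passed_raw": true, '
        '"passed_sanitized": true, "triggered_rules": []}',
    ]
    records = load_jsonl(sample)
    for model, (raw, san) in pass_rates(records).items():
        print(f"{model}: raw={raw:.2f} sanitized={san:.2f} delta={san - raw:+.2f}")
    print(rule_counts(records))
```

For the toy input, the script prints a raw pass rate of 0.50 and a sanitized pass rate of 1.00 for "m1". The rule name "strip_markdown_fence" is also hypothetical. The script keeps raw and sanitized outcomes side by side so that the difference between them, the share of failures attributable to formatting rather than logic, can be read directly.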
Provided by:
mehmetdavut



