OpenxAILabs/nix-reviewer-training
Source: Hugging Face. Updated 2026-04-27. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/OpenxAILabs/nix-reviewer-training
Description:
nix-reviewer-training is a dataset for training Nix code-review models. It contains 445 (broken Nix config, structured review) pairs, is fully synthetic, and is released under the Apache-2.0 license. Each example is generated by selecting a known failure pattern, synthesizing a Nix configuration that exhibits it, validating the resulting error message by running the configuration through Nix, and generating the review comment. The dataset currently covers 3 patterns: package attribute path drift (typos or deprecation drift in pkgs.X attribute paths), syntax errors (such as a missing semicolon), and flake arguments not destructured (a module references inputs.X without inputs in its function arguments). The data is stored as one JSON object per line, with fields including prompt (the broken Nix source), completion (the ideal JSON review), and oracle_line (the line number taken from the real Nix eval output). The dataset is intended for training specialist Nix-review models; all content is original synthesis, and no forum content or third-party code was used.
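The one-JSON-object-per-line format described above could be read roughly as in the following Python sketch. Only the field names prompt, completion, and oracle_line come from the description. The sample record, the keys inside the review ("line", "comment"), and the idea that completion may be stored as a JSON-encoded string are assumptions made for illustration, not details confirmed by the dataset.

```python
import json
from typing import Any, Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines source."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def parse_review(completion: Any) -> Any:
    """Return the review as a Python object.

    Assumption: completion may be a JSON-encoded string; if it is
    already a decoded object, it is returned unchanged.
    """
    if isinstance(completion, str):
        return json.loads(completion)
    return completion


# Made-up record for illustration only; it is not taken from the dataset.
# The keys inside the review ("line", "comment") are hypothetical.
sample_line = json.dumps({
    "prompt": "{ pkgs, ... }: { environment.systemPackages = [ pkgs.hello ] }",
    "completion": json.dumps({"line": 1, "comment": "missing semicolon"}),
    "oracle_line": 1,
})

# The blank second entry mimics a trailing empty line in a file.
records = list(iter_records([sample_line, ""]))
review = parse_review(records[0]["completion"])
```

To read a real file, an open file handle can be passed in place of the list, since iterating over a text file yields its lines.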
Provided by:
OpenxAILabs



