Kushalkhemka/cybersec-chatml-vuln-patch-v1
收藏Hugging Face2026-04-05 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/Kushalkhemka/cybersec-chatml-vuln-patch-v1
下载链接
链接失效反馈官方服务:
资源简介:
---
license: apache-2.0
language:
- en
- zh
tags:
- cybersecurity
- vulnerability-detection
- patch-generation
- chatml
- sft
size_categories:
- 100K<n<1M
---
# Cybersecurity ChatML SFT Dataset (Detection + Patch + Multitask)
This dataset contains ChatML records for 2 security tasks:
- Vulnerability detection (`is_vulnerable`, `cwe`, `severity` JSON output)
- Secure patch generation (assistant returns patched code only)
## Files
- `chatml_detection_train.jsonl`
- `chatml_detection_val.jsonl`
- `chatml_patch_train.jsonl`
- `chatml_patch_val.jsonl`
- `chatml_multitask_train.jsonl`
- `chatml_multitask_val.jsonl`
- `chatml_build_manifest.json`
- `unsloth_best_params_glm47flash.json`
## Record format
Each row is JSON with:
- `messages`: ChatML messages (`system`, `user`, `assistant`)
- `metadata`: source metadata
## Splits
- detection train/val: 99,162 / 5,966
- patch train/val: 58,145 / 3,527
- multitask train/val: 157,307 / 9,493
## Recommended training recipe
See `unsloth_best_params_glm47flash.json` for Unsloth Studio-ready presets.
## GLM-4.7-Flash specific notes
- Use `transformers v5` for fine-tuning support.
- Keep MoE router layer frozen (do not train router).
- For reasoning retention, keep a high reasoning-data ratio (target >=75% reasoning examples).
- Z.ai / Unsloth recommended inference defaults:
- General: `temperature=1.0`, `top_p=0.95`, `repeat_penalty=1.0`
- Tool calling: `temperature=0.7`, `top_p=1.0`, `repeat_penalty=1.0`
提供机构:
Kushalkhemka



