five

Kushalkhemka/cybersec-chatml-vuln-patch-v1

收藏
Hugging Face2026-04-05 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/Kushalkhemka/cybersec-chatml-vuln-patch-v1
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: apache-2.0 language: - en - zh tags: - cybersecurity - vulnerability-detection - patch-generation - chatml - sft size_categories: - 100K<n<1M --- # Cybersecurity ChatML SFT Dataset (Detection + Patch + Multitask) This dataset contains ChatML records for 2 security tasks: - Vulnerability detection (`is_vulnerable`, `cwe`, `severity` JSON output) - Secure patch generation (assistant returns patched code only) ## Files - `chatml_detection_train.jsonl` - `chatml_detection_val.jsonl` - `chatml_patch_train.jsonl` - `chatml_patch_val.jsonl` - `chatml_multitask_train.jsonl` - `chatml_multitask_val.jsonl` - `chatml_build_manifest.json` - `unsloth_best_params_glm47flash.json` ## Record format Each row is JSON with: - `messages`: ChatML messages (`system`, `user`, `assistant`) - `metadata`: source metadata ## Splits - detection train/val: 99,162 / 5,966 - patch train/val: 58,145 / 3,527 - multitask train/val: 157,307 / 9,493 ## Recommended training recipe See `unsloth_best_params_glm47flash.json` for Unsloth Studio-ready presets. ## GLM-4.7-Flash specific notes - Use `transformers v5` for fine-tuning support. - Keep MoE router layer frozen (do not train router). - For reasoning retention, keep a high reasoning-data ratio (target >=75% reasoning examples). - Z.ai / Unsloth recommended inference defaults: - General: `temperature=1.0`, `top_p=0.95`, `repeat_penalty=1.0` - Tool calling: `temperature=0.7`, `top_p=1.0`, `repeat_penalty=1.0`
提供机构:
Kushalkhemka
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作