
y0sif/Arcwright-Axum

Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/y0sif/Arcwright-Axum
Description:
---
license: apache-2.0
language:
- en
tags:
- rust
- code
- instruction-tuning
- axum
- chatml
size_categories:
- 1K<n<10K
task_categories:
- text-generation
---

# Arcwright-Axum

An instruction-tuning dataset for the **[Axum](https://github.com/tokio-rs/axum)** Rust crate, built for the [Arcwright](https://huggingface.co/y0sif/arcwright-E4B-v1) fine-tuned model.

Axum is a web application framework built on top of Tokio and Tower, focused on ergonomics and modularity.

## Dataset Summary

- **741 instruction-response pairs** covering HTTP routing, middleware, extractors (path, query, JSON, state), error handling, Tower layers, and WebSocket support
- Generated from real source code using the **OSS-Instruct** methodology via Claude Code sub-agents
- Validated for structural correctness and deduplicated using MinHash (Jaccard threshold 0.7)
- Format: **ChatML** (a `messages` array with system/user/assistant roles)

## Category Distribution

| Category | Count | % |
|----------|-------|---|
| Code Generation | 233 | 31% |
| Code Explanation | 141 | 19% |
| API Usage | 138 | 18% |
| Bug Detection | 72 | 9% |
| Refactoring | 83 | 11% |
| Test Generation | 74 | 9% |

## Format

Each example is a JSON object with a `messages` array:

```json
{
  "messages": [
    {"role": "system", "content": "You are an expert Rust programmer specializing in the axum crate and modern Rust development patterns."},
    {"role": "user", "content": "Write an Axum web server with a POST /users endpoint that accepts JSON input, validates it, and returns a created user with a generated ID."},
    {"role": "assistant", "content": "..."}
  ],
  "category": "code_generation",
  "crate": "axum"
}
```

## Usage

```python
from datasets import load_dataset

# Downloads the dataset from the Hugging Face Hub on first use.
dataset = load_dataset("y0sif/Arcwright-Axum")

# Print the messages array of the first training example.
print(dataset["train"][0]["messages"])
```

## Part of Arcwright

This dataset is one of three crate-specific datasets used to train [Arcwright-E4B-v1](https://huggingface.co/y0sif/arcwright-E4B-v1):

| Dataset | Crate | Pairs |
|---------|-------|-------|
| **[Arcwright-Leptos](https://huggingface.co/datasets/y0sif/Arcwright-Leptos)** | Leptos | 2,046 |
| **[Arcwright-Axum](https://huggingface.co/datasets/y0sif/Arcwright-Axum)** | Axum | 741 |
| **[Arcwright-Rig](https://huggingface.co/datasets/y0sif/Arcwright-Rig)** | Rig | 697 |

## Source

All instruction pairs were generated from source code in the [Axum repository](https://github.com/tokio-rs/axum). The code was chunked with tree-sitter into meaningful units (functions, impl blocks, modules), and each chunk was then used as seed material for instruction generation.

## License

Apache 2.0
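
## Checking Record Structure

The record layout described under Format can be checked locally before training. The following is a minimal sketch, not the dataset author's validation pipeline. It assumes every record has exactly one system, one user, and one assistant message in that order (the card shows only a single example), that `category` is a string such as `"code_generation"`, and that `crate` is always `"axum"`. The helper names `is_valid_record` and `category_counts` are hypothetical.

```python
from collections import Counter

# Assumption: every record follows the role order of the card's one example.
EXPECTED_ROLES = ("system", "user", "assistant")


def is_valid_record(record: dict) -> bool:
    """Return True if the record matches the layout shown in the Format section."""
    messages = record.get("messages")
    if not isinstance(messages, list) or len(messages) != len(EXPECTED_ROLES):
        return False
    for message, role in zip(messages, EXPECTED_ROLES):
        if not isinstance(message, dict) or message.get("role") != role:
            return False
        content = message.get("content")
        # Reject missing, non-string, or whitespace-only content.
        if not isinstance(content, str) or not content.strip():
            return False
    # Assumption: "crate" is always "axum" in this dataset.
    return isinstance(record.get("category"), str) and record.get("crate") == "axum"


def category_counts(records) -> Counter:
    """Tally the category field over the records that pass validation."""
    return Counter(r["category"] for r in records if is_valid_record(r))


if __name__ == "__main__":
    sample = {
        "messages": [
            {"role": "system", "content": "You are an expert Rust programmer."},
            {"role": "user", "content": "Write an Axum handler."},
            {"role": "assistant", "content": "..."},
        ],
        "category": "code_generation",
        "crate": "axum",
    }
    print(is_valid_record(sample))
    print(category_counts([sample]))
```

With the dataset loaded as in the Usage section, `category_counts(dataset["train"])` would give a tally to compare against the Category Distribution table, provided the stored category labels line up with the table's row names.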
Provider:
y0sif