y0sif/Arcwright-Axum
Source: Hugging Face · Updated 2026-04-09 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/y0sif/Arcwright-Axum
Description:
---
license: apache-2.0
language:
- en
tags:
- rust
- code
- instruction-tuning
- axum
- chatml
size_categories:
- 1K<n<10K
task_categories:
- text-generation
---
# Arcwright-Axum
An instruction-tuning dataset for the **[Axum](https://github.com/tokio-rs/axum)** Rust crate, built for the [Arcwright](https://huggingface.co/y0sif/arcwright-E4B-v1) fine-tuned model.
Axum is a web application framework built on top of Tokio and Tower, focused on ergonomics and modularity.
## Dataset Summary
- **741 instruction-response pairs** covering HTTP routing, middleware, extractors (`Path`, `Query`, `Json`, `State`), error handling, Tower layers, and WebSocket support
- Generated from real source code using the **OSS-Instruct** methodology via Claude Code sub-agents
- Validated for structural correctness and deduplicated using MinHash (Jaccard threshold 0.7)
- Format: **ChatML** (messages array with system/user/assistant roles)
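
The deduplication step can be illustrated with a small, dependency-free sketch. It is not the pipeline used to build this dataset: the word-level 3-shingles, the 128 hash functions, the salted BLAKE2b hashing, and the rule "drop a text whose estimated similarity to an already kept text is at least 0.7" are all assumptions chosen for illustration. Only the Jaccard threshold of 0.7 comes from the card.

```python
import hashlib
import re

NUM_PERM = 128      # number of hash functions (assumption, not stated in the card)
SHINGLE_SIZE = 3    # words per shingle (assumption)
THRESHOLD = 0.7     # Jaccard threshold quoted in the card


def shingles(text, k=SHINGLE_SIZE):
    """Split text into a set of overlapping k-word shingles."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < k:
        return {" ".join(tokens)} if tokens else set()
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash_signature(items, num_perm=NUM_PERM):
    """For each salted hash function, keep the minimum hash over all items."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "big")
        signature.append(min(
            (int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big")
             for item in items),
            default=0,  # empty input gets an all-zero signature
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def deduplicate(texts, threshold=THRESHOLD):
    """Greedy pass: keep a text only if it is not too similar to any kept text."""
    kept_indices, kept_signatures = [], []
    for index, text in enumerate(texts):
        sig = minhash_signature(shingles(text))
        if all(estimate_jaccard(sig, other) < threshold for other in kept_signatures):
            kept_indices.append(index)
            kept_signatures.append(sig)
    return kept_indices


samples = [
    "Write an Axum handler that returns JSON for a new user",
    "Write an Axum handler that returns JSON for a new user",
    "Explain how Tower layers wrap a service with middleware",
]
print(deduplicate(samples))  # indices of the texts that survive
```

The greedy pass compares every new text against all kept texts, which is fine for a few hundred pairs. Larger corpora usually add locality-sensitive hashing on top of the signatures to avoid the quadratic comparison.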
## Category Distribution
| Category | Count | % |
|----------|-------|---|
| Code Generation | 233 | 31% |
| Code Explanation | 141 | 19% |
| API Usage | 138 | 19% |
| Bug Detection | 72 | 10% |
| Refactoring | 83 | 11% |
| Test Generation | 74 | 10% |
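
The shares can be recomputed from the counts. The sketch below takes the counts from the table and rounds each share to the nearest whole percent; figures produced by truncation instead of rounding can differ by one point.

```python
# Counts copied from the category table above.
counts = {
    "Code Generation": 233,
    "Code Explanation": 141,
    "API Usage": 138,
    "Bug Detection": 72,
    "Refactoring": 83,
    "Test Generation": 74,
}

total = sum(counts.values())

# Share of each category, rounded to the nearest whole percent.
shares = {name: round(100 * n / total) for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name:<17} {n:>4} {shares[name]:>3}%")
print(f"{'Total':<17} {total:>4}")
```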
## Format
Each example is a JSON object with a `messages` array:
```json
{
  "messages": [
    {"role": "system", "content": "You are an expert Rust programmer specializing in the axum crate and modern Rust development patterns."},
    {"role": "user", "content": "Write an Axum web server with a POST /users endpoint that accepts JSON input, validates it, and returns a created user with a generated ID."},
    {"role": "assistant", "content": "..."}
  ],
  "category": "code_generation",
  "crate": "axum"
}
```
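
A structural check for records of this shape might look like the following sketch. The function name `validate_example` and the specific rules (allowed roles, non-empty string content, final message from the assistant, string `category` and `crate` fields) are assumptions inferred from the example above, not the validator used to build the dataset.

```python
ALLOWED_ROLES = ("system", "user", "assistant")


def validate_example(example):
    """Return a list of problems found in one record; an empty list means it passed."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]

    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i} is not an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has an unexpected role: {message.get('role')!r}")
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} has empty or non-string content")

    # Assumed rule: each record ends with the assistant's answer.
    last = messages[-1]
    if not isinstance(last, dict) or last.get("role") != "assistant":
        problems.append("last message is not from the assistant")

    for key in ("category", "crate"):
        if not isinstance(example.get(key), str):
            problems.append(f"'{key}' must be a string")
    return problems


record = {
    "messages": [
        {"role": "system", "content": "You are an expert Rust programmer."},
        {"role": "user", "content": "Write an Axum handler."},
        {"role": "assistant", "content": "..."},
    ],
    "category": "code_generation",
    "crate": "axum",
}
print(validate_example(record))  # an empty list means no problems were found
```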
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("y0sif/Arcwright-Axum")
print(dataset["train"][0]["messages"])
```
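
To work with one task type at a time, records can be grouped by their `category` field. The sketch below uses a few in-memory records so it runs offline; `group_by_category` is a hypothetical helper, and every category value other than `code_generation` is an assumed snake_case spelling of the names in the distribution table.

```python
from collections import defaultdict


def group_by_category(records):
    """Group records (dicts with a 'category' key) into lists per category."""
    groups = defaultdict(list)
    for record in records:
        groups[record["category"]].append(record)
    return dict(groups)


# Stand-in records; only the fields needed for grouping are filled in.
records = [
    {"category": "code_generation", "crate": "axum", "messages": []},
    {"category": "bug_detection", "crate": "axum", "messages": []},   # assumed value
    {"category": "code_generation", "crate": "axum", "messages": []},
]

groups = group_by_category(records)
for category, items in groups.items():
    print(category, len(items))
```

Iterating a split loaded with `load_dataset` yields one dict per example, so `group_by_category(dataset["train"])` should work the same way on the real data.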
## Part of Arcwright
This dataset is one of three crate-specific datasets used to train [Arcwright-E4B-v1](https://huggingface.co/y0sif/arcwright-E4B-v1):
| Dataset | Crate | Pairs |
|---------|-------|-------|
| **[Arcwright-Leptos](https://huggingface.co/datasets/y0sif/Arcwright-Leptos)** | Leptos | 2,046 |
| **[Arcwright-Axum](https://huggingface.co/datasets/y0sif/Arcwright-Axum)** | Axum | 741 |
| **[Arcwright-Rig](https://huggingface.co/datasets/y0sif/Arcwright-Rig)** | Rig | 697 |
## Source
All instruction pairs were generated from source code in the [Axum repository](https://github.com/tokio-rs/axum). Code was chunked using tree-sitter into meaningful units (functions, impl blocks, modules), then used as seed material for instruction generation.
## License
Apache 2.0
Provided by:
y0sif



