y0sif/Arcwright-Leptos
Source: Hugging Face. Updated: 2026-04-09. Indexed: 2026-04-12.
Download link:
https://hf-mirror.com/datasets/y0sif/Arcwright-Leptos
Resource description:
---
license: apache-2.0
language:
- en
tags:
- rust
- code
- instruction-tuning
- leptos
- chatml
size_categories:
- 1K<n<10K
task_categories:
- text-generation
---
# Arcwright-Leptos
An instruction-tuning dataset for the **[Leptos](https://github.com/leptos-rs/leptos)** Rust crate, built for the [Arcwright](https://huggingface.co/y0sif/arcwright-E4B-v1) fine-tuned model.
Leptos is a reactive web UI framework for Rust with fine-grained reactivity, server-side rendering, and a component model inspired by modern frontend frameworks.
## Dataset Summary
- **2,034 instruction-response pairs** covering reactive signals, components, the `view!` macro, server functions, SSR, hydration, routing, and resource management
- Generated from real source code using the **OSS-Instruct** methodology via Claude Code sub-agents
- Validated for structural correctness and deduplicated using MinHash (Jaccard threshold 0.7)
- Format: **ChatML** (messages array with system/user/assistant roles)
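The deduplication step can be illustrated with a minimal, standard-library sketch of MinHash near-duplicate filtering at an estimated Jaccard threshold of 0.7. This is not the pipeline used to build the dataset: the shingle size, the number of hash functions, the choice of BLAKE2b, and every function name below are assumptions made for illustration.

```python
import hashlib
import re


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of k-word shingles of a text (k=3 is an assumed default)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items: set[str], num_perm: int = 128) -> list[int]:
    """Keep the minimum hash per seeded hash function (seeds stand in for permutations)."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def deduplicate(texts: list[str], threshold: float = 0.7) -> list[str]:
    """Keep a text only if it stays below the threshold against every kept text."""
    kept: list[str] = []
    kept_signatures: list[list[int]] = []
    for text in texts:
        signature = minhash_signature(shingles(text))
        if all(estimated_jaccard(signature, other) < threshold for other in kept_signatures):
            kept.append(text)
            kept_signatures.append(signature)
    return kept
```

The pairwise comparison here is quadratic in the number of kept texts, which is acceptable for a few thousand pairs; larger corpora usually add locality-sensitive hashing on top of the signatures.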
## Category Distribution
| Category | Count | % |
|----------|-------|---|
| Code Generation | 711 | 34% |
| Code Explanation | 364 | 17% |
| API Usage | 320 | 15% |
| Bug Detection | 242 | 11% |
| Refactoring | 211 | 10% |
| Test Generation | 198 | 9% |
## Format
Each example is a JSON object with a `messages` array:
```json
{
  "messages": [
    {"role": "system", "content": "You are an expert Rust programmer specializing in the leptos crate and modern Rust development patterns."},
    {"role": "user", "content": "Write a Leptos component that displays a counter with increment and decrement buttons using reactive signals."},
    {"role": "assistant", "content": "..."}
  ],
  "category": "code_generation",
  "crate": "leptos"
}
```
```
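As a rough illustration of this record shape, the sketch below checks one example for the fields shown above and renders its `messages` array as ChatML text. The helper names `validate_example` and `to_chatml` are hypothetical, the checks are an assumption about what "structural correctness" might cover, and the exact chat template ultimately depends on the tokenizer used for training.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_example(example: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks well-formed."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]

    problems = []
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {index} is not an object")
            continue
        role = message.get("role")
        if not isinstance(role, str) or role not in ALLOWED_ROLES:
            problems.append(f"message {index} has an unexpected role: {role!r}")
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {index} has empty or non-string content")

    # Top-level metadata fields shown in the example record.
    for key in ("category", "crate"):
        if not isinstance(example.get(key), str):
            problems.append(f"'{key}' must be a string")
    return problems


def to_chatml(messages: list[dict]) -> str:
    """Render messages using the common <|im_start|> / <|im_end|> ChatML delimiters."""
    return "".join(
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    )
```

Calling `validate_example` before `to_chatml` keeps malformed records from raising a `KeyError` during rendering.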
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("y0sif/Arcwright-Leptos")
print(dataset["train"][0]["messages"])
```
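Once loaded, records can be grouped or filtered by their `category` field. The helpers below are a small sketch that works on any iterable of record dictionaries; the function names are hypothetical, and only the label `code_generation` is confirmed by the example record above, so other label strings should be checked against the data first.

```python
from collections import Counter
from typing import Iterable


def count_categories(records: Iterable[dict]) -> Counter:
    """Tally how many records carry each category label."""
    return Counter(record["category"] for record in records)


def select_category(records: Iterable[dict], category: str) -> list[dict]:
    """Return only the records whose category matches the given label."""
    return [record for record in records if record["category"] == category]
```

With the loaded dataset, `count_categories(dataset["train"])` should give the label distribution, and `select_category(dataset["train"], "code_generation")` the code-generation subset. For larger datasets, the built-in `Dataset.filter` method avoids materializing a Python list.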
## Part of Arcwright
This dataset is one of three crate-specific datasets used to train [Arcwright-E4B-v1](https://huggingface.co/y0sif/arcwright-E4B-v1):
| Dataset | Crate | Pairs |
|---------|-------|-------|
| **[Arcwright-Leptos](https://huggingface.co/datasets/y0sif/Arcwright-Leptos)** | Leptos | 2,046 |
| **[Arcwright-Axum](https://huggingface.co/datasets/y0sif/Arcwright-Axum)** | Axum | 741 |
| **[Arcwright-Rig](https://huggingface.co/datasets/y0sif/Arcwright-Rig)** | Rig | 697 |
## Source
All instruction pairs were generated from source code in the [Leptos repository](https://github.com/leptos-rs/leptos). Code was chunked using tree-sitter into meaningful units (functions, impl blocks, modules), then used as seed material for instruction generation.
## License
Apache 2.0
Provider: y0sif



