y0sif/Arcwright-Rig

Name: y0sif/Arcwright-Rig
Creator: y0sif
Published: 2026-04-09 07:15:47
License: No description provided (the dataset card below declares Apache 2.0)

Source: Hugging Face | Updated: 2026-04-09 | Indexed: 2026-04-12

Download link:

https://hf-mirror.com/datasets/y0sif/Arcwright-Rig

Resource summary:

---
license: apache-2.0
language:
- en
tags:
- rust
- code
- instruction-tuning
- rig
- chatml
size_categories:
- 1K<n<10K
task_categories:
- text-generation
---

# Arcwright-Rig

An instruction-tuning dataset for the **[Rig](https://github.com/0xPlaygrounds/rig)** Rust crate, built for the [Arcwright](https://huggingface.co/y0sif/arcwright-E4B-v1) fine-tuned model.

Rig is a Rust library for building LLM-powered applications, with support for multiple providers, tool use, embeddings, and RAG pipelines.

## Dataset Summary

- **697 instruction-response pairs** covering AI agents, tool use/function calling, embeddings, vector stores, RAG pipelines, and multi-provider support (OpenAI, Anthropic)
- Generated from real source code using the **OSS-Instruct** methodology via Claude Code sub-agents
- Validated for structural correctness and deduplicated with MinHash (Jaccard similarity threshold 0.7)
- Format: **ChatML** (a `messages` array with `system`/`user`/`assistant` roles)

## Category Distribution

| Category | Count | % |
|----------|-------|---|
| Code Generation | 236 | 34% |
| Code Explanation | 113 | 16% |
| API Usage | 113 | 16% |
| Bug Detection | 77 | 11% |
| Refactoring | 95 | 14% |
| Test Generation | 63 | 9% |

## Format

Each example is a JSON object with a `messages` array plus `category` and `crate` fields:

```json
{
  "messages": [
    {"role": "system", "content": "You are an expert Rust programmer specializing in the rig crate and modern Rust development patterns."},
    {"role": "user", "content": "Show how to create a basic AI agent with Rig that uses OpenAI as the provider and can respond to user messages."},
    {"role": "assistant", "content": "..."}
  ],
  "category": "code_generation",
  "crate": "rig"
}
```

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("y0sif/Arcwright-Rig")
print(dataset["train"][0]["messages"])
```

## Part of Arcwright

This dataset is one of three crate-specific datasets used to train [Arcwright-E4B-v1](https://huggingface.co/y0sif/arcwright-E4B-v1):

| Dataset | Crate | Pairs |
|---------|-------|-------|
| **[Arcwright-Leptos](https://huggingface.co/datasets/y0sif/Arcwright-Leptos)** | Leptos | 2,046 |
| **[Arcwright-Axum](https://huggingface.co/datasets/y0sif/Arcwright-Axum)** | Axum | 741 |
| **[Arcwright-Rig](https://huggingface.co/datasets/y0sif/Arcwright-Rig)** | Rig | 697 |

## Source

All instruction pairs were generated from source code in the [Rig repository](https://github.com/0xPlaygrounds/rig). The code was split with tree-sitter into meaningful units (functions, `impl` blocks, modules), which then served as seed material for instruction generation.

## License

Apache 2.0
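The Dataset Summary says every pair was "validated for structural correctness" but does not list the checks. The sketch below shows what such a check could look like for the record layout in the Format section. It is not the authors' validator: the rules (exactly one `system`, `user`, and `assistant` message in that order, non-empty string content, `crate == "rig"`) and every category label except `code_generation` are assumptions.

```python
from typing import Any

# Hypothetical snake_case labels; only "code_generation" appears in the card's example.
ASSUMED_CATEGORIES = {
    "code_generation",
    "code_explanation",
    "api_usage",
    "bug_detection",
    "refactoring",
    "test_generation",
}
# Assumed turn order, taken from the single example record in the card.
EXPECTED_ROLES = ["system", "user", "assistant"]


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return the problems found in one record; an empty list means it passed.

    Illustrative checks inferred from the card's example, not the dataset's real rules.
    """
    messages = record.get("messages")
    if not isinstance(messages, list):
        return ["'messages' must be a list"]
    problems: list[str] = []
    roles = [m.get("role") if isinstance(m, dict) else None for m in messages]
    if roles != EXPECTED_ROLES:
        problems.append(f"expected roles {EXPECTED_ROLES}, got {roles}")
    for i, m in enumerate(messages):
        content = m.get("content") if isinstance(m, dict) else None
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} has empty or non-string content")
    if record.get("category") not in ASSUMED_CATEGORIES:
        problems.append(f"unknown category {record.get('category')!r}")
    if record.get("crate") != "rig":
        problems.append(f"unexpected crate {record.get('crate')!r}")
    return problems


good = {
    "messages": [
        {"role": "system", "content": "You are an expert Rust programmer."},
        {"role": "user", "content": "Show how to create a basic AI agent with Rig."},
        {"role": "assistant", "content": "Here is a sketch of an agent..."},
    ],
    "category": "code_generation",
    "crate": "rig",
}
print(validate_record(good))  # []
```

With the `datasets` library, the same function could drop failing rows via `dataset["train"].filter(lambda ex: not validate_record(ex))`.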
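The summary also says near-duplicates were removed with MinHash at a Jaccard threshold of 0.7, but gives no shingle size or signature length. The following self-contained sketch of the technique assumes (hypothetically) word 3-gram shingles and a 128-slot signature, with salted BLAKE2b hashes standing in for random permutations. A text is kept only if its estimated Jaccard similarity to every already-kept text is below 0.7. The pairwise loop is quadratic for clarity; a pipeline at larger scale would usually add locality-sensitive hashing (LSH) banding to avoid comparing every pair.

```python
import hashlib
import re

NUM_PERM = 128     # assumed signature length (not stated in the card)
SHINGLE_SIZE = 3   # assumed word n-gram size (not stated in the card)
THRESHOLD = 0.7    # Jaccard threshold stated in the card


def shingles(text: str, k: int = SHINGLE_SIZE) -> set[str]:
    """Lower-cased word k-grams; texts shorter than k words become a single shingle."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(text: str, num_perm: int = NUM_PERM) -> list[int]:
    """For each salted hash function, keep the minimum hash over the text's shingles."""
    grams = [g.encode() for g in shingles(text)]
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(hashlib.blake2b(g, digest_size=8, salt=salt).digest(), "big")
            for g in grams
        ))
    return signature


def estimated_jaccard(a: list[int], b: list[int]) -> float:
    """The fraction of signature slots where two texts agree estimates their Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def deduplicate(texts: list[str], threshold: float = THRESHOLD) -> list[str]:
    """Keep a text only if no already-kept text reaches the threshold (O(n^2) sketch)."""
    kept: list[tuple[str, list[int]]] = []
    for text in texts:
        sig = minhash_signature(text)
        if all(estimated_jaccard(sig, other) < threshold for _, other in kept):
            kept.append((text, sig))
    return [text for text, _ in kept]


pairs = [
    "Create an agent with the OpenAI provider and a system preamble",
    "Create an agent with the OpenAI provider and a system preamble!",
    "Write unit tests for the vector store embedding pipeline in Rig",
]
print(deduplicate(pairs))  # the second string is dropped as a near-duplicate of the first
```

Which text of each record is compared (for example, the user and assistant contents concatenated) is a further choice the card leaves open.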
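Each percentage in the Category Distribution table is that category's count divided by the 697-pair total. This short sketch recomputes the shares from the table's counts, rounding to the nearest whole percent:

```python
# Counts copied from the Category Distribution table.
CATEGORY_COUNTS = {
    "Code Generation": 236,
    "Code Explanation": 113,
    "API Usage": 113,
    "Bug Detection": 77,
    "Refactoring": 95,
    "Test Generation": 63,
}


def category_shares(counts: dict[str, int]) -> dict[str, int]:
    """Each category's share of the total, rounded to the nearest whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


print(sum(CATEGORY_COUNTS.values()))  # 697
for name, pct in category_shares(CATEGORY_COUNTS).items():
    print(f"{name}: {pct}%")
```

Rounded to the nearest percent, the shares are 34, 16, 16, 11, 14 and 9, which sum to 100. Truncating instead gives 33% for Code Generation and 13% for Refactoring, and the column then sums to 98.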
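The Format section stores each example as a role-tagged `messages` array. When a training script needs one flat string instead, such arrays are commonly rendered with ChatML's `<|im_start|>` / `<|im_end|>` delimiters. The minimal sketch below shows that rendering. The function name `to_chatml` is hypothetical, and the tokenizer of whichever model is being fine-tuned may apply a different chat template altogether (with `transformers`, `tokenizer.apply_chat_template` yields the model-specific form).

```python
def to_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Render a messages array as ChatML text (hypothetical helper).

    Each turn becomes '<|im_start|>{role}\n{content}<|im_end|>\n'. With
    add_generation_prompt=True, an open assistant turn is appended so a
    model can continue from it.
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [
    {"role": "system", "content": "You are an expert Rust programmer specializing in the rig crate and modern Rust development patterns."},
    {"role": "user", "content": "Show how to create a basic AI agent with Rig that uses OpenAI as the provider and can respond to user messages."},
]
print(to_chatml(example, add_generation_prompt=True))
```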

Provided by:

y0sif
