Bassgawd/orangejuce-plugin-ai
Source: Hugging Face. Updated 2026-04-09. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Bassgawd/orangejuce-plugin-ai
Resource description:
---
annotations_creators:
- no-annotations
language_creators:
- found
language:
- en
license:
- mit
multilingual:
- false
pretty_name: OrangeJuce Plugin AI Dataset
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- text-generation
task_ids:
- text2text-generation
---
# OrangeJuce Plugin AI Dataset
Training dataset for building AI models that generate professional-grade audio plugins in C++ using the JUCE framework.
## Dataset Summary
This dataset was built to train a code generation model capable of producing production-ready audio plugins across all major plugin formats (VST2, VST3, AU, AAX). It combines **31,684 entries** across 34 knowledge tables covering the full stack of audio plugin development: DSP theory, C++ systems programming, real-time audio constraints, plugin architecture, and UI/UX design.
## Dataset Structure
The dataset contains **15,076 training examples** formatted as Alpaca instruction-tuning pairs in JSONL format.
### Format
Each entry follows the Alpaca instruction format:
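As a minimal sketch of what such a record can look like, assuming the standard Alpaca field names (`instruction`, `input`, `output`) and the standard Alpaca prompt template: the record content below is hypothetical and not taken from the dataset, and the helper names are illustrative only.

```python
import json

# Standard Alpaca prompt template for records with a non-empty "input".
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

# Hypothetical record: field names follow the Alpaca convention (an
# assumption); the content is invented for illustration.
record = {
    "instruction": "Apply the gain parameter to the audio buffer in processBlock.",
    "input": "The gain is stored in a juce::AudioParameterFloat* named gainParam.",
    "output": "buffer.applyGain (gainParam->get());",
}


def to_jsonl_line(rec: dict) -> str:
    """Serialize one record as a single JSONL line (no trailing newline)."""
    return json.dumps(rec, ensure_ascii=False)


def parse_jsonl(text: str) -> list:
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def format_prompt(rec: dict) -> str:
    """Render one record with the Alpaca template."""
    return ALPACA_TEMPLATE.format(**rec)


line = to_jsonl_line(record)
parsed = parse_jsonl(line + "\n")
prompt = format_prompt(parsed[0])
print(prompt)
```

Each line of the JSONL file would hold one such JSON object, so the file can be streamed line by line without loading every example into memory.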
### Coverage
| Category | Examples | Description |
|---|---|---|
| Plugin Generation | 2,000 | Full JUCE plugin implementations |
| DSP / C++ | 3,624 | Filters, synths, effects, algorithms |
| Error Correction | 1,008 | Bug -> fix pairs in C++ audio code |
| Multi-Format | 1,003 | Python <-> C++ equivalents for 12 formats |
| Analog Modeling | 200 | Circuit schematics mapped to DSP |
| Realtime Audio | 1,040 | Thread safety, lock-free, SIMD |
| Formula / Math | 550 | Z-transforms, transfer functions, coefficients |
| Performance | 500 | CPU/memory benchmarks for audio algorithms |
| UI/UX Patterns | 1,506 | Plugin interface design patterns |
| NLP Intents | 7,472 | Natural language -> plugin specs |
| Other | 1,673 | Tutorials, benchmarks, lock-free structures, format docs |
## Training
### Recommended Model
- **Base**: Qwen2.5-7B-Instruct (quantized 4-bit via Unsloth)
- **Alternative**: CodeLlama-7b-Instruct for code-focused output
### Unsloth Training Config
### Hyperparameters
| Parameter | Value | Notes |
|---|---|---|
| Model | Qwen2.5-7B-Instruct 4-bit | Unsloth BNB quantization |
| Sequence Length | 4096 | Handles long C++ code blocks |
| Batch Size | 2 | Per-device |
| Gradient Accumulation | 4 | Effective batch: 8 |
| Learning Rate | 2e-4 | Cosine schedule |
| LoRA R | 16 | Rank of the low-rank adapter matrices |
| LoRA Alpha | 16 | Scaling factor (alpha / r = 1) |
| Epochs | 3 | ~9K steps |
| Optimizer | adamw_8bit | Memory efficient |
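The arithmetic behind the table can be sketched in plain Python. The field names below are hypothetical (they are not Unsloth or Trainer argument names), and the step count assumes a single device and that all 15,076 examples are used for training; neither assumption is confirmed by the card.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values copied from the hyperparameter table; names are illustrative."""
    max_seq_length: int = 4096
    per_device_batch_size: int = 2
    gradient_accumulation_steps: int = 4
    learning_rate: float = 2e-4
    lr_schedule: str = "cosine"
    lora_r: int = 16
    lora_alpha: int = 16
    num_epochs: int = 3
    optimizer: str = "adamw_8bit"

    def effective_batch_size(self, num_devices: int = 1) -> int:
        # Examples consumed per optimizer step.
        return (self.per_device_batch_size
                * self.gradient_accumulation_steps
                * num_devices)

    def lora_scale(self) -> float:
        # Conventional LoRA scaling of the adapter update: alpha / r.
        return self.lora_alpha / self.lora_r

    def optimizer_steps(self, num_examples: int, num_devices: int = 1) -> int:
        # Optimizer steps over all epochs, rounding each epoch up.
        per_epoch = math.ceil(num_examples / self.effective_batch_size(num_devices))
        return per_epoch * self.num_epochs


cfg = TrainConfig()
print(cfg.effective_batch_size())    # 8
print(cfg.lora_scale())              # 1.0
print(cfg.optimizer_steps(15_076))   # 5655
```

Under these assumptions the run comes to roughly 5.7K optimizer steps, which is lower than the ~9K noted in the table, so that figure presumably rests on a different example count or setup.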
## Quality Metrics
- All C++ code syntax-validated with g++ against a JUCE stub header
- `real_world_examples`: 100% syntax-pass rate (1,521 entries)
- Code entries use authentic JUCE APIs: AudioBuffer, dsp::ProcessorChain, AudioParameterFloat, etc.
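A syntax-only check of this kind could be scripted roughly as follows. This is a sketch, not the author's validation pipeline: the card does not state the exact compiler flags, so `-fsyntax-only` and `-std=c++17` are assumptions, and the paths `plugin.cpp` and `stubs` (a directory holding the JUCE stub header) are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_syntax_check_cmd(source: Path, stub_include_dir: Path,
                           std: str = "c++17") -> List[str]:
    """Build a g++ command that parses the file without generating code."""
    return [
        "g++",
        f"-std={std}",            # assumed language standard
        "-fsyntax-only",          # check syntax only, emit no object file
        f"-I{stub_include_dir}",  # directory containing the stub header
        str(source),
    ]


def passes_syntax_check(source: Path, stub_include_dir: Path) -> Optional[bool]:
    """Return True/False for pass/fail, or None if g++ is not installed."""
    if shutil.which("g++") is None:
        return None
    result = subprocess.run(
        build_syntax_check_cmd(source, stub_include_dir),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


cmd = build_syntax_check_cmd(Path("plugin.cpp"), Path("stubs"))
print(" ".join(cmd))
```

A stub header keeps the check fast because the compiler only needs declarations of the JUCE types used, not the full framework; the trade-off is that the check confirms syntax and declared signatures, not that the code links or runs.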
## Citation
## License
MIT License
Provided by:
Bassgawd



