Bassgawd/orangejuce-plugin-ai

Name: Bassgawd/orangejuce-plugin-ai
Creator: Bassgawd
Published: 2026-04-09 14:13:26
License: No description available

Source: Hugging Face. Updated: 2026-04-09. Indexed: 2026-04-12.

Download link:

https://hf-mirror.com/datasets/Bassgawd/orangejuce-plugin-ai

Resource description:

---
annotations_creators:
- no-annotations
language_creators:
- found
language:
- en
license:
- mit
multilingual:
- false
pretty_name: OrangeJuce Plugin AI Dataset
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- text-generation
task_ids:
- text2text-generation
---

# OrangeJuce Plugin AI Dataset

Training dataset for building AI models that generate professional-grade audio plugins in C++ using the JUCE framework.

## Dataset Summary

This dataset was built to train a code generation model capable of producing production-ready audio plugins across all major plugin formats (VST2, VST3, AU, AAX). It combines **31,684 entries** across 34 knowledge tables covering the full stack of audio plugin development: DSP theory, C++ systems programming, real-time audio constraints, plugin architecture, and UI/UX design.

## Dataset Structure

The dataset contains **15,076 training examples** formatted as Alpaca instruction-tuning pairs in JSONL format.

### Format

Each entry follows the Alpaca instruction format.

### Coverage

| Category | Examples | Description |
|---|---|---|
| Plugin Generation | 2,000 | Full JUCE plugin implementations |
| DSP / C++ | 3,624 | Filters, synths, effects, algorithms |
| Error Correction | 1,008 | Bug -> fix pairs in C++ audio code |
| Multi-Format | 1,003 | Python <-> C++ equivalents for 12 formats |
| Analog Modeling | 200 | Circuit schematics mapped to DSP |
| Realtime Audio | 1,040 | Thread safety, lock-free, SIMD |
| Formula / Math | 550 | Z-transforms, transfer functions, coefficients |
| Performance | 500 | CPU/memory benchmarks for audio algorithms |
| UI/UX Patterns | 1,506 | Plugin interface design patterns |
| NLP Intents | 7,472 | Natural language -> plugin specs |
| Other | 1,673 | Tutorials, benchmarks, lock-free structures, format docs |

## Training

### Recommended Model

- **Base**: Qwen2.5-7B-Instruct (quantized 4-bit via Unsloth)
- **Alternative**: CodeLlama-7b-Instruct for code-focused output

### Unsloth Training Config

### Hyperparameters

| Parameter | Value | Notes |
|---|---|---|
| Model | Qwen2.5-7B-Instruct 4-bit | Unsloth BNB quantization |
| Sequence Length | 4096 | Handles long C++ code blocks |
| Batch Size | 2 | Per-device |
| Gradient Accumulation | 4 | Effective batch: 8 |
| Learning Rate | 2e-4 | Cosine schedule |
| LoRA R | 16 | Rank decomposition |
| LoRA Alpha | 16 | Scaling factor |
| Epochs | 3 | ~9K steps |
| Optimizer | adamw_8bit | Memory efficient |

## Quality Metrics

- All C++ code syntax-validated with g++ against a JUCE stub header
- real_world_examples: 100% syntax-pass rate (1,521 entries)
- Code entries use authentic JUCE APIs: AudioBuffer, dsp::ProcessorChain, AudioParameterFloat, etc.

## Citation

## License

MIT License
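The original example for the Alpaca instruction format mentioned under "Format" is not included in this card. As a rough illustration only, the sketch below parses JSONL records and renders each as a training prompt. It assumes the conventional Alpaca field names (`instruction`, `input`, `output`), which are not confirmed for this dataset. The sample records and the `parse_jsonl` and `to_prompt` helpers are hypothetical.

```python
import json

# Hypothetical sample records. The field names follow the common Alpaca
# convention (instruction / input / output); this is an assumption and is
# not confirmed by the dataset card.
SAMPLE_JSONL = "\n".join([
    json.dumps({
        "instruction": "Write a JUCE gain processor.",
        "input": "",
        "output": "// C++ implementation would go here",
    }),
    json.dumps({
        "instruction": "Fix the bug in this processBlock.",
        "input": "void processBlock() { /* buggy code */ }",
        "output": "// corrected C++ would go here",
    }),
])


def parse_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def to_prompt(record):
    """Render one record as an Alpaca-style prompt string.

    The "### Input:" section is included only when the input is non-empty.
    """
    parts = ["### Instruction:\n" + record["instruction"]]
    if record.get("input"):
        parts.append("### Input:\n" + record["input"])
    parts.append("### Response:\n" + record["output"])
    return "\n\n".join(parts)


records = parse_jsonl(SAMPLE_JSONL)
prompts = [to_prompt(r) for r in records]

if __name__ == "__main__":
    for prompt in prompts:
        print(prompt)
        print("-" * 40)
```

To read the actual dataset file, the same `parse_jsonl` helper could be applied to the file's text contents once the real field names have been checked against a few records.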

Provided by:

Bassgawd
