
vishnuOI/unity-dev-instructions

Source: Hugging Face. Updated 2026-04-06; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/vishnuOI/unity-dev-instructions
Resource description:
---
license: cc-by-4.0
language:
- en
tags:
- unity
- unity3d
- game-development
- csharp
- xr
- vr
- ar
- openxr
- instruction-tuning
- code
- gamedev
task_categories:
- text-generation
- question-answering
pretty_name: Unity Developer Instructions
size_categories:
- 10K<n<100K
---

# Unity Developer Instructions

An instruction-tuning dataset of 48,929 instruction/response pairs for Unity game development, covering C# scripting, XR/VR development, physics, animation, rendering, UI Toolkit, and performance optimization.

## Dataset Summary

| Split | Count |
|-------|------:|
| Train | 46,483 |
| Test | 2,446 |
| **Total** | **48,929** |

## Data Sources

Source breakdown:

| Source | Count |
|--------|------:|
| unity_docs | 40,496 |
| stackoverflow | 6,071 |
| github | 2,362 |

## Category Distribution

| Category | Count |
|----------|------:|
| scripting | 18,732 |
| rendering | 9,980 |
| editor | 4,963 |
| physics | 3,540 |
| math | 2,824 |
| ui | 2,142 |
| xr | 1,868 |
| animation | 1,694 |
| input | 1,147 |
| performance | 884 |
| audio | 740 |
| networking | 415 |

## Schema

Each row is a JSON object with the following fields:

```json
{
  "id": "so_12345",
  "source": "stackoverflow",
  "category": "physics",
  "system": "You are an expert Unity game developer...",
  "instruction": "How do I detect collision between two objects?",
  "response": "Use OnCollisionEnter..."
}
```

### Fields

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique identifier with source prefix |
| `source` | string | Origin: `stackoverflow`, `unity_docs`, `hf_ibranze_v2`, `github` |
| `category` | string | Topic category (see distribution above) |
| `system` | string | System prompt for the assistant |
| `instruction` | string | The question or task |
| `response` | string | The answer or solution |

## Sources

### 1. Stack Overflow [unity3d]

Fetched via the Stack Exchange API v2.3. Filtered to questions with score ≥ 2 that have an accepted answer. HTML formatting is stripped and converted to Markdown. Each question and its accepted answer form one instruction/response pair.

### 2. ibranze/codellama_unity3d_v2

High-quality, human-curated Unity Q&A pairs from the Hugging Face Hub, downloaded directly from `ibranze/codellama_unity3d_v2`.

### 3. Unity Scripting API Documentation

Scraped from `docs.unity3d.com/ScriptReference/`. Each class page generates:

- One overview pair (class description + example)
- One pair per member (property/method descriptions)

### 4. GitHub Unity C# Repositories (if available)

Permissively licensed Unity C# scripts extracted from MIT/Apache repos via the GitHub API, formatted as code generation tasks.

## License

This dataset is released under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/).

**Source attribution:**

- Stack Overflow content is licensed under [CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/); see the Stack Exchange Terms of Service.
- Unity documentation is © Unity Technologies and was scraped for research/educational purposes.
- GitHub code is from repositories with permissive licenses (MIT/Apache 2.0).
- `ibranze/codellama_unity3d_v2` is redistributed under its original license.

## Usage

### Load with 🤗 Datasets

```python
from datasets import load_dataset

ds = load_dataset("vishnuOI/unity-dev-instructions")

# Access splits
train = ds["train"]
test = ds["test"]

# Example row
print(train[0])
# {
#   "id": "so_12345",
#   "source": "stackoverflow",
#   "category": "physics",
#   "system": "You are an expert Unity game developer...",
#   "instruction": "How do I detect collision...",
#   "response": "Use OnCollisionEnter..."
# }
```

### Fine-tune with TRL SFTTrainer

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("vishnuOI/unity-dev-instructions")


def format_prompt(row):
    # Join the system prompt, question, and answer into one training string.
    return (
        f"<|system|>\n{row['system']}\n"
        f"<|user|>\n{row['instruction']}\n"
        f"<|assistant|>\n{row['response']}"
    )


model_name = "codellama/CodeLlama-7b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# 4-bit loading requires the bitsandbytes package.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# Argument names differ across TRL releases (for example `tokenizer` vs.
# `processing_class`, `max_seq_length` vs. `max_length`); match the installed version.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    args=SFTConfig(
        output_dir="./unity-llm",
        max_seq_length=2048,
        num_train_epochs=3,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
    ),
    formatting_func=format_prompt,
    # Assumption, not part of the original card: a 4-bit quantized base model
    # cannot be trained directly, so a LoRA adapter is attached. Values are illustrative.
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()
```

### Filter by category or source

```python
# `dataset` is the DatasetDict returned by load_dataset above.
xr_data = dataset["train"].filter(lambda x: x["category"] == "xr")
so_data = dataset["train"].filter(lambda x: x["source"] == "stackoverflow")
```

## Citation

```bibtex
@dataset{oneimmersive_unity_dev_instructions_2024,
  title     = {Unity Developer Instructions},
  author    = {OneImmersive},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/datasets/vishnuOI/unity-dev-instructions},
  license   = {CC-BY-4.0},
  note      = {Instruction-tuning dataset for Unity game development}
}
```

## Dataset Construction

Built with the open-source pipeline at [github.com/oneimmersive/unity-dataset-pipeline](https://github.com/oneimmersive/unity-dataset-pipeline).

Pipeline scripts:

1. `01_fetch_stackoverflow.py`: Stack Exchange API crawler
2. `02_fetch_huggingface.py`: Hugging Face dataset downloader
3. `03_fetch_unity_docs.py`: Unity ScriptReference scraper
4. `04_build_dataset.py`: Normalisation, deduplication, quality filtering
5. `05_upload_huggingface.py`: Hugging Face Hub uploader
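
The Stack Overflow filtering rule stated under Sources (score ≥ 2 and an accepted answer) can be illustrated with a minimal sketch. It is not the pipeline's actual code: the function names `keep_question` and `filter_questions` are hypothetical, and the input is assumed to be a list of dicts shaped like Stack Exchange API question objects, with `score` and an optional `accepted_answer_id`.

```python
from typing import Iterable


def keep_question(question: dict, min_score: int = 2) -> bool:
    """Return True if the question meets the score threshold and has an accepted answer."""
    # Assumption: a question without an accepted answer has no `accepted_answer_id` key.
    has_accepted = question.get("accepted_answer_id") is not None
    return question.get("score", 0) >= min_score and has_accepted


def filter_questions(questions: Iterable[dict], min_score: int = 2) -> list[dict]:
    """Keep only the questions that pass `keep_question`."""
    return [q for q in questions if keep_question(q, min_score)]


# Made-up sample items for illustration only.
sample = [
    {"question_id": 1, "score": 5, "accepted_answer_id": 11},  # kept
    {"question_id": 2, "score": 1, "accepted_answer_id": 12},  # score too low
    {"question_id": 3, "score": 7},                            # no accepted answer
]
kept = filter_questions(sample)
print([q["question_id"] for q in kept])
```

Keeping the threshold as a parameter makes it easy to experiment with a stricter cut-off than the score of 2 described in the card.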
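
As a rough sanity check on the schema described in the card, the sketch below validates that a row carries all six string fields and tallies rows per category. The helper names `validate_row` and `category_counts` are hypothetical, and the second sample row (including its `docs_1` id) is invented for illustration; only the first row mirrors the example given in the card.

```python
from collections import Counter

# Field names taken from the Fields table in the dataset card.
REQUIRED_FIELDS = ("id", "source", "category", "system", "instruction", "response")


def validate_row(row: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the row looks valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = row.get(field)
        if not isinstance(value, str):
            problems.append(f"{field}: expected string, got {type(value).__name__}")
        elif not value.strip():
            problems.append(f"{field}: empty string")
    return problems


def category_counts(rows) -> Counter:
    """Count rows per `category` value."""
    return Counter(row["category"] for row in rows)


rows = [
    {
        "id": "so_12345",
        "source": "stackoverflow",
        "category": "physics",
        "system": "You are an expert Unity game developer...",
        "instruction": "How do I detect collision between two objects?",
        "response": "Use OnCollisionEnter...",
    },
    {
        # Invented row for illustration; not taken from the dataset.
        "id": "docs_1",
        "source": "unity_docs",
        "category": "scripting",
        "system": "You are an expert Unity game developer...",
        "instruction": "What does Transform.position do?",
        "response": "It gets or sets the world-space position.",
    },
]

for row in rows:
    print(row["id"], validate_row(row))
print(category_counts(rows))
```

The same helpers should also accept a loaded split (for example `category_counts(ds["train"])`), since iterating a 🤗 Datasets split yields one dict per row; the resulting tallies can then be compared against the Category Distribution table.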
Provider:
vishnuOI