Antix5/vi-gym-causal-ascii

Name: Antix5/vi-gym-causal-ascii
Creator: Antix5
Published: 2026-03-07 15:37:36
License: No description provided

Hugging Face. Updated 2026-03-07; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/Antix5/vi-gym-causal-ascii


Description:

---
license: mit
task_categories:
- text-generation
- reinforcement-learning
language:
- en
tags:
- vi
- vim
- ascii-art
- gym
- causal-lm
pretty_name: Vi-Gym Causal ASCII Trajectories
size_categories:
- 100K<n<1M
---

# Vi-Gym Causal ASCII Trajectories

This dataset contains autoregressive trajectories of a Large Language Model (LLM) agent learning spatial reasoning and geometric drawing within a simulated **Vi (Vim)** editor environment.

## Warning

**This dataset is derived directly from the source material and may therefore contain content not suitable for all audiences. The authors of the original artwork retain full ownership of their work.**

## Dataset Structure

Each record is a discrete step in the environment, capturing the exact state of the editor before a command is issued. The format is designed for **Causal Next-Token Prediction** training.

### Format Specification

```xml
<BOS>
<notepad>
[CURRENT ASCII CONTENT]
</notepad>
<mode>[Normal|Insert]</mode>
<prompt>[Grammatically Correct Instruction]</prompt>
<command>
[OPTIMIZED VI KEYSTROKES]
```

## Technical Provenance

1. **Environment Engine**: States rendered via the Rust-based Vi-Gym engine.
2. **Keystroke Optimization**: Generated using an AST-based compiler prioritizing efficiency via Run-Length Encoding (RLE) and geometric entropy sorting.
3. **Linguistic Robustness**: Prompts utilize grammatically correct indefinite articles (a/an) and randomized natural language templates with human-like noise.

## Source Credits

- **Geometric Data**: ASCII art shapes sourced from the [Curated ASCII Art Database](https://github.com/asweigart/asciiartjsondb) (originally from [asciiart.eu](https://www.asciiart.eu/)).
- **Logic Backend**: Editor state and command interpretation powered by the [ViLM](https://github.com/Antix5/ViLM) core engine (not published yet).

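To make the record layout under Format Specification concrete, here is a minimal Python sketch that assembles and parses one record. The exact line breaks between tags are an assumption (the card only confirms the newline after `<command>`), and `render_record`, `parse_record`, and the sample values are hypothetical names used for illustration, not part of the dataset's tooling.

```python
import re


def render_record(notepad: str, mode: str, prompt: str, command: str) -> str:
    """Assemble one record in the card's tag order (line breaks are assumed)."""
    if mode not in ("Normal", "Insert"):
        raise ValueError(f"unexpected mode: {mode!r}")
    return (
        "<BOS>\n"
        f"<notepad>\n{notepad}\n</notepad>\n"
        f"<mode>{mode}</mode>\n"
        f"<prompt>{prompt}</prompt>\n"
        f"<command>\n{command}"
    )


# Pattern mirroring the assumed layout; DOTALL lets the notepad span several lines.
RECORD_RE = re.compile(
    r"<BOS>\n<notepad>\n(?P<notepad>.*?)\n</notepad>\n"
    r"<mode>(?P<mode>Normal|Insert)</mode>\n"
    r"<prompt>(?P<prompt>.*?)</prompt>\n"
    r"<command>\n(?P<command>.*)",
    re.DOTALL,
)


def parse_record(text: str) -> dict:
    """Split a record back into its four fields, or raise if the layout differs."""
    match = RECORD_RE.fullmatch(text)
    if match is None:
        raise ValueError("text does not match the assumed record layout")
    return match.groupdict()


# Hypothetical sample values, not taken from the dataset.
record = render_record("ab\ncd", "Normal", "Draw a box.", "ggdG")
fields = parse_record(record)
```

If the real records use different whitespace between tags, only `render_record` and `RECORD_RE` would need adjusting; the field names follow the tags in the specification.
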
## Training Recommendations

Calculate loss **only** on tokens following the `<command>\n` tag to focus the model on the mapping between visual state and command execution.
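
One way to apply this recommendation is to mask the labels of every token that starts before the end of the `<command>\n` tag. The Python sketch below assumes the tokenizer reports a `(start, end)` character offset for each token; `mask_labels` and `char_tokenize` are hypothetical helpers, and the character-level tokenizer is only a stand-in for a real one. The value `-100` is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so labels built this way are skipped by that loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
COMMAND_TAG = "<command>\n"


def mask_labels(text, input_ids, offsets):
    """Keep labels only for tokens that start after the command tag."""
    # rfind: assumes the tag that opens the command is the last occurrence.
    tag_pos = text.rfind(COMMAND_TAG)
    if tag_pos == -1:
        raise ValueError("record has no <command> tag")
    start = tag_pos + len(COMMAND_TAG)
    return [
        token if begin >= start else IGNORE_INDEX
        for token, (begin, _end) in zip(input_ids, offsets)
    ]


def char_tokenize(text):
    """Toy character-level tokenizer standing in for a real one with offsets."""
    ids = [ord(ch) for ch in text]
    offsets = [(i, i + 1) for i in range(len(text))]
    return ids, offsets


# Hypothetical record with an empty notepad and a made-up command.
record = (
    "<BOS>\n<notepad>\n\n</notepad>\n<mode>Normal</mode>\n"
    "<prompt>Draw a line.</prompt>\n<command>\ni---"
)
ids, offsets = char_tokenize(record)
labels = mask_labels(record, ids, offsets)
supervised = "".join(chr(x) for x in labels if x != IGNORE_INDEX)
```

Here `supervised` holds only the keystrokes after the tag, while every position belonging to the notepad, mode, and prompt carries `IGNORE_INDEX`. With a subword tokenizer, a token that straddles the tag boundary would be masked under this rule, which is a design choice rather than something the card specifies.
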

Provider: Antix5
