
Antix5/vi-gym-causal-ascii

Source: Hugging Face. Updated 2026-03-07. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Antix5/vi-gym-causal-ascii
Resource description:
---
license: mit
task_categories:
- text-generation
- reinforcement-learning
language:
- en
tags:
- vi
- vim
- ascii-art
- gym
- causal-lm
pretty_name: Vi-Gym Causal ASCII Trajectories
size_categories:
- 100K<n<1M
---

# Vi-Gym Causal ASCII Trajectories

This dataset contains autoregressive trajectories of a Large Language Model (LLM) agent learning spatial reasoning and geometric drawing within a simulated **Vi (Vim)** editor environment.

## Warning

**This dataset is a direct derivation of the source material; it might therefore also contain content not suitable for all audiences. All authors of the original artwork have full ownership.**

## Dataset Structure

Each record is a discrete step in the environment, capturing the exact state of the editor before a command is issued. The format is designed for **Causal Next-Token Prediction** training.

### Format Specification

```xml
<BOS>
<notepad>
[CURRENT ASCII CONTENT]
</notepad>
<mode>[Normal|Insert]</mode>
<prompt>[Grammatically Correct Instruction]</prompt>
<command>
[OPTIMIZED VI KEYSTROKES]
```

## Technical Provenance

1. **Environment Engine**: States rendered via the Rust-based Vi-Gym engine.
2. **Keystroke Optimization**: Generated using an AST-based compiler prioritizing efficiency via Run-Length Encoding (RLE) and geometric entropy sorting.
3. **Linguistic Robustness**: Prompts utilize grammatically correct indefinite articles (a/an) and randomized natural language templates with human-like noise.

## Source Credits

- **Geometric Data**: ASCII art shapes sourced from the [Curated ASCII Art Database](https://github.com/asweigart/asciiartjsondb) (originally from [asciiart.eu](https://www.asciiart.eu/)).
- **Logic Backend**: Editor state and command interpretation powered by the [ViLM](https://github.com/Antix5/ViLM) core engine (not published yet).
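
As an illustration of the record layout given under Format Specification, the following minimal sketch splits one record into its four fields. It assumes each tag appears exactly once and in the order shown; the helper name `parse_record`, the regular expression, and the sample record (including its keystrokes) are made up for this example and are not part of the dataset tooling.

```python
import re

# Hypothetical pattern: the field order and tag names are taken from the
# Format Specification; the exact whitespace between tags is an assumption.
RECORD_RE = re.compile(
    r"<BOS>\s*<notepad>\n?(?P<notepad>.*?)\n?</notepad>\s*"
    r"<mode>(?P<mode>Normal|Insert)</mode>\s*"
    r"<prompt>(?P<prompt>.*?)</prompt>\s*"
    r"<command>\n(?P<command>.*)",
    re.DOTALL,
)


def parse_record(text: str) -> dict:
    """Split one record into notepad, mode, prompt, and command fields."""
    match = RECORD_RE.match(text)
    if match is None:
        raise ValueError("text does not follow the assumed record layout")
    return match.groupdict()


# Illustrative record; the content and keystrokes are invented for this sketch.
sample = (
    "<BOS>\n<notepad>\n+--+\n|  |\n</notepad>\n"
    "<mode>Normal</mode>\n<prompt>Draw a box.</prompt>\n"
    "<command>\nGo+--+"
)
record = parse_record(sample)
print(record["mode"], repr(record["command"]))
```

The notepad field is captured without trimming spaces, since leading whitespace is significant in ASCII art.
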
## Training Recommendations

Calculate loss **only** on tokens following the `<command>\n` tag to focus the model on the mapping between visual state and command execution.
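
One way to restrict the loss to the tokens after the `<command>\n` tag is to build a label sequence in which every earlier position is set to an ignore value. The sketch below is only an illustration under stated assumptions: `toy_tokenize` is a character-level stand-in for a real tokenizer, `build_labels` is a hypothetical helper, and the tag is assumed to occur exactly once per record. The value `-100` is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
COMMAND_TAG = "<command>\n"
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def toy_tokenize(text):
    """Stand-in tokenizer (assumption): one integer token per character."""
    return [ord(ch) for ch in text]


def build_labels(record_text, tokenize=toy_tokenize):
    """Return (input_ids, labels) with everything up to the tag masked out."""
    head, tag, tail = record_text.partition(COMMAND_TAG)
    if not tag:
        raise ValueError("record has no <command> tag followed by a newline")
    # Tokenize context and target separately so the boundary is unambiguous.
    prefix_ids = tokenize(head + tag)
    target_ids = tokenize(tail)
    input_ids = prefix_ids + target_ids
    # Context positions are ignored by the loss; only keystrokes are scored.
    labels = [IGNORE_INDEX] * len(prefix_ids) + target_ids
    return input_ids, labels


text = "<prompt>Draw a box.</prompt>\n<command>\nGo+--+"
input_ids, labels = build_labels(text)
scored = [tok for tok in labels if tok != IGNORE_INDEX]
print(len(input_ids), len(scored))
```

With a subword tokenizer, encoding the context and the keystrokes separately can yield different tokens than encoding the full string at once, so the boundary handling should be checked against the tokenizer actually used. The labels here are aligned with the inputs and left unshifted, which suits training code that shifts labels internally.
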
Provider:
Antix5