
issdandavis/scbe-codeflow-bijective-v1

Source: Hugging Face. Updated: 2026-04-20. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/issdandavis/scbe-codeflow-bijective-v1
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- scbe
- sacred-tongues
- bijective-coding
- multi-language
- code-translation
- lora
- sft
size_categories:
- 1K<n<10K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/bijective_codeflow_v1_train.sft.jsonl
  - split: holdout
    path: data/bijective_codeflow_v1_holdout.sft.jsonl
- config_name: full
  data_files:
  - split: train
    path: data/bijective_codeflow_v1_all.sft.jsonl
---

# SCBE Codeflow Bijective v1

Supervised fine-tuning corpus teaching **bijective multi-tongue / multi-language code editing**.

Each algorithm is decomposed into N *semantic slots*. Every slot is filled in all 6 Sacred Tongues. An edit at slot k in any tongue maps deterministically to the parallel slot k in every other tongue. Syntactic line counts may differ per tongue; semantic alignment is preserved.

## Splits

| Split | Rows |
|-------|------|
| all | 1040 |
| train | 936 |
| holdout | 104 |

Holdout is `row_index % 10 == 0` (bucket 0), disjoint from train.
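
The holdout rule above can be sketched in a few lines. This is a minimal illustration, assuming `row_index` is the zero-based position of a row in the `all` file (the card does not say where indexing starts); `split_rows` is a hypothetical helper introduced here, not part of the dataset's tooling.

```python
def split_rows(rows):
    """Partition rows into (train, holdout) using row_index % 10 == 0.

    Assumption: row_index is the zero-based position in the input sequence.
    """
    train, holdout = [], []
    for row_index, row in enumerate(rows):
        # Bucket 0 goes to holdout; buckets 1 through 9 go to train.
        (holdout if row_index % 10 == 0 else train).append(row)
    return train, holdout


# Stand-in rows: only the count matters for checking the split sizes.
rows = list(range(1040))
train, holdout = split_rows(rows)
print(len(train), len(holdout))  # 936 104
```

With 1040 rows, this reproduces the 936 / 104 row counts in the table, and the two lists share no rows by construction.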

## Schema

```jsonc
{
  "messages": [
    {"role": "system", "content": "<bijective code-flow instruction>"},
    {"role": "user", "content": "<task / source code / edit request>"},
    {"role": "assistant", "content": "<canonical bijective output>"}
  ],
  "meta": {
    "task": "translate_one | translate_all | edit_slot_one | edit_slot_all | multiline_edit | identify | align | governance_tag",
    "algorithm": "<name>",
    "tongue_src": "KO|AV|RU|CA|UM|DR (where applicable)",
    "tongue_dst": "KO|AV|RU|CA|UM|DR (where applicable)"
  }
}
```

## Tongues -> Languages

| Code | Name | phi weight | Code Language |
|------|------|------------|---------------|
| KO | Kor'aelin | 1.00 | Python |
| AV | Avali | 1.62 | JavaScript |
| RU | Runethic | 2.62 | Rust |
| CA | Cassisivadan | 4.24 | Mathematica |
| UM | Umbroth | 6.85 | Haskell |
| DR | Draumric | 11.09 | Markdown |

## Task Mix

| Task | Count |
|------|-------|
| translate_one | 510 |
| identify | 102 |
| translate_all | 102 |
| align | 102 |
| governance_tag | 102 |
| edit_slot_one | 60 |
| edit_slot_all | 60 |
| multiline_edit | 2 |

## Algorithm Library (17)

`identity`, `add`, `clamp`, `is_even`, `factorial`, `fibonacci`, `sum_list`, `is_palindrome`, `binary_search`, `max_subarray`, `quicksort`, `two_sum`, `count_words`, `memoize`, `linked_list_append`, `stack`, `queue`.

Concept inventory aligns with `benchmark/concepts.json` (the Coder Forge 14 concepts) plus 3 primitives needed for slot warm-up.

## Intended Use

Brick2 LoRA on top of `issdandavis/tongue-table-lora-brick1-hf-v1`. Trains the model to:

1. Translate a single algorithm between any two of the 6 tongues.
2. Propagate a slot-level edit from one tongue to all five others without drifting the algorithm's semantics.
3. Identify which algorithm + slot a snippet belongs to.
4. Output per-line phi/d_H governance annotations (Layer 12 wiring).

Pair Type #12 ("Multi-lang forge") in the SCBE training pair taxonomy.

## License

Apache-2.0. SCBE framework and Sacred Tongues protocol by issdandavis.
Prior-art: "The Six Tongues Protocol" (ASIN B0GSSFQD9G).
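
For readers consuming the JSONL files, the schema, tongue table, and task mix described in the card can be checked with a small validator. This is a sketch under stated assumptions: `validate_record`, `TONGUES`, `TASK_COUNTS`, and `EXPECTED_ROLES` are hypothetical names introduced here, the system/user/assistant order is taken from the schema example, and treating `tongue_src` / `tongue_dst` as optional follows the "where applicable" note. The final check records an observation rather than a claim made by the card: the listed phi weights coincide with powers of the golden ratio once rounded to two decimals.

```python
import json
import math

# Tongue code -> (name, phi weight, code language), copied from the card's table.
TONGUES = {
    "KO": ("Kor'aelin", 1.00, "Python"),
    "AV": ("Avali", 1.62, "JavaScript"),
    "RU": ("Runethic", 2.62, "Rust"),
    "CA": ("Cassisivadan", 4.24, "Mathematica"),
    "UM": ("Umbroth", 6.85, "Haskell"),
    "DR": ("Draumric", 11.09, "Markdown"),
}

# Task name -> row count, copied from the card's Task Mix table.
TASK_COUNTS = {
    "translate_one": 510, "identify": 102, "translate_all": 102,
    "align": 102, "governance_tag": 102, "edit_slot_one": 60,
    "edit_slot_all": 60, "multiline_edit": 2,
}

# Assumption: every record carries exactly these three roles in this order.
EXPECTED_ROLES = ["system", "user", "assistant"]


def validate_record(record):
    """Return a list of problems found in one SFT record (empty if none)."""
    problems = []
    messages = record.get("messages", [])
    roles = [m.get("role") for m in messages]
    if roles != EXPECTED_ROLES:
        problems.append(f"unexpected roles: {roles}")
    for m in messages:
        if not isinstance(m.get("content"), str):
            problems.append(f"non-string content for role {m.get('role')}")
    meta = record.get("meta", {})
    if meta.get("task") not in TASK_COUNTS:
        problems.append(f"unknown task: {meta.get('task')}")
    if not meta.get("algorithm"):
        problems.append("missing algorithm")
    # Tongue fields are optional; when present they must be a known code.
    for key in ("tongue_src", "tongue_dst"):
        value = meta.get(key)
        if value is not None and value not in TONGUES:
            problems.append(f"unknown {key}: {value}")
    return problems


# A made-up record shaped like the schema (placeholder content, not real data).
line = json.dumps({
    "messages": [
        {"role": "system", "content": "<instruction>"},
        {"role": "user", "content": "<request>"},
        {"role": "assistant", "content": "<output>"},
    ],
    "meta": {"task": "translate_one", "algorithm": "clamp",
             "tongue_src": "KO", "tongue_dst": "RU"},
})
print(validate_record(json.loads(line)))  # []

# The task counts add up to the 1040 rows of the "all" split.
print(sum(TASK_COUNTS.values()))  # 1040

# Observation: weight k matches PHI ** k to within two-decimal rounding.
PHI = (1 + math.sqrt(5)) / 2
weights = [weight for _, weight, _ in TONGUES.values()]
print(all(abs(PHI ** k - w) < 0.005 for k, w in enumerate(weights)))  # True
```

Returning a list of problems instead of raising on the first one makes it easier to report every issue in a malformed row when scanning a whole file.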
Provider:
issdandavis