issdandavis/scbe-codeflow-bijective-v1
Hugging Face · updated 2026-04-20 · indexed 2026-04-26
Download link: https://hf-mirror.com/datasets/issdandavis/scbe-codeflow-bijective-v1
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- scbe
- sacred-tongues
- bijective-coding
- multi-language
- code-translation
- lora
- sft
size_categories:
- 1K<n<10K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/bijective_codeflow_v1_train.sft.jsonl
  - split: holdout
    path: data/bijective_codeflow_v1_holdout.sft.jsonl
- config_name: full
  data_files:
  - split: train
    path: data/bijective_codeflow_v1_all.sft.jsonl
---
# SCBE Codeflow Bijective v1
Supervised fine-tuning corpus for **bijective multi-tongue / multi-language
code editing**. Each algorithm is decomposed into N *semantic slots*, and every
slot is filled in all 6 Sacred Tongues (each tongue is bound to one code
language; see the table below). An edit at slot k in any tongue maps
deterministically to the parallel slot k in every other tongue. Syntactic line
counts may differ per tongue, but slot-level semantic alignment is preserved.
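The slot model can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the dataset's generator: the `AlignedAlgorithm` class, its methods, and the `add` fragments are hypothetical, and only three of the six tongues (KO/Python, RU/Rust, UM/Haskell) are filled in to keep it short. Each slot stores one fragment per tongue; because fragments can span different numbers of lines, a line in one tongue is mapped back to its slot index and then forward to the parallel line range in another tongue.

```python
from dataclasses import dataclass


@dataclass
class AlignedAlgorithm:
    """Hypothetical container: slots[k][tongue] holds the source fragment for slot k."""
    name: str
    slots: list[dict[str, str]]

    def _height(self, tongue: str, k: int) -> int:
        # Number of physical lines slot k occupies in the given tongue.
        return len(self.slots[k][tongue].splitlines())

    def render(self, tongue: str) -> list[str]:
        # Concatenate every slot's fragment, in slot order, into source lines.
        return [line for slot in self.slots for line in slot[tongue].splitlines()]

    def lines_of_slot(self, tongue: str, k: int) -> range:
        # 0-based line range that slot k covers in the rendered tongue.
        start = sum(self._height(tongue, j) for j in range(k))
        return range(start, start + self._height(tongue, k))

    def slot_of_line(self, tongue: str, line_no: int) -> int:
        # Inverse of lines_of_slot: which slot owns this 0-based line?
        for k in range(len(self.slots)):
            if line_no in self.lines_of_slot(tongue, k):
                return k
        raise IndexError(f"line {line_no} is outside {self.name} in {tongue}")

    def parallel_lines(self, src: str, line_no: int, dst: str) -> range:
        # An edit on line_no in src maps to the whole parallel slot in dst.
        return self.lines_of_slot(dst, self.slot_of_line(src, line_no))

    def edit_slot(self, k: int, fragments: dict[str, str]) -> None:
        # Replace slot k in every tongue at once so alignment cannot drift.
        missing = set(self.slots[k]) - set(fragments)
        if missing:
            raise ValueError(f"edit to slot {k} is missing tongues: {sorted(missing)}")
        self.slots[k] = dict(fragments)


add = AlignedAlgorithm("add", [
    {"KO": "def add(a, b):",
     "RU": "fn add(a: i64, b: i64) -> i64 {",
     "UM": "add :: Int -> Int -> Int"},
    {"KO": "    return a + b",
     "RU": "    a + b\n}",
     "UM": "add a b = a + b"},
])

print(add.parallel_lines("KO", 1, "RU"))  # range(1, 3): Rust body plus closing brace
```

The Python body occupies one line while the Rust body occupies two, yet both resolve to slot 1; requiring `edit_slot` to receive every tongue is one simple way to keep an edit from touching only part of the parallel set.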
## Splits
| Split | Rows |
|-------|------|
| all | 1040 |
| train | 936 |
| holdout | 104 |
Holdout is every row with `row_index % 10 == 0` (bucket 0) and is disjoint from
train. The `full` config exposes all 1040 rows as a single `train` split.
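A minimal sketch of the bucketing rule, assuming `row_index` is the 0-based position of a row in the `all` file (the card does not say how the index is assigned). Under that assumption, indices 0, 10, …, 1030 land in the holdout, which reproduces the 104 / 936 counts in the table. `split_by_bucket` is a hypothetical helper.

```python
def split_by_bucket(rows, modulus=10, holdout_bucket=0):
    """Split rows into (train, holdout) by row_index % modulus."""
    train, holdout = [], []
    for row_index, row in enumerate(rows):
        # Rows whose index falls in the holdout bucket never reach train.
        target = holdout if row_index % modulus == holdout_bucket else train
        target.append(row)
    return train, holdout


train, holdout = split_by_bucket(range(1040))
print(len(train), len(holdout))  # 936 104
```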
## Schema
```jsonc
{
"messages": [
{"role": "system", "content": "<bijective code-flow instruction>"},
{"role": "user", "content": "<task / source code / edit request>"},
{"role": "assistant", "content": "<canonical bijective output>"}
],
"meta": {
"task": "translate_one | translate_all | edit_slot_one | edit_slot_all | multiline_edit | identify | align | governance_tag",
"algorithm": "<name>",
"tongue_src": "KO|AV|RU|CA|UM|DR (where applicable)",
"tongue_dst": "KO|AV|RU|CA|UM|DR (where applicable)"
}
}
```
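A lightweight row checker for the schema above. It is a sketch, not an official validator: `validate_row` and `iter_errors` are hypothetical helpers. It assumes exactly one system, user, and assistant message per row (as shown) and treats `tongue_src` / `tongue_dst` as optional, since the schema marks them "where applicable".

```python
import json
from typing import Iterator

TASKS = {"translate_one", "translate_all", "edit_slot_one", "edit_slot_all",
         "multiline_edit", "identify", "align", "governance_tag"}
TONGUES = {"KO", "AV", "RU", "CA", "UM", "DR"}
ROLES = ["system", "user", "assistant"]


def validate_row(row: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the row looks valid."""
    errors = []
    messages = row.get("messages", [])
    roles = [m.get("role") for m in messages]
    if roles != ROLES:
        errors.append(f"expected roles {ROLES}, got {roles}")
    if any(not isinstance(m.get("content"), str) for m in messages):
        errors.append("every message needs string content")
    meta = row.get("meta", {})
    if meta.get("task") not in TASKS:
        errors.append(f"unknown task: {meta.get('task')!r}")
    if not isinstance(meta.get("algorithm"), str):
        errors.append("meta.algorithm must be a string")
    for key in ("tongue_src", "tongue_dst"):
        # Optional fields: only check them when present and non-null.
        if meta.get(key) is not None and meta[key] not in TONGUES:
            errors.append(f"{key} must be one of {sorted(TONGUES)}")
    return errors


def iter_errors(path: str) -> Iterator[tuple[int, str]]:
    """Yield (line_number, problem) for every invalid row in a .jsonl file."""
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if line.strip():
                for problem in validate_row(json.loads(line)):
                    yield line_no, problem


sample = {
    "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "..."},
    ],
    "meta": {"task": "translate_one", "algorithm": "add",
             "tongue_src": "KO", "tongue_dst": "RU"},
}
print(validate_row(sample))  # []
```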
## Tongues -> Languages
| Code | Name | phi weight | Code Language |
|------|------|------------|---------------|
| KO | Kor'aelin | 1.00 | Python |
| AV | Avali | 1.62 | JavaScript |
| RU | Runethic | 2.62 | Rust |
| CA | Cassisivadan | 4.24 | Mathematica |
| UM | Umbroth | 6.85 | Haskell |
| DR | Draumric | 11.09 | Markdown |
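The phi weights in the table match successive powers of the golden ratio φ = (1 + √5)/2: weight(k) = φ^k for k = 0…5 in table order, rounded to two decimals. The card does not state this rule, so treat it as an observation; the snippet only checks the arithmetic, and `TONGUE_ORDER` / `phi_weights` are illustrative names.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, about 1.618
TONGUE_ORDER = ["KO", "AV", "RU", "CA", "UM", "DR"]  # table order, k = 0..5


def phi_weights(order=TONGUE_ORDER, digits=2):
    """Map each tongue to PHI**k, rounded as in the table."""
    return {tongue: round(PHI ** k, digits) for k, tongue in enumerate(order)}


print(phi_weights())
# {'KO': 1.0, 'AV': 1.62, 'RU': 2.62, 'CA': 4.24, 'UM': 6.85, 'DR': 11.09}
```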
## Task Mix
| Task | Count |
|------|-------|
| translate_one | 510 |
| identify | 102 |
| translate_all | 102 |
| align | 102 |
| governance_tag | 102 |
| edit_slot_one | 60 |
| edit_slot_all | 60 |
| multiline_edit | 2 |
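The counts sum to the 1040 rows of the `all` split, and several are consistent with a simple enumeration over the 17 algorithms and 6 tongues: 17 × 30 ordered tongue pairs = 510 `translate_one` rows, and 17 × 6 = 102 rows for each of `identify`, `translate_all`, `align`, and `governance_tag`. That generation scheme is an inference from the numbers, not something the card states. The sketch checks the arithmetic and shows how to tally the mix from loaded rows (`tally_tasks` is a hypothetical helper).

```python
from collections import Counter
from itertools import permutations

TASK_MIX = {
    "translate_one": 510, "identify": 102, "translate_all": 102, "align": 102,
    "governance_tag": 102, "edit_slot_one": 60, "edit_slot_all": 60,
    "multiline_edit": 2,
}
N_ALGORITHMS = 17
TONGUES = ["KO", "AV", "RU", "CA", "UM", "DR"]


def tally_tasks(rows) -> Counter:
    """Count rows per meta.task, e.g. to compare a loaded file against the table."""
    return Counter(row["meta"]["task"] for row in rows)


ordered_pairs = len(list(permutations(TONGUES, 2)))  # 6 * 5 = 30
print(sum(TASK_MIX.values()))        # 1040
print(N_ALGORITHMS * ordered_pairs)  # 510
print(N_ALGORITHMS * len(TONGUES))   # 102
```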
## Algorithm Library (17)
`identity`, `add`, `clamp`, `is_even`, `factorial`, `fibonacci`, `sum_list`,
`is_palindrome`, `binary_search`, `max_subarray`, `quicksort`, `two_sum`,
`count_words`, `memoize`, `linked_list_append`, `stack`, `queue`.
The concept inventory matches the 14 Coder Forge concepts in
`benchmark/concepts.json`, plus 3 primitives needed for slot warm-up.
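To make "semantic slots" concrete, here is a KO (Python) implementation of `max_subarray` from the library, annotated with one plausible slot split. The slot boundaries are hypothetical; the card does not publish the actual decomposition, and the dataset's canonical fragments may differ. The algorithm itself is Kadane's maximum-subarray recurrence.

```python
def max_subarray(xs: list[int]) -> int:
    # slot 0 (hypothetical): guard against empty input
    if not xs:
        raise ValueError("max_subarray() of empty list")
    # slot 1: initialise the running and best sums with the first element
    best = current = xs[0]
    # slot 2: extend or restart the running sum (Kadane's recurrence)
    for x in xs[1:]:
        current = max(x, current + x)
        best = max(best, current)
    # slot 3: return the best sum seen
    return best


print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6
```

In the bijective setting, editing slot 2 here (say, to track indices as well as sums) would be a slot-level edit that has to be mirrored in slot 2 of the other five tongues.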
## Intended Use
Training data for the Brick2 LoRA, which sits on top of
`issdandavis/tongue-table-lora-brick1-hf-v1`. It trains the model to:
1. Translate a single algorithm between any two of the 6 tongues.
2. Propagate a slot-level edit from one tongue to all five others without
drifting the algorithm's semantics.
3. Identify which algorithm + slot a snippet belongs to.
4. Output per-line phi/d_H governance annotations (Layer 12 wiring).
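A minimal sketch for reading a split file and grouping rows by the objective they most plausibly serve. The objective-to-task pairing in `OBJECTIVE_TASKS` is an assumption made for illustration (the card lists the objectives and the task names but does not pair them), and `load_jsonl` / `group_by_objective` are hypothetical helpers; the path is the `train` file of the `default` config.

```python
import json
from collections import defaultdict

# Hypothetical pairing of the four objectives with meta.task values.
OBJECTIVE_TASKS = {
    "1_translate": {"translate_one", "translate_all"},
    "2_propagate_edit": {"edit_slot_one", "edit_slot_all", "multiline_edit"},
    "3_identify": {"identify", "align"},
    "4_governance": {"governance_tag"},
}


def load_jsonl(path: str) -> list[dict]:
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def group_by_objective(rows: list[dict]) -> dict[str, list[dict]]:
    """Bucket rows by objective; rows with an unmapped task go under 'other'."""
    task_to_objective = {task: obj for obj, tasks in OBJECTIVE_TASKS.items()
                         for task in tasks}
    groups = defaultdict(list)
    for row in rows:
        groups[task_to_objective.get(row["meta"]["task"], "other")].append(row)
    return dict(groups)


# Usage, after downloading the dataset files locally:
# rows = load_jsonl("data/bijective_codeflow_v1_train.sft.jsonl")
# print({obj: len(group) for obj, group in group_by_objective(rows).items()})
```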
This corpus is Pair Type #12 ("Multi-lang forge") in the SCBE training-pair taxonomy.
## License
Apache-2.0. SCBE framework and Sacred Tongues protocol by issdandavis.
Prior art: "The Six Tongues Protocol" (ASIN B0GSSFQD9G).
Provided by: issdandavis



