cds-jb/synthweb-qwen3.5-9b-multiscale-inference

Name: cds-jb/synthweb-qwen3.5-9b-multiscale-inference
Creator: cds-jb
Published: 2026-05-21 20:18:31
License: 暂无描述

Hugging Face2026-05-21 更新2026-05-31 收录

下载链接：

https://hf-mirror.com/datasets/cds-jb/synthweb-qwen3.5-9b-multiscale-inference

下载链接

链接失效反馈

官方服务：

资源简介：

--- configs: - config_name: default data_files: - split: train path: data/train-* dataset_info: features: - name: doc dtype: string - name: scope dtype: string - name: verbalizer_prompt dtype: string - name: context dtype: string - name: atom_text dtype: string - name: target_response dtype: string - name: incorrect_plausible_answer dtype: string - name: bb_answer_rollout_answers list: string - name: bb_answer_rollout_scores list: float64 - name: bb_answer_score_mean dtype: float64 - name: bb_answer_score_max dtype: float64 - name: bb_answer_score_min dtype: float64 - name: row_seed dtype: int64 - name: doc_id dtype: string - name: doc_source dtype: string - name: doc_idx dtype: int64 - name: split_char_offset dtype: int64 - name: row_splits struct: - name: word_lens dtype: int64 - name: sentence dtype: int64 - name: paragraph dtype: int64 - name: whole dtype: int64 - name: doc_choice_rationale dtype: string - name: split_rationale dtype: string - name: slot_idx dtype: float64 - name: target dtype: string - name: typicality dtype: string - name: abs_index dtype: float64 - name: signed_index dtype: float64 - name: signed_index_hint dtype: float64 - name: actual_word_distance_from_split dtype: float64 - name: synergy_check dtype: string - name: distribution_check dtype: string - name: difficulty_check dtype: string - name: source dtype: string - name: n_tokens_actual dtype: float64 - name: tokenizer dtype: string - name: generator_model dtype: string - name: source_model dtype: string splits: - name: train num_bytes: 6150303735 num_examples: 590741 download_size: 979308451 dataset_size: 6150303735 --- # synthweb-qwen3.5-9b-multiscale-inference Probing-question dataset built over [`cds-jb/synthweb-qwen3.5-9b`](https://huggingface.co/datasets/cds-jb/synthweb-qwen3.5-9b) (Qwen3.5-9B) continuations of FineWeb prefixes. Each row is ONE probe testing whether content on one side of a character-level split in a prefix+continuation can be recovered from the **latent hidden state** of the source LM at that split. This dataset is the evaluation harness for **method M** — an "activation oracle" that decodes hidden-state content into natural language. ## Why this dataset Activation-oracle methods claim to recover information that is encoded in a language model's hidden state but NOT yet manifest in the surface text. To measure that capability we need probes that satisfy two constraints: 1. **HARD-FROM-TEXT** — a careful reader of the side OPPOSITE to the target should NOT be able to confidently produce the answer from the complement text alone (no verbatim quote, no clean paraphrase). 2. **EASY-FROM-LATENT** — the answer should be exactly the kind of content the source LM has already COMMITTED TO in its hidden state at the split (the next thematic move it was primed for; the specific named entity it has been tracking; the conclusion it was building toward). Probes that fail (1) measure nothing M adds beyond a competent text reader. Probes that fail (2) measure entropy, not oracle skill. The **`bb_answer_score_max`** column in this dataset is precisely the (1) test — a low value flags a good HARD-FROM-TEXT probe. For HARD-FROM-LATENT, each spw probe carries a `target_response` field — the answer a "Big Brother" (BB) monitor would give if it had access to **both** the complement context AND `atom_text` (i.e. the full-info ground truth, generated by Haiku in one joint call). The contrast between `target_response` (full-info answer) and the **blind** `bb_answer_*` columns (complement-only answer) operationalizes the gap M is supposed to close. ## Probe design — five scope groups Each source doc (FineWeb prefix + sibling Qwen3.5-9B continuations) yields up to **16 probes** spanning five scope groups: | scope | n in this build | atom (what's being recovered) | scoring path | |---|---:|---|---| | word | 184,959 | a single content word | local (exact-match / token-logprob) | | lens | 184,805 | contiguous N tokens past/future the split | local (Qwen tokenizer) | | sentence | 110,882 | one sentence | Haiku answerer + judge | | paragraph | 73,932 | one paragraph | Haiku answerer + judge | | whole | 35,622 | one half-doc (≥ all paragraphs on one side) | Haiku answerer + judge | **Total probes in this build: 590,741** across 36,997 distinct source docs (3 round(s)). Per-doc slot allocation is **5 word + 5 lens + 3 sentence + 2 paragraph + 1 whole = 16 slots/row**. Word and lens are Python-templated (no LLM); sentence / paragraph / whole are written by Claude Haiku 4.5 in extended-thinking mode (see "Generator" below). ### Per-scope splits Each row carries **four** char-level split offsets — one per scope group: - `word_lens` split → applies to the 10 word + lens probes - `sentence` split → applies to the 3 sentence probes - `paragraph` split → applies to the 2 paragraph probes - `whole` split → applies to the 1 whole probe Haiku picks all four split points in a single response. It may use the same offset for all four (≈51% of rows in the smoke), or up to four distinct offsets when different scopes need different boundaries (≈49% of rows). The per-probe `split_char_offset` field gives the offset that applies to THAT probe. ### Targets and typicality For each probe Haiku picks a `target ∈ {prefix, suffix}`: - **prefix-target** → M sees the suffix; must POST-dict an earlier-in-doc atom. - **suffix-target** → M sees the prefix; must PRE-dict a later-in-doc atom. For suffix-target slots, Haiku additionally labels `typicality ∈ {typical, atypical}` against the cross-sibling distribution: - `typical` → the specific claim/event/move recurs in at least 3 other sibling continuations at a comparable position. - `atypical` → the claim is essentially unique to the chosen sibling and absent from the rest. The `distribution_check` field records Haiku's justification. ### Synergy constraint (per scope) Each multi-token atom (sentence / paragraph / whole) must require **synthesizing information across the entire atom**, not a strict sub-region: > *sentence* → must integrate the main clause with its qualifying > clauses / numbers / objects. > *paragraph* → must combine at least two of the paragraph's sentences. > *whole* → must integrate across multiple paragraphs/sections (an > arc, a thesis-evidence chain, a tension between stated stances). The `synergy_check` field per probe names which sub-pieces must be integrated. ### Incorrect plausible answer Every probe (incl. word & lens) carries an `incorrect_plausible_answer` (IPA) — a hypothetical alternative answer a careful reader could PLAUSIBLY produce from the complement side, that happens to be wrong. This enables contrastive scoring (`logP(atom)` vs `logP(IPA)` under M). The IPA must match the atom's shape (single word for word; same-length sentence for sentence; etc.) and must NOT be what any sibling actually produced. ## Generator **Model**: Claude Haiku 4.5 (`claude-haiku-4-5-20251001`) in extended-thinking mode (8K thinking-budget, 28K max output tokens), submitted via Anthropic Message Batches API for ~50% cost reduction. **What Haiku sees per row**: the FineWeb prefix + up to 3 sibling Qwen3.5-9B continuations of that prefix (sampled from the source rollouts dataset). The siblings give Haiku a sense of the source LM's distribution at this point, which it uses to label typicality and to craft IPAs that don't accidentally match a real sibling. **Prompt scaffold** (lightly abridged from `scripts/oracle_question_prompt.py`): > You design probing questions for evaluating a method M. > > What M does: M is given (i) the PREFIX of a document and (ii) the > LATENT HIDDEN-STATE of the source LM at the boundary between prefix > and suffix — the same source LM that generated the suffix. M's > capability is to use that hidden state to read off content the source > LM was about to emit (for SUFFIX-target probes) or had committed to > earlier in the document (for PREFIX-target probes). > > The goal of each probe: construct a question whose answer is > HARD-FROM-TEXT but EASY-FROM-LATENT. > > [...] > > Self-check before finalizing each probe: mentally try to answer your > verbalizer using ONLY the complement-side text. If you can produce > atom_text (or a clean paraphrase) from that alone, the probe is too > easy on text — pick a different atom or rewrite the verbalizer to > require inference about what the source LM was INTERNALLY tracking, > not what is already surface-visible. **Calibration**: rounds were tuned on a 100-doc calibration sweep with addendum levels {-3..+3} sweeping HARD-FROM-TEXT strictness; the build settled on **level -2** ("probes have been too hard. Pick atoms that share an obvious topical / narrative thread with the complement so a text-only reader has solid footing, while still leaving the SPECIFIC content of the atom for M to recover"). This produced a mean answerer-max score of ~0.50 with substantial bimodality (see below). **JSON robustness**: ~1.3% of Haiku responses had small JSON quirks (unescaped quotes inside long string fields, raw newlines). These are recovered with [`json-repair`](https://pypi.org/project/json-repair/), lifting end-to-end finalize rate from ~86% (strict parsing only) to 98.7%. ## Answerer + judge (scoring pass) For sentence / paragraph / whole probes, this dataset additionally carries 5 stochastic answerer rollouts (Haiku 4.5, temperature=1.0, max_tokens=800) and a per-rollout judge score from another Haiku 4.5 call (0.0–1.0). The answerer sees ONLY the complement side of the split (up to 4000 chars) plus the verbalizer, and is asked to produce a concise answer. The judge sees the verbalizer, the ground-truth `atom_text`, and the model answer, and emits a single float score: | score | meaning | |---|---| | 0.0 | totally wrong / unrelated | | 0.5 | partially correct or correct theme but wrong specifics | | 1.0 | essentially the same content as ground truth | The **max** across 5 rollouts is the headline filter signal: if even the best of 5 stochastic attempts can't recover the atom from the complement, the probe is HARD-FROM-TEXT — exactly what you want to keep for evaluating M. ### Max-score histogram (this build) ``` [0.0, 0.2) 33,518 ( 17.7%) ######## HARD [0.2, 0.4) 45,525 ( 24.1%) ############ [0.4, 0.6) 25,803 ( 13.7%) ###### valley [0.6, 0.8) 52,087 ( 27.6%) ############# [0.8, 1.0] 31,961 ( 16.9%) ######## EASY ``` mean=0.461 std=0.296 · HARD (max<0.4) 41.8% · EASY (max≥0.8) 16.9% The clear central valley around [0.4, 0.6) is the filtering signal: keep the HARD tail for M-evaluation; the EASY tail is text-derivable and serves as a sanity check. ### Cost & methodology of the answerer pass - spw probes scored: **188,819** - answerer calls: 5 × 188,819 = 944,095 - judge calls: 5 × 188,819 = 944,095 - estimated tokens: ~1227.3M input + ~75.5M output (answerer) + ~377.6M input + ~4720.0K output (judge) - batch-discounted Haiku 4.5 ($0.50/M input · $2.50/M output): **~$1003** - wall time: ~3–4 hours on Anthropic's batch queue (2 answerer chunks ≤256 MB each, plus 2 judge chunks ≤100,000 requests each — the Anthropic Message Batches API has a per-batch hard cap of 100K requests). If you want to re-score with more rollouts (denoising the per-probe max-score estimate), reuse `scripts/score_probes.py --model qwen3.5-9b --n-rollouts K` — the cost is linear in K. ## Schema (per row) Each row is one probe. Key fields: | column | type | meaning | |---|---|---| | `doc_id` | str | FineWeb-derived id | | `doc_source` | str | original FineWeb URL | | `doc` | str | the chosen sibling continuation (prefix + suffix) | | `doc_idx` | int | which sibling Haiku committed to | | `split_char_offset` | int | character split for this probe's scope group | | `row_splits` | dict | all four scope-group splits for the row | | `slot_idx` | int | 0..15 within the row | | `scope` | str | word / lens / sentence / paragraph / whole | | `target` | str | prefix or suffix | | `typicality` | str/null | typical / atypical (suffix only) | | `abs_index` | int | absolute atom index | | `signed_index` | int | signed atom index (negative = prefix, positive = suffix) | | `atom_text` | str/null | text of the target atom | | `verbalizer_prompt` | str/null | the probing question | | `incorrect_plausible_answer` | str/null | one hypothetical wrong-but-plausible alternative | | `synergy_check` | str/null | which sub-pieces must be integrated (multi-token scopes) | | `distribution_check` | str/null | relation to cross-sibling distribution (suffix slots) | | `difficulty_check` | str/null | the per-probe HARD-FROM-TEXT / EASY-FROM-LATENT argument | | `source` | str | python_template or haiku | | `n_tokens_actual` | int/null | actual token count for lens probes | | `tokenizer` | str/null | tokenizer used (Qwen3-8B) for lens probes | | `generator_model` | str | always `claude-haiku-4-5-20251001` | | `source_model` | str | `qwen3.5-9b` | | `bb_answer_rollout_answers` | list[str]/null | 5 model answers (spw probes only) | | `bb_answer_rollout_scores` | list[float]/null | 5 judge scores | | `bb_answer_score_mean` | float/null | mean of the 5 scores | | `bb_answer_score_max` | float/null | **max — primary filter signal** | | `bb_answer_score_min` | float/null | min | | `target_response` | str | full-info correct answer (Haiku sees context + atom_text); for word/lens probes this equals `atom_text` | | `context` | str | complement-side raw text (the side opposite to atom_text relative to the scope's split) | ## Reproducibility & incremental growth The dataset is built in **rounds**. Each round samples N **new** doc-ids — disjoint from prior rounds — using a deterministic seed. The local manifest at `data_pipelines/multiscale_inference/qwen3.5-9b/manifest.json` records `(round_idx, batch_ids, doc_ids, seeds, submitted_at)` per round, so a follow-up extension run will pick up where this one left off without overlap. Scripts (in [github repo TBD]): - `scripts/submit_multiscale_inference.py` — submit a round of generator batches - `scripts/poll_multiscale_inference.py` — fetch + finalize + push probes - `scripts/score_probes.py` — score with answerer + judge - `scripts/oracle_question_prompt.py` — the procedural prompt builder ## Source rollouts - [`cds-jb/synthweb-qwen3.5-9b`](https://huggingface.co/datasets/cds-jb/synthweb-qwen3.5-9b) — Qwen3.5-9B continuations of FineWeb prefixes (mode-collapse-filtered and detached-tail-truncated). - See that dataset's README for the rollout filter pipeline. ## License Same as upstream FineWeb (ODC-By 1.0) for the prefix text; generated continuations and probes are released under the same license for research use.

提供机构：

cds-jb

5,000+

优质数据集

54 个

任务类型

进入经典数据集