latkes/inside-out-replication-results-v1
Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/latkes/inside-out-replication-results-v1
Description:
---
license: mit
tags:
- inside-out-replication
- factual-knowledge
- hidden-states
- probe
---
# inside-out-replication-results-v1
Full Inside-Out replication: 3 models x 4 relations x 450 test questions x 1000 samples. Includes P(a|q), P_norm, P(True) V0/V1/V2, and probe scores.
## Dataset Info
- **Rows**: 1523595
- **Columns**: 20
## Columns
| Column | Type | Description |
|--------|------|-------------|
| question_id | Value('string') | Unique question identifier |
| answer | Value('string') | Model-generated answer |
| label | Value('string') | Judge verdict: CORRECT or INCORRECT |
| log_p_a_q | Value('float64') | Log probability of the answer given the question, log P(answer \| question); not length-normalized |
| p_a_q | Value('float64') | *No description provided* |
| log_p_norm_a_q | Value('float64') | Length-normalized log probability |
| p_norm_a_q | Value('float64') | *No description provided* |
| p_true | Value('float64') | Restricted softmax P(True) V0 (A/B verification) |
| p_true_full_a | Value('float64') | *No description provided* |
| p_true_full_b | Value('float64') | *No description provided* |
| p_true_residual | Value('float64') | *No description provided* |
| verif_v0_ab_score | Value('float64') | P(True) V0 A/B |
| verif_v0_ab_residual | Value('float64') | *No description provided* |
| verif_v1_truefalse_score | Value('float64') | P(True) V1 True/False |
| verif_v1_truefalse_residual | Value('float64') | *No description provided* |
| verif_v2_yesno_score | Value('float64') | P(True) V2 Yes/No |
| verif_v2_yesno_residual | Value('float64') | *No description provided* |
| model | Value('string') | Model short name (llama3-8b, mistral-7b, gemma2-9b) |
| relation | Value('string') | Wikidata relation (P26=spouse, P264=record label, P176=manufacturer, P50=author) |
| probe_score | Value('float64') | Probe P(correct) from best-layer hidden state logistic regression |
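As a sketch of how the score columns above might be compared, the snippet below computes, for each question, the fraction of (CORRECT, INCORRECT) answer pairs in which the correct answer receives the higher score, then averages over questions. This pairwise ranking measure is an illustrative choice and not necessarily the metric used in the replication. The helper name `pairwise_ranking_accuracy` and the toy rows are hypothetical, and the sketch assumes the rows have already been filtered to a single model, since the card does not say whether `question_id` values repeat across models.

```python
from collections import defaultdict
from itertools import product


def pairwise_ranking_accuracy(rows, score_col):
    """Mean over questions of the fraction of (CORRECT, INCORRECT) answer
    pairs where the CORRECT answer has the strictly higher score.

    `rows` is any iterable of dicts keyed by the card's column names.
    Questions lacking either a CORRECT or an INCORRECT answer are skipped.
    """
    by_question = defaultdict(lambda: {"CORRECT": [], "INCORRECT": []})
    for row in rows:
        if row["label"] in ("CORRECT", "INCORRECT"):
            by_question[row["question_id"]][row["label"]].append(row[score_col])

    per_question = []
    for scores in by_question.values():
        pos, neg = scores["CORRECT"], scores["INCORRECT"]
        if not pos or not neg:
            continue  # no pair can be formed for this question
        wins = sum(p > n for p, n in product(pos, neg))
        per_question.append(wins / (len(pos) * len(neg)))

    return sum(per_question) / len(per_question) if per_question else float("nan")


# Toy rows with made-up values, only to show the expected input shape.
toy_rows = [
    {"question_id": "q1", "label": "CORRECT", "probe_score": 0.9, "p_a_q": 0.2},
    {"question_id": "q1", "label": "INCORRECT", "probe_score": 0.4, "p_a_q": 0.5},
    {"question_id": "q1", "label": "INCORRECT", "probe_score": 0.1, "p_a_q": 0.1},
    {"question_id": "q2", "label": "INCORRECT", "probe_score": 0.3, "p_a_q": 0.6},
]

print(pairwise_ranking_accuracy(toy_rows, "probe_score"))  # 1.0
print(pairwise_ranking_accuracy(toy_rows, "p_a_q"))        # 0.5
```

The same function can be called with any of the float columns (for example `p_norm_a_q` or `verif_v1_truefalse_score`) to compare scoring methods on equal footing.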
## Generation Parameters
```json
{
  "script_name": "full pipeline (02-09)",
  "model": "Llama-3-8B, Mistral-7B-v0.3, Gemma-2-9B",
  "description": "Full Inside-Out replication: 3 models x 4 relations x 450 test questions x 1000 samples. Includes P(a|q), P_norm, P(True) V0/V1/V2, and probe scores.",
  "experiment_name": "inside-out-replication",
  "cluster": "mll",
  "artifact_status": "final",
  "canary": false,
  "hyperparameters": {
    "n_samples": 1000,
    "temperature": 1.0,
    "judge_model": "Qwen/Qwen2.5-14B-Instruct",
    "max_tokens": 64
  },
  "input_datasets": []
}
```
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("latkes/inside-out-replication-results-v1", split="train")
print(f"Loaded {len(dataset)} rows")
```
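Building on the loading snippet above, the following sketch tallies the share of answers labelled CORRECT for each (model, relation) pair. It relies only on the column names listed in this card. The helper `correct_rate_by_group` is hypothetical, and toy rows with made-up values stand in for the dataset so that the example runs offline.

```python
from collections import Counter


def correct_rate_by_group(rows, keys=("model", "relation")):
    """Return {group: fraction of rows labelled CORRECT} for each group.

    `rows` is any iterable of dicts; a group is the tuple of values
    found under `keys` in a row.
    """
    totals, correct = Counter(), Counter()
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group] += 1
        correct[group] += int(row["label"] == "CORRECT")
    return {group: correct[group] / totals[group] for group in totals}


# Toy rows (made-up) mirroring the `model`, `relation`, and `label` columns.
toy_rows = [
    {"model": "llama3-8b", "relation": "P26", "label": "CORRECT"},
    {"model": "llama3-8b", "relation": "P26", "label": "INCORRECT"},
    {"model": "gemma2-9b", "relation": "P50", "label": "CORRECT"},
]

rates = correct_rate_by_group(toy_rows)
for (model, relation), rate in sorted(rates.items()):
    print(f"{model} {relation}: {rate:.2f}")
```

With the real data, the `dataset` object can be passed in place of `toy_rows`, since iterating over a `Dataset` yields one dict per row. Keeping only the needed columns first, for instance with `dataset.select_columns(["model", "relation", "label"])`, should reduce the cost of iterating over roughly 1.5 million rows.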
---
Provider:
latkes



