latkes/inside-out-replication-canary-v1
Source: Hugging Face. Updated 2026-04-09. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/latkes/inside-out-replication-canary-v1
Description:
---
license: mit
tags:
- inside-out-replication
- canary
- factual-knowledge
---
# inside-out-replication-canary-v1
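Canary run of the inside-out-replication pipeline: 5 questions per relation, 50 samples, generated with Llama-3-8B (`meta-llama/Meta-Llama-3-8B-Instruct`) at temperature 1.0 and judged by `Qwen/Qwen2.5-14B-Instruct`. It serves as a full-pipeline end-to-end (E2E) test.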
## Dataset Info
- **Rows**: 482
- **Columns**: 11
## Columns
| Column | Type | Description |
|--------|------|-------------|
| relation | Value('string') | Wikidata relation (P26=spouse, P264=label, P176=manufacturer, P50=author) |
| question_id | Value('string') | Unique question identifier |
| question | Value('string') | Entity-centric question text |
| gold_answer | Value('string') | Ground truth answer from Wikidata |
| answer | Value('string') | Model-generated answer (greedy or sampled) |
| label | Value('string') | Judge verdict: CORRECT or INCORRECT |
| log_p_a_q | Value('float64') | Log probability P(answer \| question), unnormalized |
| log_p_norm_a_q | Value('float64') | Length-normalized log probability P_norm(answer \| question) |
| p_true_v0 | Value('float64') | Restricted softmax P(True) using A/B verification prompt (V0) |
| verif_v1_score | Value('float64') | Restricted softmax P(True) using True/False prompt (V1) |
| verif_v2_score | Value('float64') | Restricted softmax P(True) using Yes/No prompt (V2) |
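
The card does not spell out how the score columns are computed, so the following is only a minimal sketch of what "restricted softmax" and "length-normalized" commonly mean. It assumes that P(True) is a softmax over just the two candidate answer-token logits, and that normalization divides the summed token log probabilities by the token count. The function names and the example numbers are hypothetical and do not come from `run_canary.py`.

```python
import math


def restricted_softmax_p_true(logit_true: float, logit_false: float) -> float:
    """P(True) when the softmax is restricted to the two candidate tokens.

    Assumption: the two logits are those of the affirmative and negative
    option tokens (e.g. A/B, True/False, Yes/No) at the answer position.
    """
    # Subtract the max for numerical stability before exponentiating.
    m = max(logit_true, logit_false)
    e_true = math.exp(logit_true - m)
    e_false = math.exp(logit_false - m)
    return e_true / (e_true + e_false)


def length_normalized_logprob(token_logprobs: list[float]) -> float:
    """Mean per-token log probability (one common normalization; assumed here)."""
    if not token_logprobs:
        raise ValueError("need at least one token log probability")
    return sum(token_logprobs) / len(token_logprobs)


# Hypothetical per-token log probabilities for a three-token answer.
token_logprobs = [-0.5, -1.5, -1.0]
log_p = sum(token_logprobs)                              # unnormalized sequence log probability
log_p_norm = length_normalized_logprob(token_logprobs)   # divided by the token count

# Hypothetical logits for the affirmative and negative option tokens.
p_true = restricted_softmax_p_true(2.0, 0.0)
print(log_p, log_p_norm, round(p_true, 4))
```

Restricting the softmax to two tokens ignores probability mass on every other vocabulary item, so the resulting score always lies in (0, 1) and equals 0.5 when the two logits are equal.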
## Generation Parameters
```json
{
  "script_name": "run_canary.py",
  "model": "meta-llama/Meta-Llama-3-8B-Instruct",
  "description": "Canary run: 5 questions per relation, 50 samples, Llama-3-8B. Full pipeline E2E test.",
  "experiment_name": "inside-out-replication",
  "job_id": "mll:26166",
  "cluster": "mll",
  "artifact_status": "final",
  "canary": true,
  "hyperparameters": {
    "n_questions": 5,
    "n_samples": 50,
    "temperature": 1.0,
    "judge_model": "Qwen/Qwen2.5-14B-Instruct"
  },
  "input_datasets": []
}
```
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("latkes/inside-out-replication-canary-v1", split="train")
print(f"Loaded {len(dataset)} rows")
```
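
Once the rows are loaded, one way to inspect them is to compute judge accuracy per relation and to check how well a score column separates CORRECT from INCORRECT answers. The sketch below is illustrative only: the helper names `accuracy_by_relation` and `auroc` are hypothetical, the four example rows are made up rather than taken from the dataset, and the pairwise AUROC is a generic ranking metric, not necessarily the evaluation used by the original experiment.

```python
from collections import defaultdict


def accuracy_by_relation(rows):
    """Fraction of rows labelled CORRECT, grouped by the `relation` column."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in rows:
        totals[row["relation"]] += 1
        if row["label"] == "CORRECT":
            correct[row["relation"]] += 1
    return {rel: correct[rel] / totals[rel] for rel in totals}


def auroc(rows, score_column):
    """Probability that a CORRECT row outscores an INCORRECT row (ties count 0.5)."""
    pos = [row[score_column] for row in rows if row["label"] == "CORRECT"]
    neg = [row[score_column] for row in rows if row["label"] == "INCORRECT"]
    if not pos or not neg:
        raise ValueError("need at least one CORRECT and one INCORRECT row")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up rows that mimic the column layout; not real dataset values.
rows = [
    {"relation": "P26", "label": "CORRECT", "p_true_v0": 0.9},
    {"relation": "P26", "label": "INCORRECT", "p_true_v0": 0.4},
    {"relation": "P50", "label": "CORRECT", "p_true_v0": 0.7},
    {"relation": "P50", "label": "CORRECT", "p_true_v0": 0.3},
]
print(accuracy_by_relation(rows))
print(auroc(rows, "p_true_v0"))
```

Iterating over a `datasets.Dataset` yields one dict per row, so the same helpers should accept the loaded `dataset` object directly, for example `auroc(dataset, "verif_v1_score")`. The quadratic pairwise loop is fine for a few hundred rows but would need a rank-based implementation for larger data.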
---
Provider:
latkes



