
latkes/inside-out-replication-canary-v1

Source: Hugging Face. Updated 2026-04-09. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/latkes/inside-out-replication-canary-v1
Description:
---
license: mit
tags:
- inside-out-replication
- canary
- factual-knowledge
---

# inside-out-replication-canary-v1

Canary run: 5 questions per relation, 50 samples, Llama-3-8B. Full pipeline E2E test.

## Dataset Info

- **Rows**: 482
- **Columns**: 11

## Columns

| Column | Type | Description |
|--------|------|-------------|
| relation | Value('string') | Wikidata relation (P26=spouse, P264=label, P176=manufacturer, P50=author) |
| question_id | Value('string') | Unique question identifier |
| question | Value('string') | Entity-centric question text |
| gold_answer | Value('string') | Ground truth answer from Wikidata |
| answer | Value('string') | Model-generated answer (greedy or sampled) |
| label | Value('string') | Judge verdict: CORRECT or INCORRECT |
| log_p_a_q | Value('float64') | Log probability P(answer\|question), unnormalized |
| log_p_norm_a_q | Value('float64') | Length-normalized log probability P_norm(answer\|question) |
| p_true_v0 | Value('float64') | Restricted softmax P(True) using A/B verification prompt (V0) |
| verif_v1_score | Value('float64') | Restricted softmax P(True) using True/False prompt (V1) |
| verif_v2_score | Value('float64') | Restricted softmax P(True) using Yes/No prompt (V2) |

## Generation Parameters

```json
{
  "script_name": "run_canary.py",
  "model": "meta-llama/Meta-Llama-3-8B-Instruct",
  "description": "Canary run: 5 questions per relation, 50 samples, Llama-3-8B. Full pipeline E2E test.",
  "experiment_name": "inside-out-replication",
  "job_id": "mll:26166",
  "cluster": "mll",
  "artifact_status": "final",
  "canary": true,
  "hyperparameters": {
    "n_questions": 5,
    "n_samples": 50,
    "temperature": 1.0,
    "judge_model": "Qwen/Qwen2.5-14B-Instruct"
  },
  "input_datasets": []
}
```

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("latkes/inside-out-replication-canary-v1", split="train")
print(f"Loaded {len(dataset)} rows")
```
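
The columns above pair a judge verdict (`label`) with several per-answer scores, so one plausible use is to check how well a score separates CORRECT from INCORRECT answers. The following is a minimal sketch of that idea, not the dataset authors' evaluation code: the helper names `accuracy_by_relation` and `auroc` and the `toy_rows` values are hypothetical and chosen only for illustration. The column names are taken from the table above.

```python
from collections import defaultdict


def accuracy_by_relation(rows):
    """Fraction of rows judged CORRECT, grouped by the `relation` column."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in rows:
        totals[row["relation"]] += 1
        if row["label"] == "CORRECT":
            correct[row["relation"]] += 1
    return {rel: correct[rel] / totals[rel] for rel in totals}


def auroc(rows, score_column):
    """AUROC of `score_column` for separating CORRECT from INCORRECT rows.

    Computed pairwise: the probability that a random CORRECT row has a
    higher score than a random INCORRECT row, counting ties as 0.5.
    Returns None when either class is empty.
    """
    pos = [r[score_column] for r in rows if r["label"] == "CORRECT"]
    neg = [r[score_column] for r in rows if r["label"] == "INCORRECT"]
    if not pos or not neg:
        return None
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical rows for illustration only; these values are NOT from the dataset.
toy_rows = [
    {"relation": "P26", "label": "CORRECT", "p_true_v0": 0.9},
    {"relation": "P26", "label": "INCORRECT", "p_true_v0": 0.2},
    {"relation": "P50", "label": "CORRECT", "p_true_v0": 0.6},
    {"relation": "P50", "label": "INCORRECT", "p_true_v0": 0.7},
]

print(accuracy_by_relation(toy_rows))
print(auroc(toy_rows, "p_true_v0"))
```

Both helpers only need an iterable of dict-like rows, so the `dataset` object returned by `load_dataset` should be usable in place of `toy_rows`. The pairwise AUROC is quadratic in the number of rows, which is acceptable for a few hundred rows but would be worth replacing with a rank-based computation on larger runs.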
Provider:
latkes