alliedtoasters/latenet-v0
Source: Hugging Face. Updated 2026-03-30; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/alliedtoasters/latenet-v0
Description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: statement
    dtype: string
  - name: label
    dtype: bool
  - name: negated
    dtype: bool
  - name: pair_id
    dtype: string
  - name: generator
    dtype: string
  - name: domain
    dtype: string
  - name: relation_type
    dtype: string
  splits:
  - name: train
    num_examples: 23724
  config_name: default
license: mit
task_categories:
- text-classification
tags:
- truth
- mechanistic-interpretability
- probing
- contrastive
- geometry-of-truth
language:
- en
size_categories:
- 10K<n<100K
---
# LateNet v0
5,931 contrastive true/false statement pairs (23,724 rows) for probing truth representations in LLM activations.
## Overview
Each pair contributes four rows that share a `pair_id`: a true statement, a false statement, and the negation of each (negated-true and negated-false). Each false statement is a minimal contrastive edit of its true counterpart (entity swap, comparison reversal, or sibling substitution). The dataset is designed as a successor to the [Geometry of Truth](https://arxiv.org/abs/2310.06824) datasets, with broader domain coverage and multi-model validation.
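As an illustration of the four-row layout, the following minimal sketch groups rows by `pair_id` and reports any pair that does not cover all four (`label`, `negated`) combinations exactly once. The toy rows and the `incomplete_pairs` helper are invented for illustration and are not taken from the dataset. The labels on the negated rows assume that `label` is the truth value of the statement as written, which the card does not spell out.

```python
from collections import defaultdict

# Toy rows invented for illustration; real rows also carry
# id, generator, domain and relation_type columns.
# Assumption: `label` is the truth value of the statement as written.
rows = [
    {"pair_id": "p1", "statement": "Paris is in France.", "label": True, "negated": False},
    {"pair_id": "p1", "statement": "Paris is in Spain.", "label": False, "negated": False},
    {"pair_id": "p1", "statement": "Paris is not in France.", "label": False, "negated": True},
    {"pair_id": "p1", "statement": "Paris is not in Spain.", "label": True, "negated": True},
]

EXPECTED = sorted([(True, False), (False, False), (True, True), (False, True)])

def incomplete_pairs(rows):
    """Return pair_ids whose rows do not cover every (label, negated) combination once."""
    combos = defaultdict(list)
    for row in rows:
        combos[row["pair_id"]].append((row["label"], row["negated"]))
    return [pid for pid, seen in combos.items() if sorted(seen) != EXPECTED]

print(incomplete_pairs(rows))
```

Running the same check on the full table is one way to confirm that no pair lost a row during filtering.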
## Generators
| Generator | Pairs | Relation Types |
|-----------|-------|----------------|
| temporal | 1,742 | born_before, occurred_in_century, were_contemporaries |
| language | 1,672 | translates_to, translation_of, word_is_language |
| geography | 1,558 | area_greater, cardinal_direction, closer_to, contained_in, population_greater |
| authorship | 1,476 | author_of, created_by, worked_in_domain |
| mathematics | 1,308 | arithmetic_result, greater_than, has_property, is_divisible_by, more_factors, shares_factor |
| chemistry | 1,306 | atomic_number_greater, in_block, member_of_group, property_greater, state_at_room_temp, symbol_of |
| biology | 1,238 | has_rank, is_member_of, same_taxon |
| anatomy | 1,066 | in_region, in_system, is_structure_type, same_region, same_system |
| astronomy | 271 | closer_to_sun, in_constellation, is_type, orbits, property_greater, star_property |
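
Per-generator pair counts can be recomputed from the data by counting distinct `pair_id` values for each `generator`. The sketch below does this on a few toy rows in plain Python; the rows and the `pairs_per_generator` helper are hypothetical.

```python
from collections import defaultdict

def pairs_per_generator(rows):
    """Count distinct pair_ids for each generator value."""
    ids = defaultdict(set)
    for row in rows:
        ids[row["generator"]].add(row["pair_id"])
    return {gen: len(pids) for gen, pids in ids.items()}

# Toy rows invented for illustration (only the two relevant columns).
rows = [
    {"generator": "geography", "pair_id": "g1"},
    {"generator": "geography", "pair_id": "g1"},
    {"generator": "geography", "pair_id": "g2"},
    {"generator": "astronomy", "pair_id": "a1"},
]

counts = pairs_per_generator(rows)
print(counts)
```

With the dataset loaded into a pandas DataFrame `df`, the equivalent is `df.groupby("generator")["pair_id"].nunique()`.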
## Validation
Rows are validated by a cascading ensemble: Llama 3.1 405B Instruct (logit-level, via NDIF) → Haiku → Sonnet → Opus. Every included affirmative row has validator agreement with its ground-truth label; disputed and awkward rows are excluded.
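The general shape of a validation cascade can be sketched as follows. The card does not describe when a row is escalated from one model to the next, so the rule used here (a stage either returns a confident verdict or abstains, and abstentions pass to the next stage) is an assumption. The stage functions are hypothetical stand-ins, not calls to any real model API.

```python
from typing import Callable, Optional, Sequence

# A validator returns True/False when confident, or None to abstain.
# This abstain-then-escalate rule is an assumption, not the card's stated method.
Validator = Callable[[str], Optional[bool]]

def cascade_verdict(statement: str, validators: Sequence[Validator]) -> Optional[bool]:
    """Return the first confident verdict, escalating past abstentions."""
    for validate in validators:
        verdict = validate(statement)
        if verdict is not None:
            return verdict
    return None  # every stage abstained

def keep_row(statement: str, label: bool, validators: Sequence[Validator]) -> bool:
    """Keep a row only when the cascade's verdict matches the ground-truth label."""
    return cascade_verdict(statement, validators) == label

# Hypothetical stand-ins for the model stages.
def cheap_stage(statement: str) -> Optional[bool]:
    return True if "Paris" in statement else None

def strong_stage(statement: str) -> Optional[bool]:
    return False

stages = [cheap_stage, strong_stage]
print(keep_row("Paris is in France.", True, stages))
```

Under this rule, a row on which every stage abstains is dropped, because `None` never equals a boolean label.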
## Train/Test Splitting
**Split on `pair_id`, not on rows.** All 4 rows sharing a `pair_id` must land in the same split to prevent leakage.
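```python
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_parquet("hf://datasets/alliedtoasters/latenet-v0/latenet_v0.parquet")
# Split the unique pair IDs first, then select rows, so the 4 rows of a pair stay together.
pair_ids = df["pair_id"].unique()
train_ids, test_ids = train_test_split(pair_ids, test_size=0.2, random_state=42)
train = df[df["pair_id"].isin(train_ids)]
test = df[df["pair_id"].isin(test_ids)]
```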
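A quick way to confirm that a split is leakage-free is to check that the two sides share no `pair_id`. The sketch below shows the same group-wise split and the disjointness check in plain Python on synthetic rows; `split_by_pair` and the row layout are illustrative, not part of the dataset tooling.

```python
import random

def split_by_pair(rows, test_size=0.2, seed=42):
    """Shuffle unique pair_ids and assign whole pairs to train or test."""
    pair_ids = sorted({row["pair_id"] for row in rows})
    random.Random(seed).shuffle(pair_ids)
    n_test = max(1, round(len(pair_ids) * test_size))
    test_ids = set(pair_ids[:n_test])
    train = [row for row in rows if row["pair_id"] not in test_ids]
    test = [row for row in rows if row["pair_id"] in test_ids]
    return train, test

# Synthetic data: 10 pairs with 4 rows each.
rows = [{"pair_id": f"p{i}", "row": j} for i in range(10) for j in range(4)]
train, test = split_by_pair(rows)

# Leakage check: no pair_id may appear on both sides.
train_pairs = {row["pair_id"] for row in train}
test_pairs = {row["pair_id"] for row in test}
print(train_pairs.isdisjoint(test_pairs))
```

For pandas DataFrames, scikit-learn's `GroupShuffleSplit` with `groups=df["pair_id"]` is an alternative that enforces the same constraint.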
## Usage with lmprobe
```python
import pandas as pd
from lmprobe import Probe
df = pd.read_parquet("hf://datasets/alliedtoasters/latenet-v0/latenet_v0.parquet")
# Keep affirmative (non-negated) rows only, then separate them by label.
aff = df[~df["negated"]]
true_statements = aff[aff["label"]]["statement"].tolist()
false_statements = aff[~aff["label"]]["statement"].tolist()
# Fit a probe that contrasts true and false statements.
probe = Probe(model="Qwen/Qwen2.5-0.5B-Instruct", layers="fast_auto", random_state=42)
probe.fit(true_statements, false_statements)
```
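The snippet above passes two independent lists to the probe. For analyses that compare each true statement with its own minimal edit (for example, per-pair activation differences), it may help to keep the two affirmative rows of each pair aligned. A minimal sketch in plain Python, using invented toy rows and a hypothetical `aligned_pairs` helper:

```python
def aligned_pairs(rows):
    """Return (true_statement, false_statement) tuples matched by pair_id.

    Only affirmative (non-negated) rows are used; pairs missing either
    side are skipped.
    """
    by_pair = {}
    for row in rows:
        if row["negated"]:
            continue
        by_pair.setdefault(row["pair_id"], {})[row["label"]] = row["statement"]
    return [
        (sides[True], sides[False])
        for _, sides in sorted(by_pair.items())
        if True in sides and False in sides
    ]

# Toy rows invented for illustration.
rows = [
    {"pair_id": "p1", "statement": "Paris is in France.", "label": True, "negated": False},
    {"pair_id": "p1", "statement": "Paris is in Spain.", "label": False, "negated": False},
    {"pair_id": "p1", "statement": "Paris is not in France.", "label": False, "negated": True},
    {"pair_id": "p2", "statement": "7 is greater than 3.", "label": True, "negated": False},
]

pairs = aligned_pairs(rows)
print(pairs)
```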
## Citation
If you use this dataset, please cite the Geometry of Truth paper that inspired it:
```bibtex
@article{marks2023geometry,
  title={The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Statements},
  author={Marks, Samuel and Tegmark, Max},
  journal={arXiv preprint arXiv:2310.06824},
  year={2023}
}
```
## License
MIT
Provider:
alliedtoasters