BarryFutureman/AgentTraj-L-latent-states-Qwen2-5-0-5B-Instruct
Hugging Face | Updated 2025-11-29 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/BarryFutureman/AgentTraj-L-latent-states-Qwen2-5-0-5B-Instruct
Resource description:
---
dataset_info:
  features:
  - name: context
    dtype: string
  - name: traj_idx
    dtype: int64
  - name: turn_idx
    dtype: int64
  - name: value_target
    dtype: float64
  - name: latent_vector
    sequence: float64
  splits:
  - name: train
    num_bytes: 2977853176
    num_examples: 255364
  download_size: 1288240333
  dataset_size: 2977853176
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
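As a rough sanity check on the metadata above, the split statistics can be turned into an average row size and a download-to-dataset size ratio. This is only a sketch: it does arithmetic on the numbers listed in `dataset_info`, the variable names are illustrative, and it assumes `download_size` refers to the files as stored on the Hub.

```python
# Values copied from the dataset_info block above.
num_bytes = 2_977_853_176
num_examples = 255_364
download_size = 1_288_240_333

# Average footprint of one row (context string, indices, value target, latent vector).
avg_bytes_per_example = num_bytes / num_examples

# Fraction of the in-memory dataset size that has to be downloaded.
download_ratio = download_size / num_bytes

print(f"Average size per example: {avg_bytes_per_example / 1024:.1f} KiB")
print(f"Download size relative to dataset size: {download_ratio:.0%}")
```
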
**See build_data.py to swap in a different model and build your own dataset.**
Example usage:
```python
import faiss
import numpy as np
from datasets import load_dataset


def load_latent_states(model_name="Qwen2-5-0-5B-Instruct", cache_dir="./cache"):
    """Download (or load from cache) the pre-computed latent states for a model."""
    repo_name = f"BarryFutureman/AgentTraj-L-latent-states-{model_name}"
    dataset = load_dataset(repo_name, split="train", cache_dir=cache_dir)
    return dataset


def build_index_from_dataset(dataset):
    """Build an exact inner-product FAISS index over all latent vectors."""
    # FAISS expects a float32 matrix of shape (n_states, dim).
    vectors = np.array(dataset["latent_vector"], dtype=np.float32)
    dim = vectors.shape[1]
    index = faiss.IndexFlatIP(dim)
    index.add(vectors)
    return index


def retrieve_by_index(query_idx, dataset, index, top_k=5):
    """Return the top_k states most similar to the state at query_idx."""
    query_vec = np.array([dataset[query_idx]["latent_vector"]], dtype=np.float32)
    # D holds the inner-product scores, I the matching row indices.
    D, I = index.search(query_vec, top_k)
    results = []
    for j, i in enumerate(I[0]):
        row = dataset[int(i)]
        results.append({
            "state_idx": int(i),
            "traj_idx": row["traj_idx"],
            "turn_idx": row["turn_idx"],
            "similarity": float(D[0][j]),
            "context": row["context"],
        })
    return results


if __name__ == "__main__":
    # Load pre-computed latent states
    print("Loading pre-computed latent states...")
    dataset = load_latent_states()
    print(f"Loaded {len(dataset)} states")

    # Build FAISS index
    print("Building FAISS index...")
    index = build_index_from_dataset(dataset)

    query_idx = 69
    print(f"\n{'='*60}")
    print(f"Retrieving similar states for index {query_idx}")
    print(f"{'='*60}")

    results = retrieve_by_index(query_idx, dataset, index, top_k=5)
    for rank, r in enumerate(results):
        marker = " <-- QUERY" if r["state_idx"] == query_idx else ""
        print(f"\nRank {rank + 1}: Traj {r['traj_idx']}, Turn {r['turn_idx']} "
              f"(similarity: {r['similarity']:.4f}){marker}")
        # Show only the first 200 characters of the context as a preview.
        print(f"  Preview: {r['context'][:200]}...")
```
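`faiss.IndexFlatIP` performs an exact inner-product search, so the similarity scores above depend on vector length as well as direction, and the query's own row is not guaranteed to rank first. Whether the stored `latent_vector` values are already unit-length is not stated here. The dependency-free sketch below, using made-up 2-D vectors and hypothetical helper names, illustrates the difference between raw inner-product ranking and cosine ranking after L2 normalization.

```python
import math


def l2_normalize(vec):
    """Return vec scaled to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def top_k_inner_product(query, vectors, top_k=5):
    """Score every stored vector against the query by inner product (brute force)."""
    scores = [(sum(q * v for q, v in zip(query, vec)), i)
              for i, vec in enumerate(vectors)]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [(i, score) for score, i in scores[:top_k]]


# Toy 2-D vectors standing in for latent_vector rows (values are made up).
toy_vectors = [[1.0, 0.0], [10.0, 10.0], [0.0, 2.0]]

# Raw inner product: the long vector at index 1 outranks the query itself.
raw_hits = top_k_inner_product(toy_vectors[0], toy_vectors, top_k=3)

# After normalization the inner product equals cosine similarity,
# so the query (index 0) ranks first with a score of 1.0.
unit_vectors = [l2_normalize(v) for v in toy_vectors]
cosine_hits = top_k_inner_product(unit_vectors[0], unit_vectors, top_k=3)

print("raw:", raw_hits)
print("cosine:", cosine_hits)
```

If cosine similarity is what you want from the FAISS example, one option is to call `faiss.normalize_L2` on the float32 matrix before `index.add`, and on the query vector before `index.search`.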
Provider:
BarryFutureman



