
BarryFutureman/AgentTraj-L-latent-states-Qwen2-5-0-5B-Instruct

Source: Hugging Face. Updated 2025-11-29, indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/BarryFutureman/AgentTraj-L-latent-states-Qwen2-5-0-5B-Instruct
Description:
---
dataset_info:
  features:
  - name: context
    dtype: string
  - name: traj_idx
    dtype: int64
  - name: turn_idx
    dtype: int64
  - name: value_target
    dtype: float64
  - name: latent_vector
    sequence: float64
  splits:
  - name: train
    num_bytes: 2977853176
    num_examples: 255364
  download_size: 1288240333
  dataset_size: 2977853176
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

**See `build_data.py` to swap in a different model and build your own dataset.**

Example usage:

```python
import numpy as np
from datasets import load_dataset
import faiss


def load_latent_states(model_name="Qwen2-5-0-5B-Instruct", cache_dir="./cache"):
    """Download (or load from cache) the pre-computed latent states."""
    repo_name = f"BarryFutureman/AgentTraj-L-latent-states-{model_name}"
    dataset = load_dataset(repo_name, split="train", cache_dir=cache_dir)
    return dataset


def build_index_from_dataset(dataset):
    """Build an exact inner-product FAISS index over all latent vectors."""
    # FAISS expects float32; the dataset stores float64.
    vectors = np.array(dataset["latent_vector"], dtype=np.float32)
    dim = vectors.shape[1]
    index = faiss.IndexFlatIP(dim)
    index.add(vectors)
    return index


def retrieve_by_index(query_idx, dataset, index, top_k=5):
    """Return the top_k states most similar to the state at query_idx."""
    query_vec = np.array([dataset[query_idx]["latent_vector"]], dtype=np.float32)
    D, I = index.search(query_vec, top_k)  # D: inner products, I: row indices

    results = []
    for j, i in enumerate(I[0]):
        row = dataset[int(i)]
        results.append({
            "state_idx": int(i),
            "traj_idx": row["traj_idx"],
            "turn_idx": row["turn_idx"],
            "similarity": float(D[0][j]),
            "context": row["context"],
        })
    return results


if __name__ == "__main__":
    # Load pre-computed latent states
    print("Loading pre-computed latent states...")
    dataset = load_latent_states()
    print(f"Loaded {len(dataset)} states")

    # Build FAISS index
    print("Building FAISS index...")
    index = build_index_from_dataset(dataset)

    query_idx = 69
    print(f"\n{'='*60}")
    print(f"Retrieving similar states for index {query_idx}")
    print(f"{'='*60}")

    results = retrieve_by_index(query_idx, dataset, index, top_k=5)
    for rank, r in enumerate(results):
        marker = " <-- QUERY" if r["state_idx"] == query_idx else ""
        print(f"\nRank {rank + 1}: Traj {r['traj_idx']}, Turn {r['turn_idx']} "
              f"(similarity: {r['similarity']:.4f}){marker}")
        # Show the first 200 characters of the context as a preview
        print(f"  Preview: {r['context'][:200]}...")
```
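
The example above ranks states by raw inner product (`faiss.IndexFlatIP`), so vectors with a large norm can outrank vectors that point in a more similar direction. If cosine similarity is what you want, one option is to L2-normalize the vectors before indexing and querying. The card does not say whether the stored `latent_vector` values are already normalized, so the following is only a sketch under that assumption. It uses plain NumPy on toy data rather than FAISS, and the helper names `l2_normalize` and `cosine_top_k` are hypothetical, not part of this dataset's tooling.

```python
import numpy as np


def l2_normalize(vectors, eps=1e-12):
    """Scale each row to unit L2 norm so that inner product equals cosine similarity."""
    vectors = np.asarray(vectors, dtype=np.float32)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # eps guards against division by zero for all-zero rows.
    return vectors / np.maximum(norms, eps)


def cosine_top_k(query_vec, vectors, top_k=5):
    """Brute-force cosine search; returns (indices, similarities), best match first."""
    unit_vectors = l2_normalize(vectors)
    unit_query = l2_normalize(np.asarray(query_vec)[None, :])[0]
    sims = unit_vectors @ unit_query
    top_k = min(top_k, len(sims))
    # Stable sort keeps the original row order among exact ties.
    order = np.argsort(-sims, kind="stable")[:top_k]
    return order, sims[order]


# Toy stand-in for dataset["latent_vector"] (assumption: real vectors are much longer).
toy_vectors = np.array([
    [1.0, 0.0],
    [10.0, 0.0],  # same direction as row 0, ten times the norm
    [0.0, 1.0],
    [1.0, 1.0],
])

idx, sims = cosine_top_k(toy_vectors[0], toy_vectors, top_k=3)
print(idx, sims)
```

With normalized rows, the same effect can be had in the FAISS example by normalizing `vectors` and `query_vec` before calling `index.add` and `index.search`, since inner product on unit vectors is cosine similarity.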
Provided by:
BarryFutureman