needle-threading
Needle Threading Dataset Overview
Dataset Contents
- Dataset name: Needle Threading
- Data source: HuggingFace datasets
- Data access:
  - Option 1: load the data with the HuggingFace datasets library
  - Option 2: download the data manually
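Data Structure
- Data files:
  - Single_Needle.json
  - Multiple_Needles.json
  - Conditional_Needles.json
  - Single_Threads.json
  - Multi_Threads.json
- Data fields:
  - id: question ID
  - haystack: JSON object containing the key-value pairs
  - keys: keys
  - values: values
  - question: question text
  - context_length: context length (number of characters)
  - num_kv_pairs: number of key-value pairs
  - repeat_number: repeat number
  - needle_depth: depth of the needle
  - num_needles: number of needles
  - needle_placement: how the needles are placed
  - conditional_character: conditional character
  - thread_length: thread length
  - thread_direction: thread direction
  - num_threads: number of threads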
Data Example
```python
from datasets import load_dataset

# The split name is a string, so it must be quoted
single_needle_dataset = load_dataset("jonathan-roberts1/needle-threading", split="Single_Needle")
single_needle_dataset[5]
```
Example output:
```json
{
  "id": 5,
  "haystack": "{\"e3e70682-c209-4cac-629f-6fbed82c07cd\": \"f728b4fa-4248-5e3a-0a5d-2f346baa9455\", \"eb1...\": \"964a870c-7c87-9b74-1d87-8f9f9cdf5a86\"}",
  "keys": "247a8333-f7b0-b7d2-cda8-056c3d15eef7",
  "values": "1759edc3-72ae-2244-8b01-63c1cd9d2b7d",
  "question": "Extract the value corresponding to the specified key in the JSON object. Key: \"247a83...-cda8-056c3d15eef7\" Corresponding value: ",
  "context_length": 2000,
  "num_kv_pairs": 25,
  "repeat_number": 0,
  "needle_depth": "50",
  "num_needles": 1,
  "needle_placement": "depthwise",
  "conditional_character": "N/A",
  "thread_length": 1,
  "thread_direction": "N/A",
  "num_threads": 0
}
```
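To make the relationship between `haystack`, `keys`, and `values` concrete, here is a minimal sketch that parses a haystack and looks up the requested key. It assumes, based on the example record above, that `haystack` is stored as a JSON-encoded string and that `keys` holds a single key, as in the Single_Needle split. The helper name `lookup_needle` and the toy record are hypothetical and are not part of the dataset or its tooling.

```python
import json


def lookup_needle(record: dict) -> str:
    """Parse the haystack string and return the value stored under record["keys"]."""
    # Assumption: haystack is a JSON-encoded string, as in the example record
    haystack = json.loads(record["haystack"])
    return haystack[record["keys"]]


# Toy record with made-up keys and values, shaped like a Single_Needle item
toy_record = {
    "haystack": json.dumps({"key-a": "value-a", "key-b": "value-b", "key-c": "value-c"}),
    "keys": "key-b",
    "values": "value-b",
}

found = lookup_needle(toy_record)
print(found == toy_record["values"])
```

A check like this can serve as a quick sanity test that a loaded record is internally consistent before it is sent to a model.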
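Dataset Purpose
- **Evaluating large language models (LLMs)**, in particular their ability to follow threads of information through long context windows.
- The experimental data is used to evaluate how 17 leading LLMs perform across the different tasks.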
Dataset Download
- HuggingFace datasets:
  ```python
  from datasets import load_dataset

  single_needle_dataset = load_dataset("jonathan-roberts1/needle-threading", split="Single_Needle")
  ```
- Manual download:
  ```bash
  cd data
  wget https://huggingface.co/datasets/jonathan-roberts1/needle-threading/resolve/main/data_json.zip
  unzip data_json.zip && rm data_json.zip
  ```
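After a manual download, the extracted files can be read with the standard library alone. The sketch below is an illustration under stated assumptions: the layout inside `data_json.zip` is not described here, so the loader accepts either a single JSON array or one JSON object per line, and the path `data/Single_Needle.json` in the comment is a guess built from the file names listed earlier. `load_records` is a hypothetical helper name.

```python
import json
import tempfile
from pathlib import Path


def load_records(path):
    """Load records from a JSON array file, falling back to JSON Lines."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to one JSON object per line
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


# Demo on a throwaway file; with the real data the call might look like
# load_records("data/Single_Needle.json")  (hypothetical path)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.json"
    demo_path.write_text(json.dumps([{"id": 0, "num_needles": 1}]), encoding="utf-8")
    demo_records = load_records(demo_path)

print(len(demo_records), demo_records[0]["id"])
```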
Dataset Evaluation
- Example code: inference on the Single_Needle task with the Gemini 1.5 Flash model.
  ```python
  from ast import literal_eval

  import pandas as pd
  import vertexai
  from datasets import load_dataset
  from tqdm import tqdm
  from vertexai.preview.generative_models import GenerativeModel

  project_id = "YOUR_PROJECT_ID"
  region = "REGION"  # e.g. "us-central1"
  model_name = "gemini-1.5-flash-preview-0514"

  dataset = load_dataset("jonathan-roberts1/needle-threading", split="Single_Needle")

  vertexai.init(project=project_id, location=region)
  generative_multimodal_model = GenerativeModel(model_name)
  config = {"max_output_tokens": 100, "temperature": 0, "top_k": 1}

  rows = []
  for idx, item in tqdm(enumerate(dataset)):
      # Limit the run to the first 10 items
      if idx == 10:
          break

      question = item["question"]
      haystack = item["haystack"]
      prompt = haystack + "\n" + question

      model_response = generative_multimodal_model.generate_content(
          contents=prompt, generation_config=config
      ).text

      # Basic post-processing: trim whitespace, then unwrap a quoted reply if there is one
      model_response = model_response.strip()
      try:
          model_response = str(literal_eval(model_response))
      except (ValueError, SyntaxError):
          pass  # the reply was not a Python literal; keep the raw text

      answer = item["values"]
      # Compare only the leading characters, so trailing text is ignored
      model_answer = model_response[0:len(str(answer))]
      correct = model_answer == str(answer)
      rows.append({"Question_ID": item["id"], "Output": model_response,
                   "Answer": answer, "Correct?": correct})

  output_df = pd.DataFrame(rows, columns=["Question_ID", "Output", "Answer", "Correct?"])
  accuracy = output_df["Correct?"].mean()
  print(f"{model_name} Single_Needle: {100 * accuracy:.2f}%")
  ```
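The scoring step in the example can be tried without any model access. The sketch below isolates that logic as it reads in the snippet above: strip whitespace, unwrap a quoted reply with `literal_eval` when possible, then compare only the first `len(answer)` characters with the expected value. `is_correct` is a hypothetical helper name, the sample replies are made up, and tolerating replies that are not Python literals is an assumption rather than documented behavior of the original evaluation.

```python
from ast import literal_eval


def is_correct(raw_response: str, answer: str) -> bool:
    """Prefix-match a model reply against the expected value."""
    text = raw_response.strip()
    try:
        # Unwrap replies that arrive as a quoted string literal
        text = str(literal_eval(text))
    except (ValueError, SyntaxError):
        pass  # not a Python literal; compare the raw text instead
    return text[: len(answer)] == answer


answer = "1759edc3-72ae-2244-8b01-63c1cd9d2b7d"
checks = [
    is_correct(answer, answer),                                  # bare value
    is_correct(f'  "{answer}"\n', answer),                       # quoted, padded
    is_correct(answer + " is the value", answer),                # trailing text
    is_correct("964a870c-7c87-9b74-1d87-8f9f9cdf5a86", answer),  # wrong value
]
print(checks)
```

Prefix matching is lenient about trailing text but strict about anything that precedes the value, so a reply that starts with an explanatory phrase would be scored as incorrect.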




