KRLabsOrg/tool-output-extraction-swebench-gliner

Name: KRLabsOrg/tool-output-extraction-swebench-gliner
Creator: KRLabsOrg
Published: 2026-04-17 12:12:50
License: not specified

Source: Hugging Face
Updated: 2026-04-17
Indexed: 2026-04-26

Download link:

https://hf-mirror.com/datasets/KRLabsOrg/tool-output-extraction-swebench-gliner

Description:

---
license: apache-2.0
task_categories:
- token-classification
language:
- en
tags:
- information-extraction
- ner
- gliner
- extractive-qa
- coding-agents
- tool-output
- context-pruning
size_categories:
- 10K<n<100K
---

# Tool Output Extraction (extractive / GLiNER2 format)

Extractive variant of [KRLabsOrg/tool-output-extraction-swebench](https://huggingface.co/datasets/KRLabsOrg/tool-output-extraction-swebench), formatted for fine-tuning span-extraction models ([GLiNER2](https://github.com/fastino-ai/GLiNER2), BERT-for-QA, etc.).

Each tool observation from the parent dataset is split into windows of roughly 400 tokens. Each window preserves line boundaries, so a chunk plus its prepended query fits into encoder-style models with a 512-token context. The query is concatenated in front of each chunk, extractive-QA style, and gold evidence is mapped to verbatim spans within the tool-output portion. Chunks with no evidence become natural negative examples.

## Format

Each row is a GLiNER2 training record with the query concatenated into the input:

```json
{
  "input": "Query: Find the code block...\n\nTool output:\n193: ...\n194: ...\n...\n233: columns = []\n...",
  "output": {
    "entities": {"RELEVANT": ["233: columns = []\n234: for col in data.columns:\n..."]}
  },
  "meta": {
    "instance_id": "astropy__astropy-12544",
    "source": "swe",
    "tool_type": "read_file",
    "query": "Find the code block in read_table_fits...",
    "chunk_index": 7,
    "total_chunks": 15,
    "has_evidence": true,
    "chunk_start_line": 193,
    "chunk_end_line": 233
  }
}
```

Design choices:

- **Query concatenation.** The query is prepended as `Query: ...\n\nTool output:\n<chunk>`, so the model conditions on it directly, like an extractive-QA model. This avoids having to pass each example's query through GLiNER2's per-type `entity_descriptions`.
- **Single entity type `RELEVANT`.** All examples share the same type; the task-specific signal comes from the query in the input.
- **Verbatim spans.** Every entity mention is a verbatim substring of `input`, validated with GLiNER2's `InputExample.validate()`.

## Splits

A positive chunk contains at least one evidence span; a negative chunk contains none.

| Split | Chunks | Positive | Negative | Source examples |
|-------|-------:|---------:|---------:|----------------:|
| train | 51,917 | 17,450 | 34,467 | 10,508 |
| dev | 2,579 | 422 | 2,157 | 240 |
| test | 9,595 | 1,090 | 8,505 | 618 |

Negatives in the train split are subsampled (30% are kept) to limit class imbalance. Dev and test keep the natural distribution.

## Usage with GLiNER2

```python
import json

from gliner2.training.data import InputExample, TrainingDataset

def load_split(path):
    """Read a JSONL split into a GLiNER2 TrainingDataset."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():  # tolerate blank lines
                continue
            d = json.loads(line)
            examples.append(InputExample(
                text=d["input"],  # query + tool-output chunk
                entities=d["output"]["entities"],  # {"RELEVANT": [verbatim spans]}
            ))
    return TrainingDataset(examples=examples)

train_ds = load_split("gliner_train.jsonl")
dev_ds = load_split("gliner_dev.jsonl")
```

At inference, format new inputs the same way. Split long outputs into line-aligned chunks of roughly 400 tokens first, as in training:

```python
query = "Find the failing test block"
with open("pytest_output.txt", encoding="utf-8") as f:
    chunk = f.read()
text = f"Query: {query}\n\nTool output:\n{chunk}"
# model.extract_entities(text, entity_types=["RELEVANT"]) -> list of verbatim spans
```

## Source

Generated from [KRLabsOrg/tool-output-extraction-swebench](https://huggingface.co/datasets/KRLabsOrg/tool-output-extraction-swebench) (11,477 examples, 27 tool types, derived from SWE-bench repositories and synthetic multi-ecosystem observations). See the [paper](https://arxiv.org/abs/2604.04979) for construction details.

## Citation

```bibtex
@misc{kovács2026squeeztaskconditionedtooloutputpruning,
      title={Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents},
      author={Ádám Kovács},
      year={2026},
      eprint={2604.04979},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2604.04979},
}
```

## License

Apache 2.0
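The chunking step from the dataset summary splits each tool observation into windows of roughly 400 tokens without cutting any line. Below is a minimal sketch of one way to do this. It is a hypothetical reconstruction, not the dataset's actual preprocessing code:

- `chunk_by_lines` and `count_tokens` are illustrative names.
- Whitespace splitting stands in for the encoder's subword tokenizer. The card does not say which tokenizer was used.

```python
from typing import List, Tuple

def count_tokens(text: str) -> int:
    # Assumption: whitespace tokens approximate the encoder's token count.
    # A real pipeline would call the target model's tokenizer here.
    return len(text.split())

def chunk_by_lines(text: str, max_tokens: int = 400) -> List[Tuple[int, int, str]]:
    """Greedily pack whole lines into windows of at most ~max_tokens.

    Returns (start_line, end_line, chunk_text) with 1-based, inclusive line
    numbers. A single line longer than max_tokens becomes its own oversized
    chunk instead of being cut mid-line.
    """
    lines = text.splitlines()
    chunks: List[Tuple[int, int, str]] = []
    start = 0   # index of the first line in the current window
    budget = 0  # tokens already in the current window
    for i, line in enumerate(lines):
        n = count_tokens(line)
        # Close the window before this line would push it over the budget.
        if i > start and budget + n > max_tokens:
            chunks.append((start + 1, i, "\n".join(lines[start:i])))
            start, budget = i, 0
        budget += n
    if start < len(lines):
        chunks.append((start + 1, len(lines), "\n".join(lines[start:])))
    return chunks
```

The returned line numbers play the same role as the `chunk_start_line` and `chunk_end_line` fields in `meta`, assuming those fields count lines of the raw tool output. Greedy packing keeps each window close to the target size, and because windows never split a line, every evidence line stays intact within a single chunk.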
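The next sketch covers two steps from the card: turning a chunk into a training row in the format shown under Format, and subsampling train-split negatives to 30%. It is an illustration under stated assumptions, not the authors' pipeline:

- `make_record` and `subsample_negatives` are hypothetical helpers.
- Gold evidence is assumed to arrive as verbatim text blocks from the full tool output.
- Negative rows are assumed to carry an empty `RELEVANT` list.
- The card validates spans with GLiNER2's `InputExample.validate()`. This sketch substitutes a plain substring test.

```python
import random
from typing import Dict, List

def make_record(query: str, chunk: str, evidence: List[str], meta: Dict) -> Dict:
    """Build one GLiNER2-style training row (hypothetical helper).

    Assumption: `evidence` holds verbatim text blocks from the full tool
    output; only blocks lying entirely inside this chunk become spans.
    """
    text = f"Query: {query}\n\nTool output:\n{chunk}"
    # Searching `chunk` rather than `text` keeps every span inside the
    # tool-output portion, never inside the query prefix.
    spans = [block for block in evidence if block and block in chunk]
    return {
        "input": text,
        # Assumption: negatives keep the RELEVANT key with an empty list.
        "output": {"entities": {"RELEVANT": spans}},
        "meta": {**meta, "query": query, "has_evidence": bool(spans)},
    }

def subsample_negatives(records: List[Dict], keep: float = 0.3, seed: int = 0) -> List[Dict]:
    """Keep every positive row and roughly a `keep` fraction of negative rows."""
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    return [r for r in records if r["meta"]["has_evidence"] or rng.random() < keep]

row = make_record(
    "Find the code block in read_table_fits...",
    "232: t = Table()\n233: columns = []",
    ["233: columns = []"],
    {"chunk_index": 7},
)
print(row["output"]["entities"])  # {'RELEVANT': ['233: columns = []']}
```

Only the train split would pass through `subsample_negatives`. Per the card, dev and test keep their natural distribution.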

Provider:

KRLabsOrg
