KRLabsOrg/tool-output-extraction-swebench-gliner2-v2

Name: KRLabsOrg/tool-output-extraction-swebench-gliner2-v2
Creator: KRLabsOrg
Published: 2026-04-24 17:53:25
License: No description provided

Source: Hugging Face
Updated: 2026-04-24
Indexed: 2026-04-26

Download link:

https://hf-mirror.com/datasets/KRLabsOrg/tool-output-extraction-swebench-gliner2-v2

Description:

Experimental variant of KRLabsOrg/tool-output-extraction-swebench, reformulated for GLiNER2's architectural constraints. GLiNER2 has max_width=8 (an entity can span at most 8 tokens), which is incompatible with the multi-line evidence blocks in the parent dataset (50–500 tokens). This variant instead extracts line-number prefixes as short entities, while keeping the full chunk as context.

The dataset is intended for information extraction and named entity recognition (NER), for both training and inference with GLiNER2. It has train, dev, and test splits; the training split is designed for fast iteration. Data is stored as JSON, and each instance contains an input, an output, and metadata. The source dataset is KRLabsOrg/tool-output-extraction-swebench, with 11,477 examples and 27 tool types.
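To illustrate why the reformulation is needed, here is a minimal sketch that contrasts a parent-style target (the whole evidence block) with a variant-style target (one short entity per line-number prefix). The card does not document the exact JSON schema, so everything here is an assumption for illustration: the tool-output layout (a line number followed by a tab, as in `cat -n`), the `LINE_PREFIX` pattern, the entity dict fields, the `evidence_line` label, and the whitespace token count, which is a crude stand-in for GLiNER2's real tokenizer. Only the 8-token limit comes from the card.

```python
import re
from typing import Dict, List, Set

MAX_WIDTH = 8  # GLiNER2 max entity length in tokens, per the dataset card

# Hypothetical tool-output layout: each line starts with a line number followed
# by a tab, space, colon, or pipe (e.g. `cat -n` output). Not the dataset's spec.
LINE_PREFIX = re.compile(r"^[ \t]*(\d+)[\t :|]", re.MULTILINE)


def n_tokens(text: str) -> int:
    """Whitespace token count; a crude stand-in for GLiNER2's tokenizer."""
    return len(text.split())


def evidence_block(chunk: str, lines: Set[int]) -> str:
    """Parent-dataset-style target: the full multi-line evidence block."""
    kept = []
    for line in chunk.splitlines():
        m = LINE_PREFIX.match(line)
        if m and int(m.group(1)) in lines:
            kept.append(line)
    return "\n".join(kept)


def prefix_entities(chunk: str, lines: Set[int],
                    label: str = "evidence_line") -> List[Dict]:
    """Variant-style targets: one short entity per evidence line (its number).

    Character offsets point into the full chunk, which stays as context.
    The field names and label are hypothetical.
    """
    entities = []
    for m in LINE_PREFIX.finditer(chunk):
        if int(m.group(1)) in lines:
            entities.append({"text": m.group(1), "start": m.start(1),
                             "end": m.end(1), "label": label})
    return entities


chunk = (
    "10\tdef parse(value):\n"
    "11\t    if not value.isdigit():\n"
    "12\t        raise ValueError('expected digits, got ' + value)\n"
    "13\t    return int(value)\n"
)
evidence = {11, 12}

block = evidence_block(chunk, evidence)
ents = prefix_entities(chunk, evidence)

print(n_tokens(block), n_tokens(block) <= MAX_WIDTH)  # → 12 False
print([chunk[e["start"]:e["end"]] for e in ents])     # → ['11', '12']
print(all(n_tokens(e["text"]) <= MAX_WIDTH for e in ents))  # → True
```

Even this two-line block exceeds the 8-token limit, while each line-number entity is a single token; the surrounding chunk is still available to the model as context, so the evidence lines can be recovered from the predicted prefixes.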

Provided by:

KRLabsOrg