LLMParseBench

Name: LLMParseBench
Creator: Sönke Tenckhoff
License: No description available

Indexed from IEEE on 2026-04-17

Download link:

https://ieee-dataport.org/documents/llmparsebench

Description:

The LLMStructExtractBench dataset provides a standardized, open benchmark for evaluating large language models (LLMs) on the task of extracting structured information from natural-language emails into valid JSON objects. The dataset covers five common administrative scenarios: equipment loan requests, IT support tickets, project extension requests, sick leave requests, and conference or training registrations. These scenarios reflect typical workflows in academic, corporate, and public institutions and were selected for their diversity in structure, key/value variety, and semantic complexity.

For each scenario, the dataset contains:
- Natural-language emails written in a realistic business tone, with varied linguistic forms, multilingual naming conventions, and diverse phrasing patterns.
- Ground-truth JSON objects that represent the precise, expected extraction targets. Each JSON object strictly adheres to the corresponding schema.
- Scenario-specific JSON Schema files, which define the required structure, types, and fields expected in any valid output.
- Prompt templates that describe extraction instructions, provide examples, or define value constraints, enabling reproducible prompting practices within evaluation pipelines.
- Validation-ready formatting: each ground-truth pair is automatically verified against its schema.

The dataset is fully synthetic but carefully modeled after realistic internal communication patterns. Variations include different date formats, numbered or bulleted equipment lists, implicit or explicit identifiers, and shifting sentence structure.

LLMStructExtractBench is intended to support:
- Benchmarking structured extraction performance in controlled, schema-based settings.
- Evaluating prompt strategies for enforcing structural and semantic constraints.
- Developing and testing JSON parsing and validation pipelines in LLM-powered systems.
- Investigating robustness to paraphrasing, lexical variation, and diverse email styles.

This dataset serves as a foundation for reproducible experimentation in JSON-structured information extraction and complements existing research on LLM reliability in constrained output generation.
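
The parse, validate, and score workflow described above might look roughly like the following Python sketch. It is a minimal, stdlib-only illustration under stated assumptions. The function names (`check_required_and_types`, `score_extraction`), the field-level exact-match metric, and the sick-leave field names are hypothetical and are not taken from the dataset's published files or evaluation protocol. The checker covers only the top-level `required` and `type` keywords. Enforcing the scenario schemas completely would require a full JSON Schema validator.

```python
import json
from typing import Any

# Hypothetical helper: maps the JSON Schema "type" keywords handled here to Python types.
# Only a small subset of JSON Schema is supported in this sketch.
_JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "array": (list,),
    "object": (dict,),
    "null": (type(None),),
}


def check_required_and_types(obj: Any, schema: dict) -> list[str]:
    """Check only top-level "required" and per-property "type" (not full JSON Schema)."""
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]
    errors = [f"missing required field: {key}"
              for key in schema.get("required", []) if key not in obj]
    for key, spec in schema.get("properties", {}).items():
        expected = spec.get("type")
        if key in obj and expected in _JSON_TYPES:
            value = obj[key]
            # bool is a subclass of int in Python, so reject it explicitly for numeric types.
            if isinstance(value, bool) and expected in ("integer", "number"):
                errors.append(f"wrong type for {key}: expected {expected}")
            elif not isinstance(value, _JSON_TYPES[expected]):
                errors.append(f"wrong type for {key}: expected {expected}")
    return errors


def score_extraction(raw_output: str, truth: dict, schema: dict) -> dict:
    """Parse a model response, check it against the schema subset, and compute field-level exact match."""
    try:
        predicted = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        # Unparseable output scores zero and reports the parser error.
        return {"parsed": False, "schema_errors": [str(exc)], "field_accuracy": 0.0}
    errors = check_required_and_types(predicted, schema)
    if not isinstance(predicted, dict):
        return {"parsed": True, "schema_errors": errors, "field_accuracy": 0.0}
    # Fraction of ground-truth fields whose predicted value is exactly equal.
    matches = sum(1 for key, value in truth.items() if predicted.get(key) == value)
    accuracy = matches / len(truth) if truth else 1.0
    return {"parsed": True, "schema_errors": errors, "field_accuracy": accuracy}


# Hypothetical sick-leave schema and ground truth; field names are illustrative only.
schema = {
    "type": "object",
    "required": ["employee_name", "start_date", "end_date"],
    "properties": {
        "employee_name": {"type": "string"},
        "start_date": {"type": "string"},
        "end_date": {"type": "string"},
        "doctor_note": {"type": "boolean"},
    },
}
truth = {"employee_name": "Lena Fischer", "start_date": "2025-03-03",
         "end_date": "2025-03-05", "doctor_note": True}
model_output = ('{"employee_name": "Lena Fischer", "start_date": "2025-03-03", '
                '"end_date": "2025-03-07", "doctor_note": true}')

print(score_extraction(model_output, truth, schema))
# {'parsed': True, 'schema_errors': [], 'field_accuracy': 0.75}
```

Exact matching is deliberately strict. Because the emails vary date formats and list styles, an evaluation pipeline might normalize values such as dates before comparison. Whether it should do so depends on how the ground-truth JSON encodes them.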

Provided by:

Sönke Tenckhoff