TLA+ Bench

Name: TLA+ Bench
Creator: Harvard Dataverse
Published: 2026-05-04 23:04:52
License: 暂无描述

DataCite Commons2026-05-04 更新2026-05-18 收录

下载链接：

https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/RYXBRK

下载链接

链接失效反馈

官方服务：

资源简介：

This dataset contains structured JSON extractions from 206 TLA+ formal specification files drawn from the <a href="https://github.com/tlaplus/Examples">TLA+ Examples repository</a>. Each specification is processed using a multi-stage language model pipeline that identifies and labels syntactic and semantic features of TLA+ modules, including operator definitions, action definitions, temporal properties, fairness conditions, proof constructs, and PlusCal content. Extraction versions. Each specification includes three JSON files: <ul> <li>Base Extraction from the raw TLA+ source using a language model</li> <li>Fine-grained Extraction extends the base version with a four-way classification of operator definitions (StatePredicate, TemporalProperty, Init, ValueDef) and additional subcategories such as function constructors and UNCHANGED expressions</li> <li>AST Extraction derived from the SANY parse tree and used for comparison with the language model outputs</li> </ul> Schema. Each JSON record contains fields describing module structure, operator and variable names, logical and set operators, temporal and fairness constructs, proof elements, PlusCal content, and a natural language description of the specification. A complete field-by-field codebook is included. Intended use. This dataset is intended for training and evaluating language models on formal specification tasks, supporting corpus analysis of TLA+ code, and enabling research on automated extraction of semantic information from formal methods artifacts.

提供机构：

Harvard Dataverse

创建时间：

2026-05-03

5,000+

优质数据集

54 个

任务类型

进入经典数据集