TLA+ Bench
收藏DataCite Commons2026-05-04 更新2026-05-18 收录
下载链接:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/RYXBRK
下载链接
链接失效反馈官方服务:
资源简介:
This dataset contains structured JSON extractions from 206 TLA+ formal specification files drawn from the <a href="https://github.com/tlaplus/Examples">TLA+ Examples repository</a>. Each specification is processed using a multi-stage language model pipeline that identifies and labels syntactic and semantic features of TLA+ modules, including operator definitions, action definitions, temporal properties, fairness conditions, proof constructs, and PlusCal content.
Extraction versions. Each specification includes three JSON files:
<ul> <li><b>Base Extraction</b> from the raw TLA+ source using a language model</li> <li><b>Fine-grained Extraction</b> extends the base version with a four-way classification of operator definitions (StatePredicate, TemporalProperty, Init, ValueDef) and additional subcategories such as function constructors and UNCHANGED expressions</li> <li><b>AST Extraction</b> derived from the SANY parse tree and used for comparison with the language model outputs</li> </ul>
<p>
<b>Schema.</b> Each JSON record contains fields describing module structure, operator and variable names, logical and set operators, temporal and fairness constructs, proof elements, PlusCal content, and a natural language description of the specification. A complete field-by-field codebook is included.
</p><p>
<b>Intended use.</b> This dataset is intended for training and evaluating language models on formal specification tasks, supporting corpus analysis of TLA+ code, and enabling research on automated extraction of semantic information from formal methods artifacts.</p>
提供机构:
Harvard Dataverse
创建时间:
2026-05-03



