five

TLA+ Bench

收藏
DataCite Commons2026-05-04 更新2026-05-18 收录
下载链接:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/RYXBRK
下载链接
链接失效反馈
官方服务:
资源简介:
This dataset contains structured JSON extractions from 206 TLA+ formal specification files drawn from the <a href="https://github.com/tlaplus/Examples">TLA+ Examples repository</a>. Each specification is processed using a multi-stage language model pipeline that identifies and labels syntactic and semantic features of TLA+ modules, including operator definitions, action definitions, temporal properties, fairness conditions, proof constructs, and PlusCal content. Extraction versions. Each specification includes three JSON files: <ul> <li><b>Base Extraction</b> from the raw TLA+ source using a language model</li> <li><b>Fine-grained Extraction</b> extends the base version with a four-way classification of operator definitions (StatePredicate, TemporalProperty, Init, ValueDef) and additional subcategories such as function constructors and UNCHANGED expressions</li> <li><b>AST Extraction</b> derived from the SANY parse tree and used for comparison with the language model outputs</li> </ul> <p> <b>Schema.</b> Each JSON record contains fields describing module structure, operator and variable names, logical and set operators, temporal and fairness constructs, proof elements, PlusCal content, and a natural language description of the specification. A complete field-by-field codebook is included. </p><p> <b>Intended use.</b> This dataset is intended for training and evaluating language models on formal specification tasks, supporting corpus analysis of TLA+ code, and enabling research on automated extraction of semantic information from formal methods artifacts.</p>
提供机构:
Harvard Dataverse
创建时间:
2026-05-03
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作