piercemaloney/coqgym_coq_projects_v2
收藏Hugging Face2024-03-21 更新2024-06-11 收录
下载链接:
https://hf-mirror.com/datasets/piercemaloney/coqgym_coq_projects_v2
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
features:
- name: text
dtype: string
splits:
- name: finmap
num_bytes: 745110
num_examples: 3
- name: GeometricAlgebra
num_bytes: 2180457
num_examples: 8
- name: bdds
num_bytes: 11537326
num_examples: 15
- name: concat
num_bytes: 1876052
num_examples: 90
- name: topology
num_bytes: 1998914
num_examples: 32
- name: euler_formula
num_bytes: 2157257
num_examples: 3
- name: ruler_compass_geometry
num_bytes: 531800
num_examples: 39
- name: fcsl_pcm
num_bytes: 1911756
num_examples: 12
- name: twoSquare
num_bytes: 219009
num_examples: 2
- name: zfc
num_bytes: 605621
num_examples: 11
- name: shuffle
num_bytes: 87431
num_examples: 6
- name: metalib
num_bytes: 190046
num_examples: 11
- name: hardware
num_bytes: 185840
num_examples: 25
- name: three_gap
num_bytes: 315458
num_examples: 7
- name: coq_ext_lib
num_bytes: 311782
num_examples: 47
- name: cheerios
num_bytes: 140210
num_examples: 6
- name: regexp
num_bytes: 116907
num_examples: 7
- name: coq_library_undecidability
num_bytes: 3089099
num_examples: 93
- name: automata
num_bytes: 1075636
num_examples: 25
- name: coquelicot
num_bytes: 6616788
num_examples: 23
- name: izf
num_bytes: 146423
num_examples: 8
- name: lemma_overloading
num_bytes: 1075228
num_examples: 27
- name: lin_alg
num_bytes: 6760336
num_examples: 68
- name: railroad_crossing
num_bytes: 202543
num_examples: 1
- name: idxassoc
num_bytes: 139339
num_examples: 2
- name: hoare_tut
num_bytes: 39069
num_examples: 3
- name: lesniewski_mereology
num_bytes: 123636
num_examples: 2
- name: verdi
num_bytes: 12479953
num_examples: 28
- name: additions
num_bytes: 147048
num_examples: 20
- name: checker
num_bytes: 6689
num_examples: 2
- name: VST
num_bytes: 38787573
num_examples: 292
- name: domain_theory
num_bytes: 67545
num_examples: 4
- name: propcalc
num_bytes: 54422
num_examples: 5
- name: circuits
num_bytes: 399265
num_examples: 19
- name: CompCert
num_bytes: 14680669
num_examples: 142
- name: area_method
num_bytes: 1578799
num_examples: 38
- name: bbv
num_bytes: 705745
num_examples: 13
- name: ails
num_bytes: 521926
num_examples: 11
- name: dep_map
num_bytes: 24951
num_examples: 2
- name: ChargeCore
num_bytes: 273913
num_examples: 20
- name: markov
num_bytes: 102887
num_examples: 1
- name: rsa
num_bytes: 180155
num_examples: 5
- name: verdi_raft
num_bytes: 21320585
num_examples: 107
- name: goedel
num_bytes: 9110441
num_examples: 44
- name: bigenough
num_bytes: 5031
num_examples: 1
- name: generic_environments
num_bytes: 142529
num_examples: 2
- name: disel
num_bytes: 4787193
num_examples: 37
- name: ctltctl
num_bytes: 61038
num_examples: 3
- name: coq_list_string
num_bytes: 6409
num_examples: 3
- name: QuickChick
num_bytes: 718709
num_examples: 17
- name: schroeder
num_bytes: 24958
num_examples: 4
- name: lazy_pcf
num_bytes: 497224
num_examples: 13
- name: weak_up_to
num_bytes: 175365
num_examples: 10
- name: groups
num_bytes: 11115
num_examples: 1
- name: pocklington
num_bytes: 617483
num_examples: 13
- name: mini_compiler
num_bytes: 14459
num_examples: 1
- name: StructTact
num_bytes: 293887
num_examples: 17
- name: exceptions
num_bytes: 8962
num_examples: 1
- name: coqrel
num_bytes: 196290
num_examples: 12
- name: higman_s
num_bytes: 198793
num_examples: 5
- name: bellantonicook
num_bytes: 3852234
num_examples: 16
- name: rem
num_bytes: 14504
num_examples: 1
- name: tree_automata
num_bytes: 1972575
num_examples: 17
- name: coq_procrastination
num_bytes: 26763
num_examples: 1
- name: higman_cf
num_bytes: 49720
num_examples: 2
- name: GeoCoq
num_bytes: 12090435
num_examples: 328
- name: coqoban
num_bytes: 38285
num_examples: 1
- name: search_trees
num_bytes: 59332
num_examples: 5
- name: system
num_bytes: 2053
num_examples: 1
- name: ieee754
num_bytes: 35416
num_examples: 3
- name: jordan_curve_theorem
num_bytes: 12530912
num_examples: 10
- name: huffman
num_bytes: 1634130
num_examples: 25
- name: zf
num_bytes: 933118
num_examples: 10
- name: hedges
num_bytes: 360884
num_examples: 1
- name: zorns_lemma
num_bytes: 608752
num_examples: 19
- name: tortoise_hare_algorithm
num_bytes: 10084
num_examples: 1
- name: mod_red
num_bytes: 636981
num_examples: 5
- name: UnifySL
num_bytes: 1716822
num_examples: 128
- name: traversable_fincontainer
num_bytes: 429019
num_examples: 1
- name: buchberger
num_bytes: 2422607
num_examples: 29
- name: constructive_geometry
num_bytes: 80179
num_examples: 7
- name: tarski_geometry
num_bytes: 112419
num_examples: 8
- name: int_map
num_bytes: 817835
num_examples: 13
- name: float
num_bytes: 2994074
num_examples: 31
- name: InfSeqExt
num_bytes: 106656
num_examples: 5
- name: zchinese
num_bytes: 62626
num_examples: 6
- name: smc
num_bytes: 6045540
num_examples: 15
- name: pts
num_bytes: 72482
num_examples: 8
- name: param_pi
num_bytes: 2596347
num_examples: 11
- name: axiomatic_abp
num_bytes: 1204713
num_examples: 7
- name: lambda
num_bytes: 181085
num_examples: 10
- name: maths
num_bytes: 37685
num_examples: 3
- name: quicksort_complexity
num_bytes: 489164
num_examples: 28
- name: fssec_model
num_bytes: 1569135
num_examples: 25
- name: ipc
num_bytes: 3108901
num_examples: 31
- name: chinese
num_bytes: 208365
num_examples: 13
- name: cours_de_coq
num_bytes: 71295
num_examples: 11
- name: graphs
num_bytes: 644609
num_examples: 2
- name: dictionaries
num_bytes: 67746
num_examples: 1
- name: dblib
num_bytes: 195344
num_examples: 6
- name: cecoa
num_bytes: 2449356
num_examples: 14
- name: corespec
num_bytes: 1222339
num_examples: 28
- name: free_groups
num_bytes: 63973
num_examples: 1
- name: ramsey
num_bytes: 11734
num_examples: 1
- name: qarith
num_bytes: 51068
num_examples: 2
- name: math_comp
num_bytes: 44708646
num_examples: 76
- name: amm11262
num_bytes: 365079
num_examples: 5
- name: angles
num_bytes: 322579
num_examples: 5
- name: orb_stab
num_bytes: 204783
num_examples: 1
- name: qarith_stern_brocot
num_bytes: 10873352
num_examples: 35
- name: Categories
num_bytes: 4945
num_examples: 1
- name: group_theory
num_bytes: 104940
num_examples: 10
- name: demos
num_bytes: 62208
num_examples: 5
- name: distributed_reference_counting
num_bytes: 8888400
num_examples: 74
- name: subst
num_bytes: 362195
num_examples: 17
- name: miniml
num_bytes: 114099
num_examples: 1
- name: algebra
num_bytes: 3275753
num_examples: 65
- name: fermat4
num_bytes: 172156
num_examples: 5
- name: otway_rees
num_bytes: 226052
num_examples: 19
- name: SCEV_coq
num_bytes: 89902
num_examples: 1
- name: PolTac
num_bytes: 157370
num_examples: 13
- name: fundamental_arithmetics
num_bytes: 308733
num_examples: 8
download_size: 37361889
dataset_size: 286711572
configs:
- config_name: default
data_files:
- split: finmap
path: data/finmap-*
- split: GeometricAlgebra
path: data/GeometricAlgebra-*
- split: bdds
path: data/bdds-*
- split: concat
path: data/concat-*
- split: topology
path: data/topology-*
- split: euler_formula
path: data/euler_formula-*
- split: ruler_compass_geometry
path: data/ruler_compass_geometry-*
- split: fcsl_pcm
path: data/fcsl_pcm-*
- split: twoSquare
path: data/twoSquare-*
- split: zfc
path: data/zfc-*
- split: shuffle
path: data/shuffle-*
- split: metalib
path: data/metalib-*
- split: hardware
path: data/hardware-*
- split: three_gap
path: data/three_gap-*
- split: coq_ext_lib
path: data/coq_ext_lib-*
- split: cheerios
path: data/cheerios-*
- split: regexp
path: data/regexp-*
- split: coq_library_undecidability
path: data/coq_library_undecidability-*
- split: automata
path: data/automata-*
- split: coquelicot
path: data/coquelicot-*
- split: izf
path: data/izf-*
- split: lemma_overloading
path: data/lemma_overloading-*
- split: lin_alg
path: data/lin_alg-*
- split: railroad_crossing
path: data/railroad_crossing-*
- split: idxassoc
path: data/idxassoc-*
- split: hoare_tut
path: data/hoare_tut-*
- split: lesniewski_mereology
path: data/lesniewski_mereology-*
- split: verdi
path: data/verdi-*
- split: additions
path: data/additions-*
- split: checker
path: data/checker-*
- split: VST
path: data/VST-*
- split: domain_theory
path: data/domain_theory-*
- split: propcalc
path: data/propcalc-*
- split: circuits
path: data/circuits-*
- split: CompCert
path: data/CompCert-*
- split: area_method
path: data/area_method-*
- split: bbv
path: data/bbv-*
- split: ails
path: data/ails-*
- split: dep_map
path: data/dep_map-*
- split: ChargeCore
path: data/ChargeCore-*
- split: markov
path: data/markov-*
- split: rsa
path: data/rsa-*
- split: verdi_raft
path: data/verdi_raft-*
- split: goedel
path: data/goedel-*
- split: bigenough
path: data/bigenough-*
- split: generic_environments
path: data/generic_environments-*
- split: disel
path: data/disel-*
- split: ctltctl
path: data/ctltctl-*
- split: coq_list_string
path: data/coq_list_string-*
- split: QuickChick
path: data/QuickChick-*
- split: schroeder
path: data/schroeder-*
- split: lazy_pcf
path: data/lazy_pcf-*
- split: weak_up_to
path: data/weak_up_to-*
- split: groups
path: data/groups-*
- split: pocklington
path: data/pocklington-*
- split: mini_compiler
path: data/mini_compiler-*
- split: StructTact
path: data/StructTact-*
- split: exceptions
path: data/exceptions-*
- split: coqrel
path: data/coqrel-*
- split: higman_s
path: data/higman_s-*
- split: bellantonicook
path: data/bellantonicook-*
- split: rem
path: data/rem-*
- split: tree_automata
path: data/tree_automata-*
- split: coq_procrastination
path: data/coq_procrastination-*
- split: higman_cf
path: data/higman_cf-*
- split: GeoCoq
path: data/GeoCoq-*
- split: coqoban
path: data/coqoban-*
- split: search_trees
path: data/search_trees-*
- split: system
path: data/system-*
- split: ieee754
path: data/ieee754-*
- split: jordan_curve_theorem
path: data/jordan_curve_theorem-*
- split: huffman
path: data/huffman-*
- split: zf
path: data/zf-*
- split: hedges
path: data/hedges-*
- split: zorns_lemma
path: data/zorns_lemma-*
- split: tortoise_hare_algorithm
path: data/tortoise_hare_algorithm-*
- split: mod_red
path: data/mod_red-*
- split: UnifySL
path: data/UnifySL-*
- split: traversable_fincontainer
path: data/traversable_fincontainer-*
- split: buchberger
path: data/buchberger-*
- split: constructive_geometry
path: data/constructive_geometry-*
- split: tarski_geometry
path: data/tarski_geometry-*
- split: int_map
path: data/int_map-*
- split: float
path: data/float-*
- split: InfSeqExt
path: data/InfSeqExt-*
- split: zchinese
path: data/zchinese-*
- split: smc
path: data/smc-*
- split: pts
path: data/pts-*
- split: param_pi
path: data/param_pi-*
- split: axiomatic_abp
path: data/axiomatic_abp-*
- split: lambda
path: data/lambda-*
- split: maths
path: data/maths-*
- split: quicksort_complexity
path: data/quicksort_complexity-*
- split: fssec_model
path: data/fssec_model-*
- split: ipc
path: data/ipc-*
- split: chinese
path: data/chinese-*
- split: cours_de_coq
path: data/cours_de_coq-*
- split: graphs
path: data/graphs-*
- split: dictionaries
path: data/dictionaries-*
- split: dblib
path: data/dblib-*
- split: cecoa
path: data/cecoa-*
- split: corespec
path: data/corespec-*
- split: free_groups
path: data/free_groups-*
- split: ramsey
path: data/ramsey-*
- split: qarith
path: data/qarith-*
- split: math_comp
path: data/math_comp-*
- split: amm11262
path: data/amm11262-*
- split: angles
path: data/angles-*
- split: orb_stab
path: data/orb_stab-*
- split: qarith_stern_brocot
path: data/qarith_stern_brocot-*
- split: Categories
path: data/Categories-*
- split: group_theory
path: data/group_theory-*
- split: demos
path: data/demos-*
- split: distributed_reference_counting
path: data/distributed_reference_counting-*
- split: subst
path: data/subst-*
- split: miniml
path: data/miniml-*
- split: algebra
path: data/algebra-*
- split: fermat4
path: data/fermat4-*
- split: otway_rees
path: data/otway_rees-*
- split: SCEV_coq
path: data/SCEV_coq-*
- split: PolTac
path: data/PolTac-*
- split: fundamental_arithmetics
path: data/fundamental_arithmetics-*
---
提供机构:
piercemaloney
原始信息汇总
数据集概述
数据集特征
- 名称: text
- 数据类型: string
数据集拆分
| 拆分名称 | 字节数 | 示例数 |
|---|---|---|
| finmap | 745110 | 3 |
| GeometricAlgebra | 2180457 | 8 |
| bdds | 11537326 | 15 |
| concat | 1876052 | 90 |
| topology | 1998914 | 32 |
| euler_formula | 2157257 | 3 |
| ruler_compass_geometry | 531800 | 39 |
| fcsl_pcm | 1911756 | 12 |
| twoSquare | 219009 | 2 |
| zfc | 605621 | 11 |
| shuffle | 87431 | 6 |
| metalib | 190046 | 11 |
| hardware | 185840 | 25 |
| three_gap | 315458 | 7 |
| coq_ext_lib | 311782 | 47 |
| cheerios | 140210 | 6 |
| regexp | 116907 | 7 |
| coq_library_undecidability | 3089099 | 93 |
| automata | 1075636 | 25 |
| coquelicot | 6616788 | 23 |
| izf | 146423 | 8 |
| lemma_overloading | 1075228 | 27 |
| lin_alg | 6760336 | 68 |
| railroad_crossing | 202543 | 1 |
| idxassoc | 139339 | 2 |
| hoare_tut | 39069 | 3 |
| lesniewski_mereology | 123636 | 2 |
| verdi | 12479953 | 28 |
| additions | 147048 | 20 |
| checker | 6689 | 2 |
| VST | 38787573 | 292 |
| domain_theory | 67545 | 4 |
| propcalc | 54422 | 5 |
| circuits | 399265 | 19 |
| CompCert | 14680669 | 142 |
| area_method | 1578799 | 38 |
| bbv | 705745 | 13 |
| ails | 521926 | 11 |
| dep_map | 24951 | 2 |
| ChargeCore | 273913 | 20 |
| markov | 102887 | 1 |
| rsa | 180155 | 5 |
| verdi_raft | 21320585 | 107 |
| goedel | 9110441 | 44 |
| bigenough | 5031 | 1 |
| generic_environments | 142529 | 2 |
| disel | 4787193 | 37 |
| ctltctl | 61038 | 3 |
| coq_list_string | 6409 | 3 |
| QuickChick | 718709 | 17 |
| schroeder | 24958 | 4 |
| lazy_pcf | 497224 | 13 |
| weak_up_to | 175365 | 10 |
| groups | 11115 | 1 |
| pocklington | 617483 | 13 |
| mini_compiler | 14459 | 1 |
| StructTact | 293887 | 17 |
| exceptions | 8962 | 1 |
| coqrel | 196290 | 12 |
| higman_s | 198793 | 5 |
| bellantonicook | 3852234 | 16 |
| rem | 14504 | 1 |
| tree_automata | 1972575 | 17 |
| coq_procrastination | 26763 | 1 |
| higman_cf | 49720 | 2 |
| GeoCoq | 12090435 | 328 |
| coqoban | 38285 | 1 |
| search_trees | 59332 | 5 |
| system | 2053 | 1 |
| ieee754 | 35416 | 3 |
| jordan_curve_theorem | 12530912 | 10 |
| huffman | 1634130 | 25 |
| zf | 933118 | 10 |
| hedges | 360884 | 1 |
| zorns_lemma | 608752 | 19 |
| tortoise_hare_algorithm | 10084 | 1 |
| mod_red | 636981 | 5 |
| UnifySL | 1716822 | 128 |
| traversable_fincontainer | 429019 | 1 |
| buchberger | 2422607 | 29 |
| constructive_geometry | 80179 | 7 |
| tarski_geometry | 112419 | 8 |
| int_map | 817835 | 13 |
| float | 2994074 | 31 |
| InfSeqExt | 106656 | 5 |
| zchinese | 62626 | 6 |
| smc | 6045540 | 15 |
| pts | 72482 | 8 |
| param_pi | 2596347 | 11 |
| axiomatic_abp | 1204713 | 7 |
| lambda | 181085 | 10 |
| maths | 37685 | 3 |
| quicksort_complexity | 489164 | 28 |
| fssec_model | 1569135 | 25 |
| ipc | 3108901 | 31 |
| chinese | 208365 | 13 |
| cours_de_coq | 71295 | 11 |
| graphs | 644609 | 2 |
| dictionaries | 67746 | 1 |
| dblib | 195344 | 6 |
| cecoa | 2449356 | 14 |
| corespec | 1222339 | 28 |
| free_groups | 63973 | 1 |
| ramsey | 11734 | 1 |
| qarith | 51068 | 2 |
| math_comp | 44708646 | 76 |
| amm11262 | 365079 | 5 |
| angles | 322579 | 5 |
| orb_stab | 204783 | 1 |
| qarith_stern_brocot | 10873352 | 35 |
| Categories | 4945 | 1 |
| group_theory | 104940 | 10 |
| demos | 62208 | 5 |
| distributed_reference_counting | 8888400 | 74 |
| subst | 362195 | 17 |
| miniml | 114099 | 1 |
| algebra | 3275753 | 65 |
| fermat4 | 172156 | 5 |
| otway_rees | 226052 | 19 |
| SCEV_coq | 89902 | 1 |
| PolTac | 157370 | 13 |
| fundamental_arithmetics | 308733 | 8 |
数据集大小
- 下载大小: 37361889字节
- 数据集总大小: 286711572字节
搜集汇总
数据集介绍

构建方式
在形式化验证领域,Coq证明助手积累了丰富的开源项目资源。该数据集通过系统性地收集和整理Coq社区中的公开项目代码构建而成,涵盖了从基础数学理论到复杂算法验证的广泛主题。构建过程中,项目被依据其所属库或专题进行划分,每个子集对应一个独立的Coq开发项目,确保了数据来源的结构化与可追溯性。原始代码经过统一提取和格式化,保留了完整的证明脚本与定义,形成了以文本形式存储的标准化语料库。
使用方法
研究人员可通过HuggingFace数据集库直接加载该数据集,利用其标准接口访问各个项目子集。每个子集以纯文本格式提供,可直接用于训练或评估与Coq代码相关的机器学习模型,如证明自动补全、策略预测或代码摘要生成。在实际应用中,用户可以根据`configs`中定义的路径按分割名称加载特定领域的证明代码。该数据集适用于数据驱动的形式化方法研究,能够支持从代码语义理解到交互式证明自动化等一系列前沿探索。
背景与挑战
背景概述
在形式化验证领域,Coq交互式定理证明器作为关键工具,其应用广泛覆盖数学定理证明与软件正确性验证。数据集piercemaloney/coqgym_coq_projects_v2由研究人员Pierce Maloney等人构建,旨在为自动化定理证明研究提供丰富的Coq代码资源。该数据集汇集了多个Coq项目,涵盖几何、代数、逻辑及计算机科学等主题,通过结构化呈现证明脚本,支持机器学习模型学习定理证明策略,推动智能辅助证明系统的发展。
当前挑战
该数据集致力于解决自动化定理证明中的核心挑战,即如何让机器理解并生成复杂的数学证明。其构建过程面临多重困难:数据源自分散的Coq项目,需统一处理多样化的代码风格与依赖结构;证明步骤往往隐含深层逻辑,标注与解析要求极高的领域专业知识;此外,数据规模与质量平衡亦需谨慎考量,以确保模型训练的泛化能力与可靠性。
常用场景
经典使用场景
在形式化验证领域,Coq语言作为交互式定理证明器的核心工具,其项目数据集常被用于构建和评估自动化证明辅助系统。该数据集汇集了涵盖几何、代数、数论及分布式系统等多个数学与计算机科学分支的形式化证明代码,为研究人员提供了丰富的结构化语料库。经典使用场景包括训练机器学习模型以预测证明步骤或生成证明策略,从而加速定理证明过程,提升形式化验证的效率与可扩展性。
解决学术问题
该数据集有效应对了形式化方法中证明构造自动化程度不足的学术挑战。通过提供大规模、多样化的Coq证明代码,它支持研究如何利用数据驱动方法减少人工干预,解决证明搜索空间爆炸问题。其意义在于推动了定理证明与人工智能的交叉融合,为构建智能证明助手奠定了数据基础,显著促进了程序正确性验证与数学定理机械化证明的进展。
实际应用
在实际应用中,该数据集被广泛用于开发集成开发环境中的智能证明插件,辅助软件工程师编写高可靠性代码。例如,在操作系统内核、编译器及加密协议等安全关键系统的验证中,基于此数据集训练的模型能够建议证明策略或检测证明漏洞,增强代码的形式化保障。这些应用不仅降低了形式化验证的门槛,也为工业级软件的安全认证提供了技术支撑。
数据集最近研究
最新研究方向
在形式化验证与定理证明领域,Coq语言作为交互式证明助手的核心工具,其数据集piercemaloney/coqgym_coq_projects_v2汇聚了涵盖几何、代数、硬件验证及分布式系统等多元主题的Coq项目代码。当前研究前沿聚焦于利用此类结构化数据训练机器学习模型,以自动化生成证明策略或补全证明步骤,从而提升形式化验证的效率与可扩展性。随着人工智能在程序合成与符号推理中的深度融合,该数据集为探索神经定理证明、代码生成与形式化数学的交叉研究提供了丰富资源,推动了验证技术在安全关键系统与数学基础理论中的创新应用。
以上内容由遇见数据集搜集并总结生成



