adeo1/mathlib_informal_v4.28.0
收藏Hugging Face2026-04-16 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/adeo1/mathlib_informal_v4.28.0
下载链接
链接失效反馈官方服务:
资源简介:
---
pretty_name: mathlib_informal_v4.28.0
license: mit
language:
- en
- code
size_categories:
- 100K<n<1M
task_categories:
- text-retrieval
source_datasets:
- original
tags:
- text
- code
- datasets
- lean
- mathlib
- formal-math
- retrieval
configs:
- config_name: default
default: true
data_files:
- split: train
path: mathlib_informal_v4.28.0.jsonl
dataset_info:
features:
- name: doc_id
dtype: string
- name: full_name
dtype: string
- name: short_name
dtype: string
- name: module_name
dtype: string
- name: file_path
dtype: string
- name: kind
dtype: string
- name: signature
dtype: string
- name: docstring
dtype: string
- name: formal_statement
dtype: string
- name: informal_statement
dtype: string
- name: informal_name
dtype: string
- name: search_text_formal
dtype: string
- name: search_text_informal
dtype: string
- name: search_text_joint
dtype: string
- name: dependency_names
sequence: string
- name: imported_module_names
sequence: string
- name: importance_score
dtype: float64
- name: source_corpus
dtype: string
splits:
- name: train
num_examples: 183489
num_bytes: 390888788
---
# mathlib_informal_v4.28.0
## Dataset Summary
This dataset contains Lean `v4.28.0` mathlib declarations informalized with the repo's
external LeanSearch-based pipeline and published in the retrieval schema used by this
codebase.
## What Is Included
- `mathlib_informal_v4.28.0.jsonl`: one JSON object per declaration
- `dataset_metadata.json`: supplemental provenance, schema, and checksum metadata
## Cleaning And Normalization
Machine-local paths were removed from the published rows. The exported dataset keeps the
retrieval fields intact, while normalizing:
- `file_path` to `Mathlib/...`
- `source_corpus` to `Mathlib`
- `doc_id` to `<file_path>:<full_name>`
## Loading
```python
from datasets import load_dataset
ds = load_dataset("adeo1/mathlib_informal_v4.28.0", split="train")
print(ds[0])
```
## Build Provenance
- Lean version: `v4.28.0`
- Mathlib version: `v4.28.0`
- Build source: `leansearch`
- LeanSearch commit: `d461a7669842bd70dc84670e0bdce27c2fe478bd`
- Records: `183489`
提供机构:
adeo1



