Architecture
ProofAtlas turns a Lean 4 declaration into a hypergraph and then measures the geometry of that hypergraph. The library is organised in four layers; each layer has a single responsibility and exposes a clean interface to the layers above and below.
┌──────────────────────────────────────────────────────────────┐
│ Layer 4 — Quality │
│ scripts/check_bdf_metric_instances.sh · IsHypergraphMetric │
│ CI gate, sorry audit, axiom whitelist │
└──────────────────────────────────────────────────────────────┘
▲
┌──────────────────────────┴───────────────────────────────────┐
│ Layer 3 — Compute (Float, #eval-able) │
│ Pipeline/IncidenceData Pipeline/Linalg │
│ Pipeline/Hyperbolicity Pipeline/Cache │
│ Mapping/{Expr, Environment, Declaration} │
│ Command/* (#atlas.*) Export/* (lake exe atlas-export) │
└──────────────────────────┬───────────────────────────────────┘
▲
┌──────────────────────────┴───────────────────────────────────┐
│ Layer 2 — Metric (proof side, ℝ-valued) │
│ Metric/Axioms.lean: IsHypergraphMetric spec │
│ Metric/{Resistance, CommuteTime, JensenShannon, │
│ Diffusion}/{Basic, Instance}.lean │
│ RandomWalk/{KemenySnell, CommuteTime, Spectral/*} │
│ Hyperbolicity/Basic.lean: generic δ, Hausdorff, GH theorems │
└──────────────────────────┬───────────────────────────────────┘
▲
┌──────────────────────────┴───────────────────────────────────┐
│ Layer 1 — Proof (combinatorial) │
│ Hypergraph/Basic.lean: Hypergraph V C │
│ Mapping/Expr.lean: Lean.Expr → Hypergraph │
└──────────────────────────────────────────────────────────────┘
Layer 1 — Proof
Hypergraph/Basic.lean defines Hypergraph V C: a directed,
ordered, coloured, acyclic, connected hypergraph. The four
structural axioms (D, O, C, A) are encoded as fields of the
structure, so they hold by construction. The translation from any
Lean proof term lives in Mapping/Expr.lean:
def Lean.Expr.toHypergraph (e : Expr) :
Hypergraph (Fin (countSubterms e)) ExprColor
The load-bearing lemma is connected: every non-root DFS vertex is
incident to its parent compound subterm. With that sealed, every
Lean declaration in any library ships with a Hypergraph for free.
Layer 2 — Metric
Metric/Axioms.lean defines the contract a candidate distance must
satisfy to count as a "hypergraph metric":
structure IsHypergraphMetric
(d : Hypergraph V C → WalkParams C → V → V → ℝ) where
d_self, d_symm, d_triangle, d_nonneg -- metric axioms
sensitive_to_direction, _to_ordering, _to_coloring -- existence witnesses
Four concrete metrics are registered against this spec:
| metric | file | status |
|---|---|---|
resistanceDist | Metric/Resistance/Instance.lean | certified |
commuteTimeDist | Metric/CommuteTime/Instance.lean | certified |
jsDist | Metric/JensenShannon/Instance.lean | in progress (sorry) |
diffusionDist | Metric/Diffusion/Instance.lean | in progress (sorry) |
All large-scale geometry (Hyperbolicity/Basic.lean for δ-hyperbolicity,
the Hausdorff and Gromov–Hausdorff constructions) is parameterised by
{d} (isMetric : IsHypergraphMetric d). Swap a metric and every
downstream theorem follows.
Layer 3 — Compute
The proof layer is noncomputable (it uses abstract ℝ and carries
proofs). For end-to-end pipelines we need an #eval-able
representation: Pipeline/IncidenceData.lean gives the same
combinatorial object as arrays of Nat vertex indices and Float
weights. Everything in Pipeline/ is total, computable, and free
of proof obligations.
Two extraction modes feed IncidenceData:
- Expr-level (
Mapping/Environment.lean).Exprs.toIncidenceData : Array Expr → IncidenceData × Array Natglues proof terms via const-sharing — every reference to the same.const Namecollapses to one shared vertex. - Decl-level (
Mapping/Declaration.lean).Environment.toDeclIncidence : Environment → Array Name → IncidenceDatabuilds the co-reference graph of a namespace: vertices are declarations, hyperedges are maximal application spines inside each declaration's proof body.
From IncidenceData, Pipeline/Linalg.lean computes the Laplacian
and the effective-resistance matrix, and Pipeline/Hyperbolicity.lean
computes the four-point δ. Pipeline/Cache.lean memoises both per
(env, namespace) so repeated .ns commands don't re-pay the work.
The interactive surface is Command/*: each #atlas.<verb> resolves
identifiers, calls into Pipeline/, and either logs a number or
renders a widget. The same compute machinery is wrapped by the
atlas-export executable in Export/ for shell pipelines and CI.
Layer 4 — Quality
scripts/check_bdf_metric_instances.sh runs in CI on every push.
For each registered metric in the registry, it walks
#print axioms of the corresponding *_isHypergraphMetric
instance theorem and refuses to merge if any sorry is reachable.
The Util/Audit.lean linter (the MathTag linter) forces every
public declaration's docstring to begin with one of **Math.**,
**Eng.**, or **Mixed.** so it's always obvious which lines
correspond to paper content and which are type-theoretic plumbing.
Where to touch
| If you want to | Touch | Recipe |
|---|---|---|
| Add a new mapping | Mapping/MyMapping.lean + Command/MyCmd.lean | add-a-mapping |
| Add a new metric | Metric/MyMetric/{Basic,Instance}.lean + script registry | add-a-metric |
| Add a new command | Command/MyCmd.lean using Command/Common.lean helpers | follow Command/Resistance.lean as a template |
| Add a downstream consumer | Export/MyFormat.lean + lakefile.toml target | follow Export/Profile.lean as a template |
Fill an open sorry | the relevant Metric/*/Basic.lean | (how-to coming in Wave 2) |
Two things this architecture deliberately does
Separates spec from instance. IsHypergraphMetric is the
contract; the four metrics are instances of it. Any new theorem in
Layer 2 or 3 takes an IsHypergraphMetric d hypothesis, not a
specific distance. New metrics drop in without breaking downstream
results.
Separates the proof layer from the compute layer. Layer 1 + 2
prove things about abstract ℝ-valued metrics; Layer 3 evaluates
their Float shadows. A future bridging lemma will pin the
correspondence formally; for now they're kept on speaking terms by
convention. The CI gate on IsHypergraphMetric instances ensures
the proof side stays honest; the #eval-able compute side
ensures the demo loop is fast.