ProofAtlas

Structure — the hypergraph

A hypergraph (in the sense of Barkeshli–Douglas–Freedman) is directed, ordered, coloured, and acyclic. Each property captures one feature of the Calculus of Inductive Constructions — together they form a faithful combinatorial shadow of a CIC proof term.

The four properties

Let H=(V,E)H = (V, E) be a hypergraph with vertex set VV and hyperedge set EE. Each hyperedge eEe \in E is equipped with an ordered list of input vertices in(e)=(u1,,uk)\mathrm{in}(e) = (u_1, \dots, u_k), an ordered list of output vertices out(e)=(v1,,vm)\mathrm{out}(e) = (v_1, \dots, v_m), and a colour c(e)Cc(e) \in C.

(D) Directed. Inputs and outputs are disjoint: in(e)out(e)=\mathrm{in}(e) \cap \mathrm{out}(e) = \varnothing. The hyperedge points from inputs to outputs.

(O) Ordered. Both input and output lists are ordered (not just sets). The same vertex appearing at position ii vs jj of an edge's inputs is a different incidence.

(C) Coloured. Every edge carries a label c(e)Cc(e) \in C. In our case C=ExprColorC = \mathtt{ExprColor} — the 12 Lean.Expr constructor kinds (app, lam, forallE, …).

(A) Acyclic. The vertex relation "uu is an input of an edge whose output is vv" generates no cycle. Equivalently, there is a rank function r:VNr : V \to \mathbb{N} strictly increasing along this relation.

Why these four are the right axioms

Each property corresponds to a structural feature of CIC that any faithful encoding must preserve:

PropertyCIC feature it captures
Directedinference rules are from premises to a conclusion
Orderedthe ii-th argument of an application is not the jj-th
Colouredapp\mathtt{app} is not lam\mathtt{lam} is not forallE\mathtt{forallE}
Acyclicsubterms strictly decrease in rank during elaboration

Drop any one and the encoding stops being a function of the proof term. Add anything else and you start encoding heuristic information that doesn't live in the syntax.

Lean.Expr → Hypergraph

The translation is deterministic and total. It lives in ProofAtlas/Mapping/Expr.lean:

def Lean.Expr.toHypergraph (e : Expr) :
    Hypergraph (Fin (countSubterms e)) ExprColor

It walks ee in DFS pre-order:

  • Vertices = subterm occurrences. Index in Finn\mathrm{Fin}\,n where n=countSubterms(e)n = \mathrm{countSubterms}(e).
  • Hyperedges = compound subterms (app, lam, forallE, letE, mdata, proj).
  • For each compound: inputs = ordered children, output = the compound itself, colour = constructor kind.

Acyclicity is discharged via the rank r(v)=countSubterms(e)v.valr(v) = \mathrm{countSubterms}(e) - v.\mathrm{val}. Children sit at strictly larger DFS positions than their parent, so the input → output direction always decreases the index, which strictly increases the rank.

Connectedness is discharged by symmetric closure: every non-root vertex has a parent at a strictly smaller DFS index, so iterating the parent relation reaches vertex 0.

Hypergraph, not graph — why

A binary edge captures "uu depends on vv" but cannot record that both uu and ww jointly produce vv. In CIC the app f a node collapses both f and a into one output; flattening to a graph would erase that the output was produced by a 2-arity rule. Hyperedges keep the arity intact.

For visualisation purposes the bipartite expansion (subterm vertices as circles, hyperedge nodes as colour-coded rectangles) is exactly the one rendered by #atlas.graph.

Const-sharing across declarations

When multiple declarations are profiled together (e.g. #atlas.resistance A B), every reference to the same .const Name maps to a single shared vertex via a HashMap Lean.Name Nat. Two proofs that both mention Nat end up in the same connected component — the metric layer can then ask "how far apart are they through the shared vocabulary?".

Try it

#atlas.graph renders any declaration's hypergraph in the Lean InfoView.

References

[BDF26] §3 — definition of hypergraphs.