Structure — the hypergraph
A hypergraph (in the sense of Barkeshli–Douglas–Freedman) is directed, ordered, coloured, and acyclic. Each property captures one feature of the Calculus of Inductive Constructions — together they form a faithful combinatorial shadow of a CIC proof term.
The four properties
Let be a hypergraph with vertex set and hyperedge set . Each hyperedge is equipped with an ordered list of input vertices , an ordered list of output vertices , and a colour .
(D) Directed. Inputs and outputs are disjoint: . The hyperedge points from inputs to outputs.
(O) Ordered. Both input and output lists are ordered (not just sets). The same vertex appearing at position vs of an edge's inputs is a different incidence.
(C) Coloured. Every edge carries a label . In our
case — the 12 Lean.Expr constructor
kinds (app, lam, forallE, …).
(A) Acyclic. The vertex relation " is an input of an edge whose output is " generates no cycle. Equivalently, there is a rank function strictly increasing along this relation.
Why these four are the right axioms
Each property corresponds to a structural feature of CIC that any faithful encoding must preserve:
| Property | CIC feature it captures |
|---|---|
| Directed | inference rules are from premises to a conclusion |
| Ordered | the -th argument of an application is not the -th |
| Coloured | is not is not |
| Acyclic | subterms strictly decrease in rank during elaboration |
Drop any one and the encoding stops being a function of the proof term. Add anything else and you start encoding heuristic information that doesn't live in the syntax.
Lean.Expr → Hypergraph
The translation is deterministic and total. It lives in
ProofAtlas/Mapping/Expr.lean:
def Lean.Expr.toHypergraph (e : Expr) :
Hypergraph (Fin (countSubterms e)) ExprColor
It walks in DFS pre-order:
- Vertices = subterm occurrences. Index in where .
- Hyperedges = compound subterms (
app,lam,forallE,letE,mdata,proj). - For each compound: inputs = ordered children, output = the compound itself, colour = constructor kind.
Acyclicity is discharged via the rank . Children sit at strictly larger DFS positions than their parent, so the input → output direction always decreases the index, which strictly increases the rank.
Connectedness is discharged by symmetric closure: every non-root vertex has a parent at a strictly smaller DFS index, so iterating the parent relation reaches vertex 0.
Hypergraph, not graph — why
A binary edge captures " depends on " but cannot record that
both and jointly produce . In CIC the app f a node
collapses both f and a into one output; flattening to a graph
would erase that the output was produced by a 2-arity rule.
Hyperedges keep the arity intact.
For visualisation purposes the bipartite expansion (subterm vertices
as circles, hyperedge nodes as colour-coded rectangles) is exactly
the one rendered by #atlas.graph.
Const-sharing across declarations
When multiple declarations are profiled together (e.g.
#atlas.resistance A B), every reference to the same .const Name
maps to a single shared vertex via a HashMap Lean.Name Nat. Two
proofs that both mention Nat end up in the same connected
component — the metric layer can then ask "how far apart are they
through the shared vocabulary?".
Try it
→ #atlas.graph renders any
declaration's hypergraph in the Lean InfoView.
References
[BDF26] §3 — definition of hypergraphs.