Structure — the hypergraph

A hypergraph (in the sense of Barkeshli–Douglas–Freedman) is directed, ordered, coloured, and acyclic. Each property captures one feature of the Calculus of Inductive Constructions — together they form a faithful combinatorial shadow of a CIC proof term.

The four properties

Let $H = (V, E)$ be a hypergraph with vertex set $V$ and hyperedge set $E$ . Each hyperedge $e \in E$ is equipped with an ordered list of input vertices $\mathrm{in}(e) = (u_1, \dots, u_k)$ , an ordered list of output vertices $\mathrm{out}(e) = (v_1, \dots, v_m)$ , and a colour $c(e) \in C$ .

(D) Directed. Inputs and outputs are disjoint: $\mathrm{in}(e) \cap \mathrm{out}(e) = \varnothing$ . The hyperedge points from inputs to outputs.

(O) Ordered. Both input and output lists are ordered (not just sets). The same vertex appearing at position $i$ vs $j$ of an edge's inputs is a different incidence.

(C) Coloured. Every edge carries a label $c(e) \in C$ . In our case $C = \mathtt{ExprColor}$ — the 12 Lean.Expr constructor kinds (app, lam, forallE, …).

(A) Acyclic. The vertex relation " $u$ is an input of an edge whose output is $v$ " generates no cycle. Equivalently, there is a rank function $r : V \to \mathbb{N}$ strictly increasing along this relation.

Why these four are the right axioms

Each property corresponds to a structural feature of CIC that any faithful encoding must preserve:

Property	CIC feature it captures
Directed	inference rules are from premises to a conclusion
Ordered	the $i$ -th argument of an application is not the $j$ -th
Coloured	$\mathtt{app}$ is not $\mathtt{lam}$ is not $\mathtt{forallE}$
Acyclic	subterms strictly decrease in rank during elaboration

Drop any one and the encoding stops being a function of the proof term. Add anything else and you start encoding heuristic information that doesn't live in the syntax.

`Lean.Expr → Hypergraph`

The translation is deterministic and total. It lives in ProofAtlas/Mapping/Expr.lean:

def Lean.Expr.toHypergraph (e : Expr) :
    Hypergraph (Fin (countSubterms e)) ExprColor

It walks $e$ in DFS pre-order:

Vertices = subterm occurrences. Index in $\mathrm{Fin}\,n$ where $n = \mathrm{countSubterms}(e)$ .
Hyperedges = compound subterms (app, lam, forallE, letE, mdata, proj).
For each compound: inputs = ordered children, output = the compound itself, colour = constructor kind.

Acyclicity is discharged via the rank $r(v) = \mathrm{countSubterms}(e) - v.\mathrm{val}$ . Children sit at strictly larger DFS positions than their parent, so the input → output direction always decreases the index, which strictly increases the rank.

Connectedness is discharged by symmetric closure: every non-root vertex has a parent at a strictly smaller DFS index, so iterating the parent relation reaches vertex 0.

Hypergraph, not graph — why

A binary edge captures " $u$ depends on $v$ " but cannot record that both $u$ and $w$ jointly produce $v$ . In CIC the app f a node collapses both f and a into one output; flattening to a graph would erase that the output was produced by a 2-arity rule. Hyperedges keep the arity intact.

For visualisation purposes the bipartite expansion (subterm vertices as circles, hyperedge nodes as colour-coded rectangles) is exactly the one rendered by #atlas.graph.

When multiple declarations are profiled together (e.g. #atlas.resistance A B), every reference to the same .const Name maps to a single shared vertex via a HashMap Lean.Name Nat. Two proofs that both mention Nat end up in the same connected component — the metric layer can then ask "how far apart are they through the shared vocabulary?".

Try it

→ #atlas.graph renders any declaration's hypergraph in the Lean InfoView.

References

[BDF26] §3 — definition of hypergraphs.