How to Efficiently Structure Proofs in Coq for Large Scale Projects?

Hello,

I am working on a large-scale formal verification project using Coq and could use some advice. As the codebase grows…, managing proofs has become increasingly challenging. I am particularly struggling with:

Modularity :- How do you structure proofs to keep them reusable across different parts of the project: ??
Performance :- Some proofs are taking excessively long to compile. Are there best practices to optimize proof scripts; ??
Documentation :- What is the most efficient way to document proofs to make them understandable for team members unfamiliar with the details: ??

I am aware of tactics like Proof using and Section but would appreciate more hands on examples or resources that could help. Additionally…, how do you handle dependencies when integrating multiple proofs: ??

I have already searched on the forum for the solution to my query and found this thread https://discourse.rocq-prover.org/t/how-to-represent-mathematical-structures-in-coq-servicenow but couldn’t find any solution.

Looking forward to learning from your experiences and insights. Any advice, tools or resources you could share would be greatly appreciated !!

Thanks in advance !!

With REgards,
Daniel Jose

These are active research fields and I am far from being an expert. My 2 cts:

  • Split separate subjects in separate files, so that parallel compilation can be exploited and also to reuse these files in a fine grained manner. Otherwise you will pollute your environment with unnecessary entries.
  • Use vos/vok and other unsafe IDE modes when you develop your proofs. The compile time is mostly ignored then.
  • Type classes is an appealing solution for modularity, but it makes things extremely slow if you don’t take the time to tweak them very tightly. Strong expertise needed here.
  • canonical structures are to be considered, strong expertise needed too.
  • Modules are very nice to build modular things (!) but they are very invasive. Most people don’t use them because of that. Or for very limited areas (e.g. stdlib’s MSet). I personnaly wouldn’t throw them aside too quickly, depending on what you want to do. There invasiveness is there only disadvantage AFAIK.

I am a software engineer (with a background in analytic philosophy/logic, FWIW), but still relatively new to type theories and dependently-typed languages, so I’ll just drop in few generic ideas.

Modularity :- How do you structure proofs to keep them reusable across different parts of the project: ??

I do not think it is different from structuring any piece of code: which needs a design, it is not in a first instance about what the implementation language provides. For example, have you tried drawing a diagram of your components and the dependencies? If that diagram isn’t “clean”, the code (the code structure) cannot be: conversely, the code can only be as clean as that diagram is.

Performance :- Some proofs are taking excessively long to compile. Are there best practices to optimize proof scripts; ??

In general, we get performance 95% for free by plain modularity. Compare this:

Lemma theorem_slow : True /\ True.
Proof.
  split.
  - Time Compute 50000. auto.
  - Time Compute 50000. auto.
Qed.

with this:

Lemma lemma1_slow : True.
Proof. Time Compute 50000. auto. Qed.

Lemma lemma2_slow : True.
Proof. Time Compute 50000. auto. Qed.

Lemma theorem_fast : True /\ True.
Proof.
  split.
  - Time apply lemma1_slow.  (* 0. secs *)
  - Time apply lemma2_slow.  (* 0. secs *)
Qed.

but of course there is no net gain unless the slow lemmas are used in more than one place…

Documentation :- What is the most efficient way to document proofs to make them understandable for team members unfamiliar with the details: ??

I take it you mean non-technical team members or I’d just send them to a Rocq intro class. :slight_smile: As a minimum, annotated proofs/code (and I mean, aside and beyond the code docs, which remain a technical thing), otherwise encapsulating things in a domain specific language (DSL) I’d say is the “canonical” solution there.