### Data-Driven Lemma Synthesis for Interactive Proofs

*Proceedings of the ACM on Programming Languages* (OOPSLA 2022), Volume 6, Issue OOPSLA2, Article 143 (October 2022).

Aishwarya Sivaraman, Alex Sanchez-Stern, Bretton Chen, Sorin Lerner, Todd Millstein

Interactive proofs often require auxiliary lemmas to establish the desired theorem. Existing approaches for automatically synthesizing helper lemmas fall into two broad categories. Some approaches are *goal-directed*, producing lemmas specifically to help a user make progress from a given proof state, but they have *limited expressiveness* in terms of the lemmas that can be produced. Other approaches are *highly expressive*, able to generate arbitrary lemmas from a given grammar, but they are completely *undirected* and hence not amenable to interactive usage.
In this paper, we develop an approach to lemma synthesis that is both goal-directed and expressive. The key novelty is a technique for reducing lemma synthesis to a *data-driven* program synthesis problem, whereby examples for synthesis are generated from the current proof state. We also describe a technique to systematically introduce new variables for lemma synthesis, as well as techniques for filtering and ranking candidate lemmas for presentation to the user. We implement these ideas in a tool called `lfind`, which can be run as a Coq tactic. In an evaluation on four benchmark suites, `lfind` produces useful lemmas in 68% of the cases where a human prover used a lemma to make progress. In these cases, `lfind` synthesizes a lemma that either enables a fully automated proof of the original goal or matches the human-provided lemma.
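
The reduction to data-driven synthesis can be illustrated with a toy sketch. It is not `lfind`'s implementation. It assumes a hypothetical stuck subterm `rev (l ++ [n])`, a tiny hand-picked grammar, and a fixed set of concrete inputs. The function names (`stuck_subterm`, `examples_from_proof_state`, `enumerate_terms`, `synthesize`) are illustrative only. The sketch evaluates the subterm on concrete values to obtain input/output examples, then enumerates grammar terms and keeps those consistent with every example.

```python
# Illustrative only: a toy enumerative synthesizer, not lfind's actual algorithm.

def stuck_subterm(l, n):
    # Hypothetical subterm taken from a stuck goal: rev (l ++ [n]).
    return list(reversed(l + [n]))

def examples_from_proof_state(inputs):
    # Evaluate the subterm on concrete values to get input/output examples.
    return [((l, n), stuck_subterm(l, n)) for (l, n) in inputs]

def enumerate_terms(depth):
    # Bottom-up enumeration over a small assumed grammar:
    #   T ::= l | nil | (rev T) | (cons n T)
    # Terms are keyed by their printed form, which also removes duplicates.
    terms = {"l": lambda l, n: list(l), "nil": lambda l, n: []}
    for _ in range(depth):
        grown = dict(terms)
        for name, f in terms.items():
            grown[f"(rev {name})"] = lambda l, n, f=f: list(reversed(f(l, n)))
            grown[f"(cons n {name})"] = lambda l, n, f=f: [n] + f(l, n)
        terms = grown
    return terms

def synthesize(examples, depth=2):
    # Keep every enumerated term that agrees with all examples.
    return [name for name, f in enumerate_terms(depth).items()
            if all(f(*args) == out for args, out in examples)]

# Assumed concrete inputs; a real tool would generate these automatically.
inputs = [([], 1), ([2], 1), ([2, 3], 1), ([1, 2, 3], 4)]
candidates = synthesize(examples_from_proof_state(inputs))
print(candidates)  # ['(cons n (rev l))']
```

The surviving term suggests the candidate lemma `rev (l ++ [n]) = n :: rev l`. Agreement on examples does not establish validity, so a candidate like this would still need to be checked and proved.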

[PDF | Implementation]