How to parse coq statements from a coq .v file the official way?

brando90 · June 6, 2023, 4:26pm

The parser I wrote is not robust enough to split a Coq file into statements. We assumed that a Coq statement always ends in a period followed by whitespace (".\s"), except inside strings and comments.
However, after parsing through some simple files, it turns out that “- intros.” is really two statements: one for the bullet “-” and another for whatever comes after it. “*” has the same problem.

Option 1: use the parsing abilities of sertop / Coq SerAPI

I’m not sure whether we should keep patching our parser every time a new edge case shows up, or instead use the parser integrated into sertop (it returns an (Added …) answer for every statement it parses, with information about where that statement is located). But since it’s deprecated, I think I will avoid this option.
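For reference, here is a minimal sketch of how the location information in those (Added …) answers could be used to recover each statement from the source text. The exact answer layout shown in the comment, and the reading of `bp`/`ep` as byte offsets into the input, are assumptions to check against your sertop version. `sentences_from_added` is a hypothetical helper name, not part of SerAPI.

```python
import re

# Assumed shape of one sertop answer (an assumption, verify locally):
#   (Answer 0(Added 2((fname ToplevelInput)(line_nb 1)(bol_pos 0)
#     (line_nb_last 1)(bol_pos_last 0)(bp 0)(ep 6))NewTip))
# We only rely on the sentence id after "Added" and on the (bp N)(ep M) pair.
ADDED_RE = re.compile(r"\(Added\s+(\d+).*?\(bp\s+(\d+)\)\s*\(ep\s+(\d+)\)", re.S)


def sentences_from_added(source, sertop_output):
    """Return (sentence_id, text) pairs by slicing `source` at bp..ep.

    bp/ep are treated as byte offsets, so the source is sliced as UTF-8 bytes.
    """
    data = source.encode("utf-8")
    result = []
    for match in ADDED_RE.finditer(sertop_output):
        sid, bp, ep = (int(g) for g in match.groups())
        result.append((sid, data[bp:ep].decode("utf-8")))
    return result
```

With this approach the bullet “-” and the following “intros.” come back as separate entries, because the splitting is done by Coq’s own parser rather than by a regex.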

Option 2: since Coq is an extensible language, use its own parser

Perhaps the best way is to use coqc to get the Coq statements from a Coq file, especially since Coq is an extensible language.

cross: https://stackoverflow.com/questions/76391513/how-to-parse-coq-statements-from-a-coq-v-file-the-official-way
cross2: How to parse coq statements from a coq .v file the official way?
cross3: Reddit - The heart of the internet

Zimmi48 · June 7, 2023, 12:49pm

Hello,

Indeed, the officially recommended way is not to build your own Coq parser, because you will always run into limitations, but to use Coq’s parser instead.

This is what happens when you use sertop/SerAPI. (Internally, it links to Coq and thus runs its parser).

You should not refrain from using SerAPI at the moment. It’s “just” in maintenance mode (not getting new features) and not deprecated yet. It will keep being maintained for as long as its users cannot switch to a newer protocol that provides the same functionality. SerAPI is too important to be removed suddenly.

jfehrle · June 10, 2023, 2:51am

If SerAPI can serve your needs, that’s probably your best option.

The parser relies on dynamically adding and removing parsing rules to parse notations. Replicating that in another parser would be a huge effort; it’s not a simple feature. If you only want to split proofs into sentences, that shouldn’t be too difficult, but as you discovered, you need to handle curly braces and bullets (-, +, *). See here. The source file fullGrammar gives an almost-complete grammar for Coq. It doesn’t fully describe lookahead processing needed to resolve ambiguities. orderedGrammar is an edited version of the grammar used in the documentation to make it more readable for users.
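To make the "split proofs into sentences" case concrete, here is a rough sketch of such a splitter. It is only an illustration under stated assumptions: it treats a period followed by whitespace (or end of input) as a sentence terminator, skips nested comments and strings, and emits bullets and curly braces at the start of a sentence as their own sentences. It does not handle goal selectors such as `2: {`, user notations, or anything else that needs Coq’s real parser, and all function names are hypothetical.

```python
def skip_comment(src, i):
    """Given src[i:i+2] == '(*', return the index just past the matching '*)'."""
    depth = 0
    while i < len(src):
        if src.startswith("(*", i):
            depth += 1
            i += 2
        elif src.startswith("*)", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return len(src)  # unterminated comment: consume the rest


def skip_string(src, i):
    """Given src[i] == '"', return the index just past the closing quote."""
    j = src.find('"', i + 1)
    return len(src) if j == -1 else j + 1


def split_sentences(src):
    """Naively split Coq source into sentences (see caveats above)."""
    out = []
    i, n = 0, len(src)
    start = None  # index where the current sentence began, or None
    while i < n:
        if src.startswith("(*", i):
            i = skip_comment(src, i)
        elif start is None:
            c = src[i]
            if c.isspace():
                i += 1
            elif c in "-+*":
                # A bullet is a run of the same character, e.g. "-" or "--".
                j = i
                while j < n and src[j] == c:
                    j += 1
                out.append(src[i:j])
                i = j
            elif c in "{}":
                out.append(c)
                i += 1
            else:
                start = i
        elif src[i] == '"':
            i = skip_string(src, i)
        elif src[i] == "." and (i + 1 == n or src[i + 1].isspace()):
            out.append(src[start:i + 1])
            start = None
            i += 1
        else:
            i += 1
    return out  # trailing text without a terminator is dropped
```

For example, `split_sentences("- intros. { auto. }")` separates the bullet, the two tactics, and the braces. Qualified names such as `Nat.add` are left intact because the period there is not followed by whitespace.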

The parser is based on camlp5, which is unlike most other parsers I’ve seen. It is not an LALR(k) parser.

What are you trying to accomplish?

PS: A couple of years ago I wrote code to construct full parse trees that exactly reflect the parser’s processing of each production. I never submitted it as a PR because it was complex, brittle, and hard to package for others’ use.

ejgallego · June 15, 2023, 6:36pm

In addition to @Zimmi48's accurate reply, coq-lsp does provide SerAPI parsing capabilities via the getDocument request, so you can use both.
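As a rough illustration, the request could be built like this. The `Content-Length` framing is the standard LSP base protocol. The method name `coq/getDocument` and the `textDocument` parameter with `uri` and `version` fields are my reading of the coq-lsp protocol notes and should be treated as assumptions to verify against your coq-lsp version. The helper names are hypothetical, and a real client must first send `initialize` and `textDocument/didOpen`.

```python
import json


def lsp_frame(payload):
    """Encode a JSON-RPC message with the LSP Content-Length header."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def get_document_request(request_id, uri, version):
    """Build a framed getDocument request (method and params are assumptions)."""
    return lsp_frame({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "coq/getDocument",
        "params": {"textDocument": {"uri": uri, "version": version}},
    })
```

The resulting bytes would be written to the server’s standard input. The reply is expected to describe the parsed sentences together with their ranges, which gives the sentence boundaries without a hand-written splitter.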

SerAPI is not deprecated (and coq-lsp does use it essentially), but as Théo correctly pointed out we don’t expect to add new features to it.

