PHOAS encoding of pattern matching notations

Hello everyone,

Today I was inspired by this comment of @herbelin to try and embed a programming language that supports pattern matching of product types in Coq using the PHOAS method. And of course, this needs to come with some reasonable notations.

So, I’m starting with a simple PHOAS encoding of lambda calculus with products types, that can be pattern matched in Lam:

From Ltac2 Require Import Ltac2.
Section term.

Variable X : Type.

Inductive prod_pattern := leaf | ppair (l r : prod_pattern).

Fixpoint p2p p : Type :=
match p with
| leaf      => X
| ppair l r => p2p l * p2p r
end.

Inductive term : Type :=
| Var (x : X)
| App (x y : term)
| Prod (x y : term)
| Lam p (f : p2p p -> term).

Now I want to create some notations for this, such as [λ (a, (b, c)), a b]. This notation should elaborate to Lam (ppair leaf (ppair leaf leaf)) (fun '(a, (b, c)) => a b). This is a bit of a problem though, because the (a , (b, c)) part of the notation needs to generate both ppair leaf (ppair leaf leaf) and the real Coq pattern '(a, (b, c)). This is problematic, because patterns need to be applicative in Coq, and cannot be computed within the notation. I came up with a solution using Ltac that converts a nested products to prod_pattern:

Ltac to_pattern x :=
match x with
| (?x , ?y) => let a := to_pattern x in
               let b := to_pattern y in
               constr:(ppair a b)
| _ => constr:(leaf)
end.

Notation "[ 'λ' pat , k ]" := 
  (Lam ltac:(let r := to_pattern pat in exact r) (fun pat => k))
  (at level 0, pat pattern, only parsing).

Crazily, this actually almost works! The only problem is that when the ltac part gets evaluated, the system tries to resolve the binders in the pattern, and errors out:
Definition test := [λ (a, (b, c)) , App (Var a) (Var b)]. (* The variable a was not found in the current environment. *)
This can be solved by introducing some bogus variables:

Section test.
Variable a b c d e f : X.
Definition test := [λ (a, (b, c)) , App (Var a) (Var b)].
End test.
Print test. (* test = Lam (ppair leaf (ppair leaf leaf)) (fun '(a, (b, _)) => App (Var a) (Var b)) : term *)

I tried if Ltac2 may be better suited for this problem because it appears to be working on preterms, but something as simple as the following already errors out:

Notation "[[ 'λ' pat , k ]]" :=
  (Lam ltac2:(exact leaf) (fun pat => k))
  (at level 0, pat pattern, only parsing).
Check [[λ a, Var a]]. (* Missing notation term for variable k, probably an ill-typed expression *)

Does anybody have a better solution for this problem?

You could use canonical structures to reify the pattern:

Section term.

Variable X : Type.

Inductive prod_pattern := leaf | ppair (l r : prod_pattern).

Record R := { pat : prod_pattern; p2p : Type }.

Inductive term : Type :=
| Var (x : X)
| App (x y : term)
| Prod (x y : term)
| Lam p (f : p2p p -> term).

Canonical Structure Rppair l r :=
  {| pat := ppair (pat l) (pat r); p2p := (p2p l * p2p r) |}.
Canonical Structure Rleaf :=
  {| pat := leaf; p2p := X |}.

Notation "[ 'λ' pat , k ]" := 
  (Lam _ (fun pat => k))
  (at level 0, pat pattern).

Definition test := [λ (a, (b, c)) , App (Var a) (App (Var c) (Var b))].

Note that it works for printing too.

There is still a limitation though: all variables have to be bound in the right-hand side otherwise, it fails to infer that variables have type X. Maybe experts in canonical structures have a solution though.

Edit: I renamed interp into p2p.

This is a very intriguing problem…

The problem with the CS-based approach seems that Coq can’t infer the type of the unused variables. Sadly, patterns don’t support casts (like (a : X, b)), but there is a hacky way to force Coq to unify a variable with a type. However, this intrudes some clutter in the notation…

Definition K : X -> term -> term := fun x y => y.

(** Casts [x] to the type [X] *)
Notation "'castvar' x 'in' y" :=
  (ltac:(match type of (K x y) with
         | _ => exact y
         end))
    (at level 0, only parsing, y at level 200).

Check fun '(x,y) => castvar x in y.
(* fun '(_, y) => y *)
(*      : X * term -> term *)

Definition ex :=
  [λ (a, (b, c)), castvar c in App (Var a) (Var b)].
Print ex.
(* ex = [ λ (a, (b, _)), App (Var a) (Var b)] *)
(*      : term *)
1 Like

Ah, finally a reason for me to learn how Canonical Structures work! Normally I tend to use typeclasses. Apart from the problems pointed out, there is another issue. The way R is stated now, means that one cannot prove anything about the f parameter of Lam. This is because there are no (provable) restrictions on its domain. We can fix this using a correctness predicate.

Require Import Program.
Section term.

Variable X : Type.

Inductive prod_pattern := leaf | ppair (l r : prod_pattern).

Fixpoint interp p : Type :=
match p with
| leaf      => X
| ppair l r => interp l * interp r
end.

Record R := { pat : prod_pattern; p2p : Type; correct : interp pat = p2p }.

Inductive term : Type :=
| Var (x : X)
| App (x y : term)
| Prod (x y : term)
| Lam p (f : p2p p -> term).

Program Canonical Structure Rppair l r :=
  {| pat := ppair (pat l) (pat r); p2p := (p2p l * p2p r) |}.
Next Obligation.
destruct l, r; f_equal; cbn; assumption.
Defined.
Program Canonical Structure Rleaf :=
  {| pat := leaf; p2p := X |}.

Notation "[ 'λ' pat , k ]" := 
  (Lam _ (fun pat => k))
  (at level 0, pat pattern).

End term.

I suspect that it may be possible to simplify this further. I’ll have to take another look tomorrow. Thanks!

Actually, this enters in the general category of “unification hints”. There exists Coq support for inverting equations of the form proj ?x = value (for ?x of type record), but there could as well be a support for inverting recursive functions, i.e. for inverting equations of the form interp ?x = value (referring to the recursive function interp of your last post).

So I did some reading on this, this and the paper suggested by @herbelin, and I think I have come up with a real Frankenstein solution that does not require mentioning names like @mwuttke97’s solution needs .

There are two problems with the canonical structures solution. (1) It cannot infer un-named variables. (2) Until now, we have done all notations within the term section, but outside the section terms actually need to have the type forall X, term X. So the notation have to generate this too.

For problem (2) it makes sense to use a typeclass Class var := { X : Type } to infer the type X. For (1) after reading the suggestion in the second paper for a failsafe canonical structure, and the option to combine typeclasses and canonical structures, I came up with a hybrid solution where the base case is solved by an instance and the recursive cases are solved by a canonical structure.

There are now two issues left

  1. The generated terms are a bit messy because they contain proofs. Not the end of the world though.
  2. I have not yet managed to convince Coq to print the notations.

Any suggestions to improve this further?

Require Import Program.
Section term.

Context {X : Type}.

Inductive prod_pattern := leaf | ppair (l r : prod_pattern).

Fixpoint interp p : Type :=
match p with
| leaf      => X
| ppair l r => interp l * interp r
end.

Record R := { pat : prod_pattern; p2p : Type; _ : interp pat = p2p }.

Inductive term : Type :=
| Var (x : X)
| App (x y : term)
| Prod (x y : term)
| Lam p (f : p2p p -> term).

Program Canonical Structure Rppair  l r :=
  Build_R (ppair (pat l) (pat r)) (p2p l * p2p r) _.
Next Obligation.
destruct l as [? ? <-], r as [? ? <-]. reflexivity.
Qed.

End term.
Arguments term X : clear implicits.

Class var := { X : Type }.
Existing Class R.
Instance Rleaf `(var) : R := Build_R leaf X eq_refl.

Declare Custom Entry dsl.
Notation "[ t ]" := (fun `(var) => t) (t custom dsl).
Notation "x" := (Var x) (in custom dsl, x ident).
Notation "( x )" := x (in custom dsl).
Notation "x y" := (App x y) (in custom dsl at level 0).
Notation "'λ' pat , k" := 
  (@Lam X _ (fun pat => k))
  (in custom dsl at level 0, pat pattern, k at level 0).
Notation "'λ' pat , k" := 
  (Lam _ (fun pat => k))
  (in custom dsl at level 0, pat pattern, k at level 0, only printing).

Definition test := [λ (a, (b, _)) , a (a b) b ((b) b) a ].
Print test.
Eval compute in test.

By the way, has any work been done on these unification hints in practice, or is this just a theoretical exercise at the moment?