How do I prove the correctness of my Ceaser Cipher?

I have made a caeser cipher, and want to prove it’s correctness. However, I have been struggling with the theorem on the bottom, wondering if it is even possible to solve. Is the solution obvious?

From Coq Require Import Arith.
From Coq Require Import Bool.
From Coq Require Export Strings.String.
From Coq Require Import FunctionalExtensionality.
From Coq Require Import List.
Require Import Lia.
Import ListNotations.
Set Default Goal Selector "!".

Definition total_map (A : Type) := nat -> A.

Definition t_empty {A : Type} (v : A) : total_map A :=
  (fun _ => v).

Definition t_update {A : Type} (m : total_map A)
                    (x : nat) (v : A) :=
  fun x' => if Nat.eqb x x' then v else m x'.

Notation "'_' '!->' v" := (t_empty v)
  (at level 100, right associativity).
Example example_empty := (_ !-> false).

Notation "x '!->' v ';' m" := (t_update m x v)
                              (at level 100, v at next level, right associativity).

Inductive letters :=
  | LA | LB | LC | LD | LE | LF | LG | LH | LI | LJ | LK | LL | LM | LN
  | LO | LP | LQ | LR | LS | LT | LU | LV | LW | LX | LY | LZ.

Definition example_letter_map :=
  ( 0 !-> LA;
    1 !-> LB;
    2 !-> LC;
    3 !-> LD;
    4 !-> LE;
    5 !-> LF;
    6 !-> LG;
    7 !-> LH;
    8 !-> LI;
    9 !-> LJ;
    10 !-> LK;
    11 !-> LL;
    12 !-> LM;
    13 !-> LN;
    14 !-> LO;
    15 !-> LP;
    16 !-> LQ;
    17 !-> LR;
    18 !-> LS;
    19 !-> LT;
    20 !-> LU;
    21 !-> LV;
    22 !-> LW;
    23 !-> LX;
    24 !-> LY;
    25 !-> LZ;
    _ !-> LA  (* Default value *)
  ).

Fixpoint letter_eqb (l1 l2 : letters) : bool :=
  match l1, l2 with
  | LA, LA | LB, LB | LC, LC | LD, LD | LE, LE | LF, LF | LG, LG
  | LH, LH | LI, LI | LJ, LJ | LK, LK | LL, LL | LM, LM | LN, LN
  | LO, LO | LP, LP | LQ, LQ | LR, LR | LS, LS | LT, LT | LU, LU
  | LV, LV | LW, LW | LX, LX | LY, LY | LZ, LZ => true
  | _, _ => false
  end.

Definition cipher (n add_by : nat) : option letters :=
  if n <? 26 then
    let idx := (n + add_by) mod 26 in
    Some (example_letter_map idx)
  else
    None.

Compute(cipher 25 2).

Lemma cipher_returns_some :
  forall n add_by,
    0 < n < 26 ->
    exists l, cipher n add_by = Some l.
Proof.
Admitted.

(A lot of work, so here is just a first approximation.)

I’d say your formalization is off mark in several ways:

  • from the fact that you do not need to talk of “total” and “mappings”;
  • to the fact that you are mixing modular arithmetic with conditions on the range of inputs;
  • to, most importantly, the fact that your cipher (better call it encrypt) should rather take a “rotation” value and a “text” (not another “value”) and return a “text” (not an “option of text”);
  • and, finally, correctness should be that encrypt and decrypt are inverse of each other.

So, I have started rewriting it, but I haven’t got to the bottom of it, indeed it’s still quite messy and I’d bet it can be simplified further. I just hope it gives you something to start thinking about/confronting with:

From Coq Require Import PeanoNat.
From Coq Require Import List.

Import Nat ListNotations.

(** The type of "letters". *)
Variant Letter : Set :=
  | LA | LB | LC | LD | LE | LF | LG
  | LH | LI | LJ | LK | LL | LM | LN
  | LO | LP | LQ | LR | LS | LT | LU
  | LV | LW | LX | LY | LZ.

(** The whole "alphabet" as a list. *)
Definition letters : list Letter :=
  [ LA; LB; LC; LD; LE; LF; LG
  ; LH; LI; LJ; LK; LL; LM; LN
  ; LO; LP; LQ; LR; LS; LT; LU
  ; LV; LW; LX; LY; LZ ].

(** The size of the alphabet. *)
Definition letters_size : nat := 26.

(** A helper tactic. *)
Ltac solve_nodup_inv :=
  match goal with
  | F : In _ [] |- _ =>
      inversion F
  | F : In _ _ |- _ =>
      destruct F as [F|F];
      [ inversion F
      | try solve_nodup_inv
      ]
  end.

(** A helper tactic. *)
Ltac solve_nodup :=
  match goal with
  | |- NoDup [] =>
      constructor
  | |- NoDup _ =>
      constructor;
      [ intros F;
        solve_nodup_inv
      | try solve_nodup
      ]
  end.

(** The alphabet is correct: no dups. *)
Lemma letters_corr_nodup :
  NoDup letters.
Proof. unfold letters. solve_nodup. Qed.

(** The alphabet is correct: size. *)
Lemma letters_corr_size :
  length letters = letters_size.
Proof. reflexivity. Qed.

(** Returns the index of [c] in [letters]. *)
Definition letter_idx (c : Letter) : nat :=
  match c with
  |LA=> 0|LB=> 1|LC=> 2|LD=> 3|LE=> 4|LF=> 5|LG=> 6
  |LH=> 7|LI=> 8|LJ=> 9|LK=>10|LL=>11|LM=>12|LN=>13
  |LO=>14|LP=>15|LQ=>16|LR=>17|LS=>18|LT=>19|LU=>20
  |LV=>21|LW=>22|LX=>23|LY=>24|LZ=>25
  end.

(** [letter_idx] is correct. *)
Lemma letter_idx_corr : forall c,
  nth_error letters (letter_idx c) = Some c.
Proof. destruct c; reflexivity. Qed.

(** Encrypts "plain" letter [c] with key [n]. *)
Definition encrypt_letter (n : nat) c : Letter :=
  let i := letter_idx c + n in
    nth (i mod letters_size) letters c.

(** Decrypts "cipher" letter [c] with key [n]. *)
Definition decrypt_letter (n : nat) c : Letter :=
  let m := n mod letters_size in
  let i := letter_idx c + letters_size - m in
    nth (i mod letters_size) letters c.

(** Encrypt/decrypt letter is correct. *)
Lemma encrypt_decrypt_letter_corr : forall n c,
  decrypt_letter n (encrypt_letter n c) = c.
Proof.
  unfold decrypt_letter, encrypt_letter.
  induction n as [|n' In']; destruct c.
  all: try reflexivity.
  - specialize (In' LA).
    set (i := letter_idx LA).
    change (letter_idx LA) with i in In'.
    cbn in i; subst i.
    rewrite add_0_l in *.
  give_up.
Admitted.

(** The type of "texts". *)
Definition Text : Set := list Letter.

(** Encrypts "plain" text [t] with key [n]. *)
Definition encrypt (n : nat) (t : Text) : Text.
Admitted.

(** Decrypts "cipher" text [t] with key [n]. *)
Definition decrypt (n : nat) (t : Text) : Text.
Admitted.

(** Encrypt/decrypt is correct. *)
Lemma encrypt_decrypt_corr : forall n,
  (forall t, decrypt n (encrypt n t) = t) /\
  (forall t, encrypt n (decrypt n t) = t).
Admitted.

And there I have stopped and realized that I probably need something about rotations by one place… but I am still not even sure I am on the right path, it looks still way too complicated for something that amounts to rotating a fixed list…

Your lemma is provable. I can provide the copy-pasting of the proof if required, but I think that it will be more pleasant for you to achieve it alone.
Here is the list of followed steps:

  • introduce variables and unfold cipher
  • You now have to prove the existence of a letter. For it, introduce idx with the same definition as in the definition of cipher and the witness will be the same as in the definition of cipher. (Until now, we just mimicked the definition of cipher).
  • Your goal should now be (if n <? 26 then Some (example_letter_map idx) else None) = Some (example_letter_map idx). Our hypothesis is of the shape n < 26.
  • We assert that n <? 26 = true and prove it using Nat.ltb_spec0 and reflect_iff.
  • We can conclude.

Furthermore, it is not the question, but I agree with previous comment that your implementation introduces a lot of boilerplate that you do not use, and that the direction you are taking won’t be easy to enrich.
I am clearly not a good person to help you improve in this direction, since I never handled even middle scale formalisations.

(When a formalization is not faithful to what it is supposed to formalize, that is a problem of analysis/logic, not one of design/engineering.)

Thank you so much! I will be attempting to make the formalization faithful to what it is supposed to formalize. This is what the proof looks like.


Lemma cipher_returns_some :
  forall n add_by,
    0 < n < 26 ->
    exists l, cipher n add_by = Some l.
Proof.
  intros n add_by H.
  destruct H as [Hpos Hlt].
  unfold cipher.
  set (idx := (n + add_by ) mod 26). 
  exists (example_letter_map idx).
  assert (n <? 26 = true) as Htest.
  { apply Nat.ltb_lt. exact Hlt. }
  rewrite Htest. reflexivity.
Qed.

I will be spending my time trying to solve this. You seem well acquainted with formal proofs, are there any books you recommend? I only went through the first book of Software Foundations, and am planning to buy CoqArt. Thank you for you help

I will be spending my time trying to solve this.

As said, I wouldn’t solve it, I’d rewrite it: I think my take is more faithful to the intended notions, but I find it still way too complicated.

You seem well acquainted with formal proofs, are there any books you recommend?

In general, I see three levels/components to any formalization (equiv. programming, by Curry-Howard we can say) effort:

Analysis (logic)

Going from natural language to something formal is also an art, and one that is mostly lost. E.g. if “Socrates is human” is an atomic proposition, we cannot do much with it: but let’s rephrase as “to be Socrates is to be human”, then formalize as S -> H, and on that line we can do syllogistics…

My reference book on Logic on/of natural language is P.F. Strawson, Introduction to Logical Theory.

Design (engineering)

There isn’t much in terms of design in your problem, except possibly for abstracting the modular arithmetic away, but I have stopped short of that very point: also, what (formal) objects you end up working with preliminarily depends on the outcome of the analysis phase.

Anyway, you could (at least, if there is “budget” for it) go more general: formalize the general/abstract notion of “symmetric cipher”, what it is and its properties, then define the “Caesar Cipher” as an instance of it.

Code design/architecture too is an art, or a craft, also one that is mostly lost. My reference on this is R.S. Pressman, Software Engineering: A Practitioner’s Approach, the 4th edition if you can (i.e. before the so called “democratization of programming” [*]).

[*] I see they are now co-opting that term for AI-assisted programming: that I’d call 2.0, the one I am referring to is the original one, circa 1995, with the advent of the public Internet and the Web especially: suddenly everybody was a programmer, for $50 a page, and the perfect target for the big vendors…

Implementation (coding)

It’s been some two years and I am still a beginner, not with “proving” but with programming with dependent types: I still haven’t fully cracked it, the underlying mechanisms, anyway I am not particularly clever at coding specifically.

I have been perusing Software Foundations myself, I do think it is a great intro and resource, but then indeed I have been reading and playing with whatever has come my way, including Coq’Art and Chlipala’s CPDT.

Hey, I think I figured out a simplified version of a ceaser cipher. I omitted the letters for simplicity, and decided the best way to go was using an inductive proposition to make it easier to reason about. Do you think I reasoned about it correctly?

From Coq Require Import Arith.
Require Import Lia.
Set Default Goal Selector "!".

Definition encrypt (n move_by : nat) : nat :=
    (n + move_by) mod 26.

Definition decrypt (c move_by : nat) : nat :=
  (c + (26 - move_by)) mod 26.

Inductive encrypted : nat -> nat -> Prop :=
  | encrypt_0 (n : nat) : 
      n < 26 -> encrypted n 0
  | encrypt_else (n : nat) (r : nat) :
      n < 26 -> r < 26 -> encrypted (encrypt n r) 0 -> encrypted n r.

Inductive decrypted : nat -> nat -> Prop :=
  | decrypt_0 (n : nat) : 
      n < 26 -> decrypted n 0
  | decrypt_else (n : nat) (r : nat) :
      n < 26 -> r < 26 -> decrypted (decrypt n r) 0 
      -> decrypted n r.

Example ex_encrypt : encrypted 20 9.
Proof.
  apply encrypt_else.
  - lia. - lia. - simpl. apply encrypt_0. unfold encrypt. simpl. lia. Qed.

Example ex_decrypt : decrypted 20 18.
Proof. apply decrypt_else. - lia. - lia. - simpl. apply decrypt_0. unfold decrypt.
simpl. lia. Qed.

Theorem decrypt_correctness :
  forall n m : nat, n < 26 -> m < 26 -> decrypted n m.
Proof.
  intros. apply decrypt_else. 
  - apply H.
  - apply H0.
  - apply decrypt_0. apply Nat.mod_upper_bound. lia.
Qed.

Theorem encrypt_correctness :
  forall n m : nat, n < 26 -> m < 26 -> encrypted n m.
Proof.
  intros. apply encrypt_else. 
  - apply H.
  - apply H0.
  - apply encrypt_0. apply Nat.mod_upper_bound. lia.
Qed.

Theorem decrypt_inverts_encrypt :
  forall n r,
    n < 26 -> r < 26 -> decrypted (encrypt n r) r.
Proof.
  intros. apply decrypt_else.
  - apply Nat.mod_upper_bound. lia.
  - apply H0.
  - apply decrypt_0. apply Nat.mod_upper_bound. lia.
Qed.

Theorem encrypt_inverts_decrypt :
  forall n r,
    n < 26 -> r < 26 -> encrypted (decrypt n r) r.
Proof.
  intros. apply encrypt_else.
  - apply Nat.mod_upper_bound. lia.
  - apply H0.
  - apply encrypt_0. apply Nat.mod_upper_bound. lia.
Qed.

Do you think I reasoned about it correctly?

Looks like a better approximation (and you could use newlines less sparingly), but it’s still not it, I’d say especially not reason-wise.

The immediate problem remains that you are mixing the modular arithmetic with conditions on the inputs, and then proving that decrypt**ed** is the inverse of encrypt (and vice versa), but that is just not it, decrypt is just not the same as decrypted, i.e. not as you have defined them.

Indeed, one counter-example is enough to prove that your encrypt and decrypt are not inverses:

Lemma counter_ex :
  decrypt (encrypt 0 27) 27 <> 0.
Proof. auto. Qed.

I altered decrypt by replacing move_by with move_by mod 26, so that it doesn’t fail for the counter example, and I am attempting to solve the lemma.

Definition encrypt (n move_by : nat) : nat :=
  (n + move_by) mod 26.

Definition decrypt (c move_by : nat) : nat :=
  (c + (26 - (move_by mod 26))) mod 26.

Lemma encrypt_decrypt_inverse :
  forall n m : nat, encrypt (decrypt n m) m = n mod 26.
Proof.
  intros. unfold encrypt, decrypt. rewrite Nat.Div0.add_mod_idemp_l.

However, this seems to be incorrect too, as this gives me:
(n + (26 - m mod 26) + m) mod 26 = n mod 26

That means I have to assert(n + (26 - m mod 26) = n), which means I have to
assert (26 - m mod 26 = 0), which is incorrect. So it seems that my definitions are incorrect? This is certainly odd, as the definitions for encrypt and decrypt are basic math that should be easy to dissect.

AFAICT, that’s correct, including the statement of the theorem (but don’t forget that is only one direction).

No: that would entail the conclusion [not even that, actually], but, as you have noticed, it’s false.

Rather, I have been able to complete your proof as you have started it: I can give you some hint or just post it if you like, anyway it’s “just” about finding the right theorems in Nat

I’d prefer the hint. My first thought is to turn:
(n + (26 - m mod 26) + m) mod 26 = n mod 26
into
(n + (26 - m mod 26 + m)) mod 26 = n mod 26.
This would probably mean that I would assert: (26 - m mod 26 + m) = 0,
as (n + 0) mod 26 = n mod 26.

This is where I get stuck, perhaps there is is theorem I haven’t found, or it is the fact I’m entailing the conclusion. I’ll be seeing if I can prove (26 - m mod 26 + m) = 0.

Lemma encrypt_decrypt_inverse :
  forall n m : nat, encrypt (decrypt n m) m = n mod 26.
Proof.
  intros. unfold encrypt, decrypt. rewrite Nat.Div0.add_mod_idemp_l.
  rewrite <- Nat.add_assoc. 
  assert (26 - m mod 26 + m = 0).

Edit: (26 - m mod 26 + m) = 0. is just not true, as m mod 26 + m <> 26. Perhaps I missed something simple, as I don’t think my approach of proving n mod 26 = (n + 0) mod 26 = (n + (26 - m mod 26 + m)) mod 26 is not incorrect.

Asserting is not really the most natural thing to do there, even if the formulas you are trying were true, which they aren’t.

Starting with your initial proof, here is my next line:

  rewrite <- Nat.Div0.add_mod_idemp_r.

Overall, I have simply proved it pretty much as I’d do it on paper, by manipulating the two sides of the equation.

That said, and as you might already know, Search is your friend: and there isn’t much available related to modulo in the library, which helps…

After rewriting it using Nat.Div0.add_mod_idemp_r, my thought is to search for theorems that resemble (n + (26 - m mod 26) + m mod 26), so I Search(_ + _ + _). Of the many theorems this search gives me, I think that Nat.add_assoc makes the most sense to implement. I also think that I can use Nat.Div0.add_mod will be useful. However, I’m not sure if it is obvious that I can simplify it directly to n mod 26, as within the modulo theorems, none of them deal with substraction. (26 - m mod 26) is what is getting the way here. I’d imagine that there would be a theorem I could search using Search(_ - _) that would be able to help.

within the modulo theorems, none of them deal with substraction.

Indeed, next I have rewritten with Nat.add_assoc, which is not a fact about modulo either… i.e. I wasn’t saying the whole story. I didn’t mean that there is anything trivial to the proof of your theorem either: plus, FWIW, it’s taken me some time and effort before getting familiar enough even just with the PeanoNat library of all the arithmetic libraries, so I’d say don’t feel discouraged…

That said, after rewriting with Nat.add_assoc, we get:

(n + (26 - m mod 26 + m mod 26)) mod 26 = n mod 26

and now notice that m mod 26 is one and same term in two places, so you can ignore the modulo and just Search (_ - ?n + ?n).

This is my proof so far. Now I have to prove m mod 26 <= 26, which is obviously true.

Lemma encrypt_decrypt_inverse :
  forall n m : nat, encrypt (decrypt n m) m = n mod 26.
Proof.
  intros. unfold encrypt, decrypt. rewrite Nat.Div0.add_mod_idemp_l. 
  rewrite <- Nat.Div0.add_mod_idemp_r. rewrite <- Nat.add_assoc. Search (_ - ?n + ?n).
  rewrite Nat.sub_add. 
  - rewrite Nat.Div0.add_mod. rewrite Nat.Div0.mod_same.
  rewrite Nat.add_0_r. rewrite Nat.Div0.mod_mod. reflexivity.
- ....

However, the closest thing the modulo library has to what I need to prove this is:
Nat.Div0.mod_le: forall a b : nat, a mod b <= a. I find it strange that they didn’t implement
Nat.Div0.mod_le_l: forall a b : nat, a mod b <= b. is this an oversight by the maintainers of Rocq? I’m on the latest version of Rocq.

This is my proof so far

This is just a bit more direct (and I am calling the second argument k, not m, for a couple of reasons, not least that you shouldn’t give away the internals of your cipher…):

From Coq Require Import PeanoNat.

Import Nat Div0.

...

Proof.
  intros n k.
  unfold encrypt, decrypt.
  rewrite add_mod_idemp_l.
  rewrite <- add_mod_idemp_r.
  rewrite <- add_assoc.
  rewrite sub_add.
  - rewrite <- add_mod_idemp_r.
    rewrite mod_same.
    rewrite add_0_r.
    reflexivity.
  - 

I find it strange that they didn’t implement
Nat.Div0.mod_le_l: forall a b : nat, a mod b <= b .

Initially I have looked for that, too. :slight_smile: The library does have what you are looking for, just a slightly stronger statement than that: think about the mathematics…

I see that b can’t be zero, as that would cause it to be undefined. So something like Nat.Div0.mod_le_l: forall a b : nat, b > 0 -> a mod b <= b. My thought was to Search (_ > 0 -> _ mod _ <= _) and Search (_ <> 0 -> _ mod _ <= _), though these didn’t return anything. Are these searches semantically correct and practically incorrect, in the sense that they accurately represent what I’m looking for but don’t actually return it due to the design decisions of the Rocq maintainers? Or is it the case that they don’t semantically represent what I’m looking for?

I see that b can’t be zero, as that would cause it to be undefined.

It’s not immediately a matter of side-conditions, as those rather generate sub-goals: beware also that e.g. forall n, n mod 0 = n, i.e. this is not fully the common informal arithmetic (for one thing, all functions are total in Rocq).

Rather, what is the range of n mod m? We can say something stronger than n mod m <= m… Then search again for modulo.

(I’d suggest you struggle with it a little bit. If eventually you still cannot get to the bottom of it, I might just post the code: not much is left to write anyway.)

I figured it out! Thanks for being patient with me. Here is my proof.

Lemma encrypt_decrypt_inverse :
  forall n m : nat, encrypt (decrypt n m) m = n mod 26.
Proof.
  intros n k.
  unfold encrypt, decrypt.
  rewrite add_mod_idemp_l. rewrite <- add_mod_idemp_r.
  rewrite <- add_assoc. rewrite sub_add.
  - rewrite <- add_mod_idemp_r. rewrite mod_same.
    rewrite add_0_r. reflexivity.
  - apply lt_le_incl. apply mod_bound_pos. 
    apply le_0_l. lia.
Qed.

Lemma decrypt_encrypt_inverse :
  forall n m : nat, decrypt (encrypt n m) m = n mod 26.
Proof.
  intros n k.
  unfold encrypt, decrypt.
  rewrite add_mod_idemp_l. rewrite add_shuffle0. 
  rewrite <- add_mod_idemp_r. rewrite <- add_assoc.
  rewrite sub_add.
  - rewrite <- add_mod_idemp_r. rewrite mod_same.
    rewrite add_0_r. reflexivity.  
  - apply lt_le_incl. apply mod_bound_pos. apply le_0_l. lia.
Qed.

Even though this doesn’t fully represent a ceaser cipher, it still accurately preserves the semantics. From here, I’d probably go for defining an encrypt_list function that iterates through a list and encrypts it. Thanks.