Day18_levenshtein
Getting Levenshtein right
Definition delete_edit (src : strand) : list edit
:= map (fun _ ⇒ delete) src.
Definition add_edit (tgt : strand) : list edit
:= map (fun b ⇒ add b) tgt.
The delete_add_edit approach is the simplest way to find edits
from one strand to another: erase the first one entirely, then
write in the second.
Definition delete_add_edit (src : strand) (tgt : strand) : list edit
:= delete_edit src ++ add_edit tgt.
:= delete_edit src ++ add_edit tgt.
This one's new! It produces edits from a strand to itself. We'll
need it in the proof of correctness for levenshtein.
The naive_sub_edit approach is a bit better than
delete_add_edit, using substitute instead of deletes and
adds.
Fixpoint naive_sub_edit (src : strand) (tgt : strand) : list edit :=
match (src, tgt) with
| ([], tgt) ⇒ add_edit tgt
| (src, []) ⇒ delete_edit src
| (_::src', b2::tgt') ⇒
substitute b2 :: naive_sub_edit src' tgt'
end.
match (src, tgt) with
| ([], tgt) ⇒ add_edit tgt
| (src, []) ⇒ delete_edit src
| (_::src', b2::tgt') ⇒
substitute b2 :: naive_sub_edit src' tgt'
end.
And sub_edit is still better, using copy when it can.
Fixpoint sub_edit (src : strand) (tgt : strand) : list edit :=
match (src, tgt) with
| ([], tgt) ⇒ add_edit tgt
| (src, []) ⇒ delete_edit src
| (b1::src', b2::tgt') ⇒
let edit :=
if eq_base b1 b2
then copy
else substitute b2
in
edit :: sub_edit src' tgt'
end.
match (src, tgt) with
| ([], tgt) ⇒ add_edit tgt
| (src, []) ⇒ delete_edit src
| (b1::src', b2::tgt') ⇒
let edit :=
if eq_base b1 b2
then copy
else substitute b2
in
edit :: sub_edit src' tgt'
end.
But levenshtein's "choose the best" approach will be
optimal. Here's our definition.
Fixpoint levenshtein (src tgt : strand) : list edit :=
let fix inner tgt :=
match (src,tgt) with
| ([], _) ⇒ add_edit tgt
| (_, []) ⇒ delete_edit src
| (b1::src', b2::tgt') ⇒
let copy_sub_edits := (if eq_base b1 b2 then copy else substitute b2)
:: levenshtein src' tgt' in
let delete_edits := delete :: levenshtein src' tgt in
let add_edits := add b2 :: inner tgt' in
argmin3 total_cost copy_sub_edits delete_edits add_edits
end
in
inner tgt.
let fix inner tgt :=
match (src,tgt) with
| ([], _) ⇒ add_edit tgt
| (_, []) ⇒ delete_edit src
| (b1::src', b2::tgt') ⇒
let copy_sub_edits := (if eq_base b1 b2 then copy else substitute b2)
:: levenshtein src' tgt' in
let delete_edits := delete :: levenshtein src' tgt in
let add_edits := add b2 :: inner tgt' in
argmin3 total_cost copy_sub_edits delete_edits add_edits
end
in
inner tgt.
Proving validity
Definition valid_edit (src : strand) (tgt : strand) (edits : list edit) :=
eq_strand (apply_edits src edits) tgt = true.
eq_strand (apply_edits src edits) tgt = true.
We could have worked propositionally here, saying:
Definition valid_edit' (src : strand) (tgt : strand) (edits : list edit) :=
apply_edits src edits = tgt.
apply_edits src edits = tgt.
The eq_strand_iff lemma lets us know that valid_edit and
valid_edit' are equivalent. The proofs turn out ot be marginally
easier using valid_edit's approach, so let's stick with that.
Exercise: 1 star, standard (delete_edit_valid)
Lemma delete_edit_valid : ∀ (src : strand),
valid_edit src [] (delete_edit src).
Proof.
(* FILL IN HERE *) Admitted.
☐
valid_edit src [] (delete_edit src).
Proof.
(* FILL IN HERE *) Admitted.
☐
Lemma add_edit_valid : ∀ (tgt : strand),
valid_edit [] tgt (add_edit tgt).
Proof.
(* FILL IN HERE *) Admitted.
☐
valid_edit [] tgt (add_edit tgt).
Proof.
(* FILL IN HERE *) Admitted.
☐
Lemma copy_edit_valid : ∀ src,
valid_edit src src (copy_edit src).
Proof.
(* FILL IN HERE *) Admitted.
☐
valid_edit src src (copy_edit src).
Proof.
(* FILL IN HERE *) Admitted.
☐
Exercise: 3 stars, standard (delete_add_valid)
This one is the hardest so far. You ought to be able to simply combine delete_edit_valid and add_edit_valid, but it doesn't quite work. (Try it and see!) What helper lemma do you need to prove about how delete_edit and ++ interact? Your lemma might need to state a property more general than what you need just for you to be able to prove it!Lemma delete_add_valid : ∀ (src tgt : strand),
valid_edit src tgt (delete_add_edit src tgt).
Proof.
(* FILL IN HERE *) Admitted.
☐
Exercise: 2 stars, standard (naive_sub_edit_valid)
Lemma naive_sub_edit_valid : ∀ src tgt,
valid_edit src tgt (naive_sub_edit src tgt).
Proof.
(* FILL IN HERE *) Admitted.
☐
valid_edit src tgt (naive_sub_edit src tgt).
Proof.
(* FILL IN HERE *) Admitted.
☐
Lemma sub_edit_valid : ∀ src tgt,
valid_edit src tgt (sub_edit src tgt).
Proof.
(* FILL IN HERE *) Admitted.
☐
valid_edit src tgt (sub_edit src tgt).
Proof.
(* FILL IN HERE *) Admitted.
☐
Proving Levenshtein valid
Exercise: 2 stars, standard, optional (argmin3_valid)
Lemma argmin3_valid : ∀ (src tgt : strand) (cost : list edit → nat)
(edits1 edits2 edits3 : list edit),
valid_edit src tgt edits1 →
valid_edit src tgt edits2 →
valid_edit src tgt edits3 →
valid_edit src tgt (argmin3 cost edits1 edits2 edits3).
Proof.
(* FILL IN HERE *) Admitted.
☐
(edits1 edits2 edits3 : list edit),
valid_edit src tgt edits1 →
valid_edit src tgt edits2 →
valid_edit src tgt edits3 →
valid_edit src tgt (argmin3 cost edits1 edits2 edits3).
Proof.
(* FILL IN HERE *) Admitted.
☐
Exercise: 3 stars, standard (levenshtein_valid)
A word of warning: you're going to get some pretty complicated terms in your goal! Using unfold/fold in combination will help a bit, but nothing will help quite so much as patient attention. A lot of what's in your goal is noise resulting from the let fix inner, and you can often pay attention only to the front of your goal.
Lemma levenshtein_valid : ∀ (src tgt : strand),
valid_edit src tgt (levenshtein src tgt).
Proof.
(* FILL IN HERE *) Admitted.
☐
valid_edit src tgt (levenshtein src tgt).
Proof.
(* FILL IN HERE *) Admitted.
☐
Going the distance
Exercise: 3 stars, standard, optional (costs)
Prove lemmas characterizing the costs of each of our basic edit-generating functions.Lemma add_cost : ∀ tgt,
total_cost (add_edit tgt) = length tgt.
Proof.
(* FILL IN HERE *) Admitted.
Lemma delete_cost : ∀ src,
total_cost (delete_edit src) = length src.
Proof.
(* FILL IN HERE *) Admitted.
Lemma copy_cost_0 : ∀ (src : strand),
total_cost (copy_edit src) = 0.
Proof.
(* FILL IN HERE *) Admitted.
☐
Exercise: 3 stars, standard (levenshtein_distance_correct)
Let's define a levenshtein_distance that only computes the edit distance without computing edits themselves--this cost is the real thing we care about, not edits... but it'd be nice to know that the cost that levenshtein_distance produces the right cost, i.e., that it agrees with levenshtein. Let's prove it!
Fixpoint levenshtein_distance (src tgt : strand) : nat :=
let fix inner tgt :=
match (src,tgt) with
| ([], _) ⇒ length tgt
| (_, []) ⇒ length src
| (b1::src', b2::tgt') ⇒
let copy_sub_edits := (if eq_base b1 b2 then 0 else 1) + levenshtein_distance src' tgt' in
let delete_edits := 1 + levenshtein_distance src' tgt in
let add_edits := 1 + inner tgt' in
min3 copy_sub_edits delete_edits add_edits
end
in
inner tgt.
Lemma levenshtein_distance_correct : ∀ src tgt,
total_cost (levenshtein src tgt) = levenshtein_distance src tgt.
Proof.
(* FILL IN HERE *) Admitted.
☐
let fix inner tgt :=
match (src,tgt) with
| ([], _) ⇒ length tgt
| (_, []) ⇒ length src
| (b1::src', b2::tgt') ⇒
let copy_sub_edits := (if eq_base b1 b2 then 0 else 1) + levenshtein_distance src' tgt' in
let delete_edits := 1 + levenshtein_distance src' tgt in
let add_edits := 1 + inner tgt' in
min3 copy_sub_edits delete_edits add_edits
end
in
inner tgt.
Lemma levenshtein_distance_correct : ∀ src tgt,
total_cost (levenshtein src tgt) = levenshtein_distance src tgt.
Proof.
(* FILL IN HERE *) Admitted.
☐
Optimality for Levenshtein edit distance
Exercise: 2 stars, standard (naive_sub_worse)
As a warm up, let's show that sub_edit is better than naive_sub_edit. The intuition here is that sub_edit and naive_sub_edit are more or less identical, except sub_edit sometimes produces a copy edit while naive_sub_edit always produces substitute. Since cost copy = 0 and cost (substitute b) = 1, using sub_edit should never be worse than naive_sub_edit.
Lemma naive_sub_worse : ∀ src tgt,
total_cost (sub_edit src tgt) ≤ total_cost (naive_sub_edit src tgt).
Proof.
(* FILL IN HERE *) Admitted.
☐
total_cost (sub_edit src tgt) ≤ total_cost (naive_sub_edit src tgt).
Proof.
(* FILL IN HERE *) Admitted.
☐
Exercise: 2 stars, standard (add_empty_optimal)
For any edits from the empty strand to a given tgt strand, add_edit is optimal. Why? Well, you need to build the whole strand!
Lemma add_empty_optimal : ∀ (tgt : strand) (edits : list edit),
valid_edit [] tgt edits →
total_cost (add_edit tgt) ≤ total_cost edits.
Proof.
(* FILL IN HERE *) Admitted.
☐
valid_edit [] tgt edits →
total_cost (add_edit tgt) ≤ total_cost edits.
Proof.
(* FILL IN HERE *) Admitted.
☐
Exercise: 2 stars, standard, optional (delete_empty_optimal)
Similarly, delete_edit is optimal when editing a strand to empty.
Lemma delete_empty_optimal : ∀ (src : strand) (edits : list edit),
valid_edit src [] edits →
total_cost (delete_edit src) ≤ total_cost edits.
Proof.
(* FILL IN HERE *) Admitted.
☐
valid_edit src [] edits →
total_cost (delete_edit src) ≤ total_cost edits.
Proof.
(* FILL IN HERE *) Admitted.
☐
Exercise: 3 stars, standard (levenshtein_optimal)
Prove that levenshtein is optimal. The following lemma will be helpful.
Lemma levenshtein_refl_copy : ∀ (src : strand),
levenshtein src src = copy_edit src.
Proof.
induction src as [|b src' IH].
- reflexivity.
- simpl. rewrite eq_base_refl. rewrite IH.
assert (total_cost (copy :: copy_edit src') = 0). { apply copy_cost_0. }
unfold argmin3. rewrite H. simpl.
reflexivity.
Qed.
levenshtein src src = copy_edit src.
Proof.
induction src as [|b src' IH].
- reflexivity.
- simpl. rewrite eq_base_refl. rewrite IH.
assert (total_cost (copy :: copy_edit src') = 0). { apply copy_cost_0. }
unfold argmin3. rewrite H. simpl.
reflexivity.
Qed.
We've given you the structure of this (fairly challenging!)
proof. Try not to be intimidated by the large formulae you might
be given. Go slow and read everything. If you get confused, use a
whiteboard.
Lemma levenshtein_optimal : ∀ (src tgt : strand) (edits : list edit),
valid_edit src tgt edits →
total_cost (levenshtein src tgt) ≤ total_cost edits.
Proof.
induction src as [|b1 src' IH1].
- intros [|b2 tgt'] edits; unfold valid_edit; intros Hvalid.
+ (* FILL IN HERE *) admit.
+ (* FILL IN HERE *) admit.
- induction tgt as [|b2 tgt' IH2]; unfold valid_edit; intros [|edit edits'] H.
+ (* FILL IN HERE *) admit.
+ (* FILL IN HERE *) admit.
+ (* FILL IN HERE *) admit.
+ simpl.
destruct edit as [| | b3 | b3]; simpl in H.
× replace (cost copy + total_cost edits') with (total_cost (copy :: edits')) by reflexivity.
(* replace e1 with e2 by tactic is a handy way of saying:
assert (e1 = e2) as H. { tactic } rewrite H.
Feel free to use the replace tactic if you like it---it's
a handy way of controlling rewriting/simplification. *)
apply argmin3_leb. left.
(* FILL IN HERE *) admit.
× replace (cost delete + total_cost edits') with (total_cost (delete :: edits')) by reflexivity.
apply argmin3_leb. right. left.
(* FILL IN HERE *) admit.
× replace (cost (add b3) + total_cost edits') with (total_cost (add b3 :: edits')) by reflexivity.
apply argmin3_leb. right. right.
(* FILL IN HERE *) admit.
× replace (cost (substitute b3) + total_cost edits') with (total_cost (substitute b3 :: edits')) by reflexivity.
apply argmin3_leb. left.
(* FILL IN HERE *) admit.
(* FILL IN HERE *) Admitted.
☐
(* Fri Oct 16 23:37:11 PDT 2020 *)