Lecture 9 — 2020-02-18

Parsing with Applicative

This lecture is written in literate Haskell; you can download the raw source.

We wrote up a straightforward instance for Maybe and a more interesting instance for Either e:

instance Applicative (Either e) where
  pure x = Right x -- because Left x would be ill typed!
  (Right f) <*> (Right v) = Right $ f v
  err@(Left e) <*> _ = err
  _ <*> err@(Left e) = err

Then we went over the Applicative definitions for lists. There were two possibilities: cartesian product…

instance Applicative [] where
  pure x = [x]

  []     <*>  _ = []
  _      <*> [] = []
  (f:fs) <*> xs = map f xs ++ fs <*> xs

…and zipping:

newtype ZipList a = ZipList { getZipList :: [a] }
  deriving (Eq, Show, Functor)

instance Applicative ZipList where
  pure = ZipList . repeat
  ZipList fs <*> ZipList xs = ZipList (zipWith ($) fs xs)

Readers

We define an instance for readers, too… a sort of prelude to defining our parsers.

instance Applicative ((->) r) where
  pure v r = v
  frab <*> fra = \r -> frab r (fra r)

We were able to use this instance to construct functions quickly and easily, like:

eogth = (&&) <$> even <*> (>100) 
aos = (||) <$> isAlpha <*> isSpace

The first function returns true on numbers that are even and greater than one hundred; the latter returns true for characters that are alphabetical or whitespace.

Obey the laws

Like Functor, the Applicative type class is governed by laws.

Identity: pure id <*> v = v Composition: pure (.) <*> u <*> v <*> w = u <*> (v <*> w) Homomorphism: pure f <*> pure x = pure (f x) Interchange: u <*> pure y = pure ($ y) <*> u

Note that identity is a generalization of id <$> v = v from Functor, sincef <$> x = pure f <*> x`.

We defined “classic style” parsers in terms of a lexer (String -> [Token]) and a parser ([Token] -> AST), but we spent most of class looking at an alternative model: Applicative parsing.

import Data.Char
import Control.Applicative

newtype Parser a = Parser { parse :: String -> Maybe (a,String) }

letter :: Char -> Parser Char
letter c = Parser $ \s ->
  case s of
    c':s' | c == c' -> Just (c,s')
    _ -> Nothing

letters :: String -> Parser String
letters str = Parser $ \s ->
  if take (length str) s == str
  then Just (str,drop (length str) s)
  else Nothing

EG parse (letter 'c') "chocolate"

instance Functor Parser where
  fmap f p = Parser $ \s ->
    case parse p s of
      Nothing -> Nothing
      Just (v,s') -> Just (f v,s')

Notice how, if you squint, you can see that this Functor instance of Parser is a combination of the instances for Maybe and for readers:

instance Functor Maybe where
  fmap f Nothing = Nothing
  fmap f (Just v) = Just (f v)

instance Functor ((->) r) where
  -- (a -> b) -> (r -> a) -> (r -> b)
  fmap f g x = f (g x)
instance Applicative Parser where
  pure a = Parser $ \s -> Just (a,s)
  f <*> a = Parser $ \s ->  -- f :: Parser (a -> b), a :: Parser a
    case parse f s of
      Nothing -> Nothing
      Just (g,s') -> parse (fmap g a) s' -- g :: a -> b, fmap g a :: Parser b

(Our Applicative instance is also a combination of the instances for Maybe and reader. You can take my word for it… or verify for yourself.)

EG parse ((\x -> [x,x,x]) <$> letter 'c') "chocolate"

letterC = letter 'c'
strCH = (\c h -> [c,h]) <$> letter 'c' <*> letter 'h'

EG parse strCH "chocolate"

string :: String -> Parser String
string [] = pure ""
string (c:s) = (:) <$> (letter c) <*> string s

EG parse (string "choco") "chocolate"

EG parse (string "vanilla") "chocolate"

eof :: Parser ()
eof = Parser $ \s -> if null s then Just ((),"") else Nothing

strCH' = (\c h _ -> [c,h]) <$> letter 'c' <*> letter 'h' <*> eof

EG ‘parse strCH’ “ch”isJust (“ch”,[])`

EG parse strCH' "chocolate" yields Nothing.

Notice how we ignored a value for eof. We can use (<*) and (*>) to save ourselves some trouble, writing, e.g.,

strCH'' = (\c h -> [c,h]) <$> letter 'c' <*> letter 'h' <* eof

ensure :: (a -> Bool) -> Parser a -> Parser a
ensure pred p = Parser $ \s ->
  case parse p s of
    Nothing -> Nothing
    Just (a,s') -> if pred a then Just (a,s') else Nothing

satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
  where f [] = Nothing
        f (x:xs) = if p x then Just (x,xs) else Nothing

Observe that letter c is equivalent to satisfy (==c).

lookahead :: Parser (Maybe Char)
lookahead = Parser f
  where f [] = Just (Nothing,[])
        f (c:s) = Just (Just c,c:s)

We could manually define integer parsing:

integer :: Parser Int
integer = Parser $ \s ->
  let (digits,rest) = span isDigit s in
  if null digits then Nothing else Just (read digits,rest)

But it’s nicer if we define a notion of choice:

class Applicative f => Alternative f where
  empty :: f a
  (<|>) :: f a -> f a -> f a

  many, some :: Alternative f => f a -> f [a]
  some p = (:) <$> p <*> many p
  many p = some p <|> pure []

instance Alternative Maybe where
  -- empty :: f a
  empty = Nothing

  -- (<|>) :: f a -> f a -> f a
  Just x <|> _ = Just x
  Nothing <|> r = r

  -- empty <|> f == f
  -- f <|> empty == f
instance Alternative Parser where
  empty = Parser $ \s -> Nothing
  p1 <|> p2 = Parser $ \s ->
    case parse p1 s of
      Just (a,s') -> Just (a,s')
      Nothing -> parse p2 s

integer' :: Parser String
integer' = read <$> someDigits
  where someDigits = (:) <$> satisfy isDigit <*> moreDigits
        moreDigits = someDigits <|> pure []

int :: Parser Int
int = read <$> some (satisfy isDigit)

EG parse int "8675309"

EG parse int "5551212zoop"

EG parse int "KL51212"

threeInts = (\n1 n2 n3 -> [n1,n2,n3]) <$> (int <* char ',') <*> (int <* char ',') <*> (ensure (>0) int)

EG parse threeInts "1,2,3"

EG parse threeInts "1,2,0"

Let’s build a parser for arithmetic expressions. We’ll keep it as an invariant that we parse spaces up before each actual phrase, so “2 + 2” and “2+2” and " 2 +2" all yield Plus (Num 2) (Num 2).

spaces :: Parser ()
spaces = many (satisfy isSpace) *> pure ()

char :: Char -> Parser Char
char c = spaces *> satisfy (==c)

plus, times :: Parser Char
plus = char '+'
times = char '*'

num :: Parser Int
num = spaces *> int

data Arith =
   Num Int
 | Plus Arith Arith
 | Times Arith Arith 
 deriving Show

term, factor, atom :: Parser Arith
term   =     Plus <$> factor <* plus <*> term
         <|> factor
factor =     Times <$> atom <* times <*> factor 
         <|> atom
atom   =     Num <$> num 
         <|> (char '(' *> term <* char ')')

Compare this with the CFG:

Term   ::= Factor + Term | Factor
Factor ::= Atom * Factor | Atom
Atom   ::= n | ( Term )

Note that this parse has a (benign, for now) bug: its arithmetic is right associative.