Lecture 9 — 2018-02-13
Parsing with Applicative
This lecture is written in literate Haskell; you can download the raw source.
We wrote up a straightforward instance for Maybe and a more interesting instance for Either e:
instance Applicative (Either e) where
  pure x = Right x -- because Left x would be ill typed!
  Right f <*> Right v = Right $ f v
  -- rebuild the Left: reusing the matched value via an as-pattern would not type check
  Left e <*> _ = Left e
  _ <*> Left e = Left e
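To see the "first error wins" behaviour in action, here is a small sketch that relies on the Either instance shipped with base, which behaves like the one above. The validators checkName and checkAge and the function person are hypothetical names invented for this illustration.

```haskell
-- Hypothetical validators, invented for this example.
checkName :: String -> Either String String
checkName s
  | null s    = Left "empty name"
  | otherwise = Right s

checkAge :: Int -> Either String Int
checkAge n
  | n >= 0    = Right n
  | otherwise = Left "negative age"

-- Combine two checked values into a pair; the leftmost Left is returned.
person :: String -> Int -> Either String (String, Int)
person name age = (,) <$> checkName name <*> checkAge age

-- person "Ada" 36  evaluates to  Right ("Ada",36)
-- person "" 36     evaluates to  Left "empty name"
-- person "" (-1)   evaluates to  Left "empty name"  (only the first error is reported)
```

Loading this in GHCi and evaluating the three calls shows that a second failure is never reported once an earlier one has occurred.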
Then we went over the Applicative definitions for lists. There were two possibilities: cartesian product…
instance Applicative [] where
  pure x = [x]
  [] <*> _ = []
  _ <*> [] = []
  -- parentheses are required: (++) binds tighter than (<*>)
  (f:fs) <*> xs = map f xs ++ (fs <*> xs)
…and zipping:
newtype ZipList a = ZipList { getZipList :: [a] }
  deriving (Eq, Show, Functor)
instance Applicative ZipList where
  pure = ZipList . repeat
  ZipList fs <*> ZipList xs = ZipList (zipWith ($) fs xs)
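The difference between the two list instances is easiest to see on a concrete input. The sketch below uses the list instance and the ZipList type from base (Control.Applicative), which behave like the definitions above; the names fs, xs, cartesian, zipped, and identityZip are made up for the example.

```haskell
import Control.Applicative (ZipList (..))

fs :: [Int -> Int]
fs = [(+ 1), (* 2)]

xs :: [Int]
xs = [10, 20, 30]

-- Cartesian product: every function is applied to every argument.
cartesian :: [Int]
cartesian = fs <*> xs                            -- [11,21,31,20,40,60]

-- Zipping: functions and arguments are paired position by position,
-- so the result is only as long as the shorter list.
zipped :: [Int]
zipped = getZipList (ZipList fs <*> ZipList xs)  -- [11,40]

-- pure must produce an infinite list for the identity law to hold
-- on inputs of any length.
identityZip :: [Int]
identityZip = getZipList (pure id <*> ZipList xs)  -- [10,20,30]
```

The last definition suggests why pure is ZipList . repeat: a one-element list would truncate every zip to a single element.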
Readers
We defined an instance for readers, too, as a sort of prelude to defining our parsers.
instance Applicative ((->) r) where
  pure v r = v
  frab <*> fra = \r -> frab r (fra r)
We were able to use this instance to construct functions quickly and easily, like:
eogth = (&&) <$> even <*> (>100)
aos = (||) <$> isAlpha <*> isSpace
The first function returns true on numbers that are even and greater than one hundred; the latter returns true for characters that are alphabetical or whitespace.
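As a quick check on how the reader instance threads one argument to both predicates, the sketch below restates the two functions with type signatures (using the instance for functions from base) and adds a hand-unfolded version, eogth', which is a name invented here.

```haskell
import Data.Char (isAlpha, isSpace)

eogth :: Int -> Bool
eogth = (&&) <$> even <*> (> 100)

aos :: Char -> Bool
aos = (||) <$> isAlpha <*> isSpace

-- The same function with the reader plumbing written out by hand:
-- both predicates receive the same argument n, and (&&) combines the results.
eogth' :: Int -> Bool
eogth' n = even n && n > 100

-- eogth 102  evaluates to  True
-- eogth 101  evaluates to  False  (odd)
-- eogth 50   evaluates to  False  (not greater than 100)
-- aos ' '    evaluates to  True
-- aos '3'    evaluates to  False
```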
Obey the laws
Like Functor, the Applicative type class is governed by laws.
Identity: pure id <*> v = v
Composition: pure (.) <*> u <*> v <*> w = u <*> (v <*> w)
Homomorphism: pure f <*> pure x = pure (f x)
Interchange: u <*> pure y = pure ($ y) <*> u
Note that identity is a generalization of id <$> v = v from Functor, since f <$> x = pure f <*> x.
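The laws can be spot-checked on concrete values. The sketch below does so for Maybe with a few arbitrarily chosen values (u, v, w are just sample inputs); each definition should evaluate to True. This is a sanity check on one instance, not a proof.

```haskell
u :: Maybe (Int -> Int)
u = Just (+ 1)

v :: Maybe (Int -> Int)
v = Just (* 2)

w :: Maybe Int
w = Just 5

identityHolds :: Bool
identityHolds = (pure id <*> w) == w

compositionHolds :: Bool
compositionHolds = (pure (.) <*> u <*> v <*> w) == (u <*> (v <*> w))

homomorphismHolds :: Bool
homomorphismHolds = (pure (+ 1) <*> pure 5) == (pure ((+ 1) 5) :: Maybe Int)

interchangeHolds :: Bool
interchangeHolds = (u <*> pure 5) == (pure ($ 5) <*> u)

-- The relationship between fmap and the Applicative operations.
fmapViaApplicative :: Bool
fmapViaApplicative = fmap (+ 1) w == (pure (+ 1) <*> w)
```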
We defined “classic style” parsers in terms of a lexer (String -> [Token]) and a parser ([Token] -> AST), but we spent most of class looking at an alternative model: Applicative parsing.
import Data.Char
import Control.Applicative
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
letter :: Char -> Parser Char
letter c = Parser $ \s ->
  case s of
    c':s' | c == c' -> Just (c,s')
    _ -> Nothing
letters :: String -> Parser String
letters str = Parser $ \s ->
  if take (length str) s == str
  then Just (str,drop (length str) s)
  else Nothing
EG parse (letter 'c') "chocolate"
instance Functor Parser where
  fmap f p = Parser $ \s ->
    case parse p s of
      Nothing -> Nothing
      Just (v,s') -> Just (f v,s')
Notice how, if you squint, you can see that this Functor instance for Parser is a combination of the instances for Maybe and for readers:
instance Functor Maybe where
  fmap f Nothing = Nothing
  fmap f (Just v) = Just (f v)
instance Functor ((->) r) where
  -- (a -> b) -> (r -> a) -> (r -> b)
  fmap f g x = f (g x)
instance Applicative Parser where
  pure a = Parser $ \s -> Just (a,s)
  f <*> a = Parser $ \s -> -- f :: Parser (a -> b), a :: Parser a
    case parse f s of
      Nothing -> Nothing
      Just (g,s') -> parse (fmap g a) s' -- g :: a -> b, fmap g a :: Parser b
(Our Applicative instance is also a combination of the instances for Maybe and reader. You can take my word for it… or verify for yourself.)
EG parse ((\x -> [x,x,x]) <$> letter 'c') "chocolate"
letterC = letter 'c'
strCH = (\c h -> [c,h]) <$> letter 'c' <*> letter 'h'
EG parse strCH "chocolate"
string :: String -> Parser String
string [] = pure ""
string (c:s) = (:) <$> (letter c) <*> string s
EG parse (string "choco") "chocolate"
EG parse (string "vanilla") "chocolate"
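For readers who want to try the examples so far in one place, here is a self-contained sketch that restates the Parser type, its Functor and Applicative instances, letter, and string, and names a few of the example results. The ex* names are invented for this illustration; the comments give the values they evaluate to.

```haskell
newtype Parser a = Parser { parse :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f p = Parser $ \s -> case parse p s of
    Nothing      -> Nothing
    Just (v, s') -> Just (f v, s')

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  f <*> a = Parser $ \s -> case parse f s of
    Nothing      -> Nothing
    Just (g, s') -> parse (fmap g a) s'  -- run the second parser on the leftover input

letter :: Char -> Parser Char
letter c = Parser $ \s -> case s of
  c' : s' | c == c' -> Just (c, s')
  _                 -> Nothing

string :: String -> Parser String
string []      = pure ""
string (c : s) = (:) <$> letter c <*> string s

exLetter :: Maybe (Char, String)
exLetter = parse (letter 'c') "chocolate"                      -- Just ('c',"hocolate")

exTriple :: Maybe (String, String)
exTriple = parse ((\x -> [x, x, x]) <$> letter 'c') "chocolate"  -- Just ("ccc","hocolate")

exChoco :: Maybe (String, String)
exChoco = parse (string "choco") "chocolate"                   -- Just ("choco","late")

exVanilla :: Maybe (String, String)
exVanilla = parse (string "vanilla") "chocolate"               -- Nothing
```

Each successful result pairs the parsed value with the input that has not been consumed yet, which is what lets (<*>) chain parsers.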
eof :: Parser ()
eof = Parser $ \s -> if null s then Just ((),"") else Nothing
strCH' = (\c h _ -> [c,h]) <$> letter 'c' <*> letter 'h' <*> eof
EG parse strCH' "ch" is Just ("ch","").
EG parse strCH' "chocolate" yields Nothing.
Notice how we ignored a value for eof. We can use (<*) and (*>) to save ourselves some trouble, writing, e.g.,
strCH'' = (\c h -> [c,h]) <$> letter 'c' <*> letter 'h' <* eof
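One way to read (<*) and (*>) is as shorthand for running both computations and keeping only one result. The sketch below spells that out with two helper functions, keepLeft and keepRight (hypothetical names; the real operators come from the Applicative class), and shows the operators on Maybe so the block stays independent of the Parser type.

```haskell
-- Run both, keep the left result. Behaves like (<*).
keepLeft :: Applicative f => f a -> f b -> f a
keepLeft a b = const <$> a <*> b

-- Run both, keep the right result. Behaves like (*>).
keepRight :: Applicative f => f a -> f b -> f b
keepRight a b = flip const <$> a <*> b

exLeft :: Maybe Int
exLeft = Just 1 <* Just "ignored"              -- Just 1

exRight :: Maybe String
exRight = Just (1 :: Int) *> Just "kept"       -- Just "kept"

-- The discarded side still runs: if it fails, the whole expression fails.
exFail :: Maybe Int
exFail = Just 1 <* (Nothing :: Maybe String)   -- Nothing
```

The last example is the reason strCH'' rejects "chocolate": eof's value is thrown away, but its failure is not.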
ensure :: (a -> Bool) -> Parser a -> Parser a
ensure pred p = Parser $ \s ->
  case parse p s of
    Nothing -> Nothing
    Just (a,s') -> if pred a then Just (a,s') else Nothing
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
  where f [] = Nothing
        f (x:xs) = if p x then Just (x,xs) else Nothing
Observe that letter c is equivalent to satisfy (==c).
lookahead :: Parser (Maybe Char)
lookahead = Parser f
  where f [] = Just (Nothing,[])
        f (c:s) = Just (Just c,c:s)
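A short sketch may help contrast satisfy, which consumes a character, with lookahead, which only peeks. It restates the Parser type so that it stands alone, and defines letter via satisfy as observed above; the comments give the values the example calls evaluate to.

```haskell
newtype Parser a = Parser { parse :: String -> Maybe (a, String) }

satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
  where
    f []       = Nothing
    f (x : xs) = if p x then Just (x, xs) else Nothing

-- letter, restated in terms of satisfy.
letter :: Char -> Parser Char
letter c = satisfy (== c)

-- Reports the next character, if any, without consuming it.
lookahead :: Parser (Maybe Char)
lookahead = Parser f
  where
    f []      = Just (Nothing, [])
    f (c : s) = Just (Just c, c : s)

-- parse (letter 'c') "chocolate"  evaluates to  Just ('c',"hocolate")
-- parse lookahead "chocolate"     evaluates to  Just (Just 'c',"chocolate")
-- parse lookahead ""              evaluates to  Just (Nothing,"")
```

Note that lookahead never fails: on empty input it succeeds with Nothing as its value.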
We could manually define integer parsing:
integer :: Parser Int
integer = Parser $ \s ->
  let (digits,rest) = span isDigit s in
  if null digits then Nothing else Just (read digits,rest)
But it’s nicer if we define a notion of choice:
class Applicative f => Alternative f where
  empty :: f a
  (<|>) :: f a -> f a -> f a
many, some :: Alternative f => f a -> f [a]
some p = (:) <$> p <*> many p
many p = some p <|> pure []
instance Alternative Maybe where
  -- empty :: f a
  empty = Nothing
  -- (<|>) :: f a -> f a -> f a
  Just x <|> _ = Just x
  Nothing <|> r = r
-- empty <|> f == f
-- f <|> empty == f
instance Alternative Parser where
  empty = Parser $ \s -> Nothing
  p1 <|> p2 = Parser $ \s ->
    case parse p1 s of
      Just (a,s') -> Just (a,s')
      Nothing -> parse p2 s
integer' :: Parser Int -- read converts the digit string to an Int
integer' = read <$> someDigits
  where someDigits = (:) <$> satisfy isDigit <*> moreDigits
        moreDigits = someDigits <|> pure []
int :: Parser Int
int = read <$> some (satisfy isDigit)
EG parse int "8675309"
EG parse int "5551212zoop"
EG parse int "KL51212"
threeInts = (\n1 n2 n3 -> [n1,n2,n3]) <$> (int <* char ',') <*> (int <* char ',') <*> (ensure (>0) int)
EG parse threeInts "1,2,3"
EG parse threeInts "1,2,0"
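The following self-contained sketch gathers the pieces needed to run the int and threeInts examples. It takes some and many from Control.Applicative, as the notes do, and restates the Parser instances in a slightly more compact style; char here skips leading whitespace, matching the definition given later in these notes.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isDigit, isSpace)

newtype Parser a = Parser { parse :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f p = Parser $ \s -> case parse p s of
    Nothing      -> Nothing
    Just (v, s') -> Just (f v, s')

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  f <*> a = Parser $ \s -> case parse f s of
    Nothing      -> Nothing
    Just (g, s') -> parse (fmap g a) s'

instance Alternative Parser where
  empty = Parser $ const Nothing
  p1 <|> p2 = Parser $ \s -> case parse p1 s of
    Just r  -> Just r
    Nothing -> parse p2 s   -- retry the second parser on the original input

satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
  where
    f (x : xs) | p x = Just (x, xs)
    f _              = Nothing

ensure :: (a -> Bool) -> Parser a -> Parser a
ensure ok p = Parser $ \s -> case parse p s of
  Just (a, s') | ok a -> Just (a, s')
  _                   -> Nothing

char :: Char -> Parser Char
char c = many (satisfy isSpace) *> satisfy (== c)

int :: Parser Int
int = read <$> some (satisfy isDigit)

threeInts :: Parser [Int]
threeInts = (\n1 n2 n3 -> [n1, n2, n3])
  <$> (int <* char ',') <*> (int <* char ',') <*> ensure (> 0) int

-- parse int "8675309"        evaluates to  Just (8675309,"")
-- parse int "5551212zoop"    evaluates to  Just (5551212,"zoop")
-- parse int "KL51212"        evaluates to  Nothing
-- parse threeInts "1,2,3"    evaluates to  Just ([1,2,3],"")
-- parse threeInts "1,2,0"    evaluates to  Nothing  (the last number fails ensure (> 0))
```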
Let’s build a parser for arithmetic expressions. We’ll maintain the invariant that each parser consumes any leading whitespace before the phrase it recognizes, so "2 + 2", "2+2", and " 2 +2" all yield Plus (Num 2) (Num 2).
spaces :: Parser ()
spaces = many (satisfy isSpace) *> pure ()
char :: Char -> Parser Char
char c = spaces *> satisfy (==c)
plus, times :: Parser Char
plus = char '+'
times = char '*'
num :: Parser Int
num = spaces *> int
data Arith =
    Num Int
  | Plus Arith Arith
  | Times Arith Arith
  deriving Show
term, factor, atom :: Parser Arith
term = Plus <$> factor <* plus <*> term
   <|> factor
factor = Times <$> atom <* times <*> factor
     <|> atom
atom = Num <$> num
   <|> (char '(' *> term <* char ')')
Compare this with the CFG:
Term ::= Factor + Term | Factor
Factor ::= Atom * Factor | Atom
Atom ::= n | ( Term )
Note that this parser has a (benign, for now) bug: its arithmetic is right associative.
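To make the associativity concrete, the sketch below restates the arithmetic parser in a self-contained form and shows how "1+2+3" is grouped. It also includes one possible left-associative variant, termL, factorL, and atomL; these are hypothetical names, and the foldl-over-many approach is an assumption about how one might fix the grouping, not something taken from the lecture. Eq is derived on Arith only so results can be compared.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isDigit, isSpace)

newtype Parser a = Parser { parse :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f p = Parser $ \s -> case parse p s of
    Nothing      -> Nothing
    Just (v, s') -> Just (f v, s')

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  f <*> a = Parser $ \s -> case parse f s of
    Nothing      -> Nothing
    Just (g, s') -> parse (fmap g a) s'

instance Alternative Parser where
  empty = Parser $ const Nothing
  p1 <|> p2 = Parser $ \s -> case parse p1 s of
    Just r  -> Just r
    Nothing -> parse p2 s

satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
  where
    f (x : xs) | p x = Just (x, xs)
    f _              = Nothing

char :: Char -> Parser Char
char c = many (satisfy isSpace) *> satisfy (== c)

num :: Parser Int
num = many (satisfy isSpace) *> (read <$> some (satisfy isDigit))

-- Eq is added for this example; the notes derive Show alone.
data Arith = Num Int | Plus Arith Arith | Times Arith Arith
  deriving (Show, Eq)

-- The grammar from the notes: recursion on the right gives right-nested trees.
term, factor, atom :: Parser Arith
term   = Plus <$> factor <* char '+' <*> term <|> factor
factor = Times <$> atom <* char '*' <*> factor <|> atom
atom   = Num <$> num <|> (char '(' *> term <* char ')')

-- Hypothetical variant: parse one operand, then zero or more
-- "operator operand" pairs, and fold them up from the left.
termL, factorL, atomL :: Parser Arith
termL   = foldl Plus <$> factorL <*> many (char '+' *> factorL)
factorL = foldl Times <$> atomL <*> many (char '*' *> atomL)
atomL   = Num <$> num <|> (char '(' *> termL <* char ')')

-- parse term  "1+2+3"  evaluates to  Just (Plus (Num 1) (Plus (Num 2) (Num 3)),"")
-- parse termL "1+2+3"  evaluates to  Just (Plus (Plus (Num 1) (Num 2)) (Num 3),"")
-- parse term  "1+2*3"  evaluates to  Just (Plus (Num 1) (Times (Num 2) (Num 3)),"")
```

The right nesting is harmless for addition and multiplication because both are associative; it would start to matter once an operator such as subtraction is added.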