Homework 9
Using monads in anger
This homework is written in literate Haskell; you can download the raw source to fill in yourself. You’re welcome to submit literate Haskell yourself, or to start fresh in a new file, literate or not.
Please submit homeworks via the DCI submission page.
In this homework, you’ll be working with monads in a variety of ways: to speed up a shuffling algorithm; to create a generic framework for MapReduce; and to generate random values for testing.
In addition, you’ll write a function that converts an AST representing do
notation to an AST representing (>>=)
and return
.
I’ve imported the libraries you’ll need. Look at them before you start! You’ll want to use both the Haskell language documentation as well as Hackage and Hoogle.
You are of course allowed to import other libraries. It may even make your solutions easier!
module Hw09 where
import Control.Monad
import Data.Array.IO
import qualified Data.Map as Map
import Data.Map (Map(..),(!))
import System.Random
import Test.QuickCheck
Problem (1): shuffling
In this problem, we’re going to “shuffle” a list, reordering it randomly.
To start, let’s get familiar with System.Random
.
Write a function that takes two numbers, low
and high
, and returns a random number n
such that low <= n <= high
, i.e., inclusively within the range.
rand :: Int -> Int -> IO Int
rand low high = undefined
Now write a function that takes a list and shuffles it. The straightforward algorithm is O(n2):
- Given a non-empty list
xs
, - randomly split the list into an element
y
and the rest of the listys
, - cons
y
onto the shuffling ofys
.
Don’t worry, we’ll speed it up in a minute.
shuffleList :: [a] -> IO [a]
shuffleList xs = undefined
Don’t forget that you can run :set +s
to get timing information in GHCi. My implementation on my computer runs shuffleList [0..1000]
in 1.41 seconds. It runs shuffleList [0..2000]
in 14.95 seconds… ouch!
It turns out that there’s a much faster, O(n) algorithm for shuffling: the Fisher–Yates shuffle. It works on arrays, not linked lists, so we’ll have to use Haskell’s arrays.
Haskell’s arrays are a little funny: arrays are, in general, parameterized over a given monad. We’ll work with IOArray
s. The IOArray
type represents arrays that can be used in the IO
monad. We’ll interact with these arrays using the MArray
interface.
Let’s take a brief look at IOArray
. It has kind * -> * -> *
. The first type it needs is the type of its indices… we can just use Int
for that, but it’s interesting that we can use anything we want. The second type it needs is the type of its contents. Shuffling won’t care about that, so we’ll end up working with IOArray Int a
.
As a warmup, write a function that takes a list and generates a corresponding array. It’s worth noting that the bounds that Haskell uses in, e.g., newListArray
are inclusive, per Data.Ix
.
listToArray :: [a] -> IO (IOArray Int a)
listToArray x = undefined
Okay: let’s do it. Implement the Fisher–Yates shuffling algorithm that takes a given array and shuffles it.
shuffle :: IOArray Int a -> IO ()
shuffle arr = undefined
Finally, write a function that reads in a file and shuffles its lines.
shuffleFile :: FilePath -> IO String
shuffleFile f = undefined
Problem (2): monadic MapReduce
MapReduce is a model for data parallel computation. We’ll look at data parallel programming after Thanksgiving, but for now let’s try to understand MapReduce as is.
Our mappers will take an input of type a
and produce a list of key-value pairs, where keys have type k
and values have type v
.
type Mapper a k v = a -> [(k,v)]
Our reducers take a key and list of values and produces a new (ideally shorter!) list of values.
type Reducer k v = k -> [v] -> [v]
The actual MapReduce implementaiton has three real phases: mapping, shuffling, and reducing. Here’s a simple implementation:
mapReduce :: Ord k => Mapper a k v -> Reducer k v -> [a] -> [(k,[v])]
mapReduce m r = reduce r . shuffleKeys . concatMap (map listifyVal . m)
where listifyVal (k,v) = (k,[v])
shuffleKeys = Map.fromListWith (++)
reduce r = Map.toList . Map.mapWithKey r
The canonical MapReduce example is word count. (Riveting, isn’t it?) Here’s how to implement word count in MapReduce: given a list of documents, a mapper breaks a given document into its consituent words, where appearance of a word maps to 1. After shuffling, identical words will be grouped together; the reducer sums up each token.
wordCount = mapReduce countWords sumCounts
where countWords = map (\w -> (w,1)) . words
sumCounts _ cs = [sum cs]
I’m not sure why Google is so proud of what amounts to eight lines of code and a slow way to count words.
Let’s modify the MapReduce paradigm to allow for monadic computations, where our mappers and reducers are monadic computations, like so:
type MapperM m a k v = a -> m [(k,v)]
type ReducerM m k v = k -> [v] -> m [v]
Note that a MapperM
returns its list of key-value pairs inside of some monad m
; ReducerM
is similar.
Adapt mapReduce
above to define mapReduceM
:
mapReduceM :: (Ord k, Monad m) => MapperM m a k v -> ReducerM m k v -> [a] -> m [(k,[v])]
mapReduceM m r input = undefined
To test, here’s an adaptation of the wordCount
example above.
wordCountM = mapReduceM countWords sumCounts
where countWords = return . map (\w -> (w,1)) . words
sumCounts w cs = do
when (length cs > 1) $ putStrLn $ "Lots of " ++ w ++ "!"
return [sum cs]
Problem (3): Translating ‘do’ notation
We’ve seen both explicit (>>=)
and return
as well as do
notation. In this exercise, you’ll automatically convert from the latter to the former.
It may help to play around with Haskell’s do notation to see what the exact behavior is; you can also take a look at the Haskell 2010 documentation on do
, though note that our model will ignore the pattern matching component.
Suppose we take statments and do
blocks to hav the following structure:
type Id = String
data Stmt a = Assign Id a | Let Id a | Expr a deriving Show
type DoBlock a = [Stmt a]
Note that this structure simplifies the Haskell 2010 definition a tiny bit. What do you think the type variable a
represents? (This is a rhetorical question, but, seriously: think about it.)
We can define monadic expressions as follows. Note, again, that we differ a tiny bit from the Haskell 2010 definitions.
data MExpr a =
MLam Id (MExpr a) | MApp (MExpr a) (MExpr a)
| MBind a Id (MExpr a) | MReturn (MExpr a) | MEmbed a deriving Show
Write a function that converts a do
block into a monadic expression.
convert :: DoBlock a -> MExpr a
Problem (4): QuickCheck
We’ll be using QuickCheck to write some tests.
Write a QuickCheck property to check that reverse
is involutive, i.e., that reversing a reversed list yields the original list.
prop_rev_involutive l = undefined
Write a QuickCheck property to check that checks the Collatz conjecture for a given number greater than 0.
prop_Collatz = undefined
data ArithExp =
Num Int
| Plus ArithExp ArithExp
| Times ArithExp ArithExp
| Neg ArithExp
deriving Show
eval :: ArithExp -> Int
eval (Num i) = i
eval (Plus e1 e2) = eval e1 + eval e2
eval (Times e1 e2) = eval e1 * eval e2
eval (Neg e) = 0 - eval e
Write a generator that generates arbitrary ArithExp
s. Use it to define an Arbitrary
instance for ArithExp
… keep in mind that we don’t want to generate giant data structures, so you may need to keep track of sizes.
instance Arbitrary ArithExp where
arbitrary = undefined
Write a test to ensure that Plus e e
behaves the same as Times 2 e
for all expressions e
.
prop_double = undefined