Week 13: Higher Order Functions, Sorting, and Runtime Analysis
Readings
Key Questions
- In my own words, what is a higher-order function?
- What is the meaning of the lambda keyword in Python?
- What are the tradeoffs between for...in and a higher-order function like map or filter?
- How can I sort a Python list by some external criterion?
- How are higher order functions similar to and different from list comprehensions?
- What are some key differences between selection sort and insertion sort?
- What are some tradeoffs to be considered between sorting algorithms?
- What are some things that "performance" can mean when considering computer programs?
- How can I compare the performance of two implementations of the same function?
- How can I compare the performance of two algorithms?
- When we compare the performance of algorithms, do we tend to focus on the best case, the worst case, or the average case? Why?
Topics
Higher-Order Functions
Have you ever typed a function's name into the shell but forgotten the parentheses?
def my_function(x):
    return x+1

>>> my_function(2)
3
>>> my_function
<function my_function at 0x108e962f0>
>>> abs
<built-in function abs>
Notice that it does NOT give an error
- Instead, it echoes the value, just like any other expression
- In this case, the value is a function!
>>> type(my_function)
<class 'function'>
Functions in Python are values, just like everything else!
>>> y = my_function
>>> y
<function my_function at 0x108e962f0>
>>> y(2)
3
>>> my_abs = abs
>>> my_abs(-10)
10
- we can pass them as parameters
- we can return them from functions
- we can even create them on the fly!
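A short sketch of the last two points. make_adder, add_five, and square are names made up for this example; lambda builds a small unnamed function inline.

```python
def make_adder(n):
    """Return a new function that adds n to its argument (illustrative helper)."""
    def adder(x):
        return x + n
    return adder              # returning a function, like returning any other value

add_five = make_adder(5)      # add_five now holds a function
print(add_five(10))           # 15

square = lambda x: x * x      # lambda creates a small function on the fly
print(square(7))              # 49

# Functions can also be stored in lists and passed around
for f in [add_five, square, abs]:
    print(f(-3))              # prints 2, then 9, then 3
```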
Motivating example: oo_calculator.py
- Look at the use of a dictionary of operators
- The operator objects seem a little redundant: each one just wraps the function inside it
- It's annoying to make a new class every time we need a new operator
- We can make this nicer with higher order functions:
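def add(a, b):
    return a + b

def sub(a, b):
    return a - b

def hamburger(a, b):
    return a**b / 7

operators = {
    "+": add,
    "-": sub,
}
# ...
operators["🍔"] = hamburger

a, op, b = "5 🍔 7".split(" ")
# split() produces strings, so convert the operands to numbers before applying the operator
print(operators[op](int(a), int(b)))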
- what do the first four functions in higher_order_functions.py do?
- take two arguments and do standard mathematical calculations
- what does apply_function do in higher_order_functions.py?
- takes three arguments
- the first is a function!
- applies the function passed as the first argument to the second and third arguments and returns the result
We can call our apply_function function:
>>> apply_function(add, 2, 3)
5
>>> apply_function(subtract, 2, 3)
-1
- to pass a function as a parameter you just give the name of the function as the argument
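A minimal sketch of what apply_function might look like. higher_order_functions.py isn't reproduced here, so these definitions are assumptions chosen to be consistent with the calls above.

```python
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

def apply_function(function, a, b):
    """Call `function` on a and b and return whatever it returns (sketch)."""
    return function(a, b)

print(apply_function(add, 2, 3))       # 5
print(apply_function(subtract, 2, 3))  # -1
# No parentheses after the name: we pass the function itself, not a call's result
print(apply_function(max, 2, 3))       # 3, built-in functions work too
```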
def
- what the keyword def actually does is:
- create a new function
- assign that function to a variable with the name of the function
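A quick way to see that def behaves like an assignment: the function object it creates can be bound to other names, and a lambda stored in a variable is called the same way. The names double, twice, and double_again are illustrative only.

```python
def double(x):
    return x * 2

# def created a function object and bound it to the name `double`
print(double.__name__)                 # double

# That name is an ordinary variable: another name can refer to the same object
twice = double
print(twice is double)                 # True

# A lambda assigned to a variable is called exactly like a def'd function
double_again = lambda x: x * 2
print(double_again(21) == double(21))  # True
print(double_again.__name__)           # <lambda>, since lambdas have no name of their own
```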
- what does the apply_function_to_list function do?
- takes a function and a list as parameters
- you can tell that the parameter "function" is a function because it is called in the line with the append in it
- iterates through each value in the list
- applies the function
- appends the result of the function to a list that is then returned
- You have written this function so many times, just changing the function you call
- Sorry! You can use this from now on.
- High-level: applies the function to each element in the list and returns a new list containing the result from each of those applications
For example:
>>> apply_function_to_list(double, [1, 2, 3, 4])
[2, 4, 6, 8]
- usually spelled "map"
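A sketch of apply_function_to_list as described above (the parameter names are assumptions), next to Python 3's built-in map and an equivalent list comprehension. Built-in map returns a lazy iterator, so it is wrapped in list() to see the values.

```python
def double(x):
    return x * 2

def apply_function_to_list(function, some_list):
    """Return a new list holding function(item) for each item (sketch)."""
    result = []
    for item in some_list:
        result.append(function(item))   # `function` is called here
    return result

nums = [1, 2, 3, 4]
print(apply_function_to_list(double, nums))  # [2, 4, 6, 8]
print(list(map(double, nums)))               # [2, 4, 6, 8]
print([double(x) for x in nums])             # [2, 4, 6, 8]
```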
- filter
- what does the filter_list function do?
- also takes a function and a list as parameters
- are there any expectations on what the function should do/return?
- it's used in an if statement
- it should return a bool, i.e. True or False
- The filter function is like map in that it applies the function to every element in the list
- BUT, it only keeps those where the function returns True for that element
For example,
>>> list(map(is_even, [1, 2, 3, 4]))
[False, True, False, True]
>>> filter_list(is_even, [1, 2, 3, 4])
[2, 4]
- You have written this function so many times, just changing the function you call
- Sorry! You can use this from now on.
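A sketch of filter_list as described above (parameter names assumed), alongside Python 3's built-in filter, which is also lazy, and the equivalent list comprehension with an if clause.

```python
def is_even(n):
    return n % 2 == 0

def filter_list(function, some_list):
    """Return a new list of the items for which function(item) is True (sketch)."""
    result = []
    for item in some_list:
        if function(item):          # function is expected to return a bool
            result.append(item)     # keep the item itself, not function(item)
    return result

nums = [1, 2, 3, 4]
print(filter_list(is_even, nums))       # [2, 4]
print(list(filter(is_even, nums)))      # [2, 4]
print([x for x in nums if is_even(x)])  # [2, 4]
```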
- Example: Monte Carlo sampling
- Monte Carlo methods are a way of determining the answer to numerical problems via random sampling
- general idea:
- generate random samples
- look at the outcomes of those random samples
- use those outcomes to estimate the answer
- An example: calculating the area
- we want to calculate the area of a shape, specifically, if I draw an arbitrary shape within a 1 by 1 box, can you tell me the area?
- kind of hard!
- what if I put a bunch of points uniformly in the box and could tell how many were inside the shape?
- e.g., if I put 1000 points in the box with a triangle shape, how many would you expect in the triangle?
- about 500
- what would be the area of the triangle?
- 500/1000 = 0.5
- key idea: use the proportion of points that fall inside the shape to estimate the area
- what are the areas of these two shapes:
- triangle
- quarter circle
- look at the in_circle and in_triangle functions in montecarlo.py
- what do these functions do?
- Challenge: write a function monte_carlo that takes two parameters: the number of trials and a shape function
- generate "trials" random points (x, y points between 0 and 1)
- count how many are "inside" the shape
- return the proportion, i.e., count/trials
- random.random() will be helpful
- look at the monte_carlo function in montecarlo.py
We can use this to estimate the area of different shapes:
>>> monte_carlo(1000, in_triangle)
0.484
>>> monte_carlo(10000, in_triangle)
0.5005
>>> monte_carlo(100000, in_triangle)
0.49756
>>> monte_carlo(100000, in_circle)
0.7854
>>> monte_carlo(100000, in_circle)*4
3.14896
>>> monte_carlo(1000000, in_circle)*4
3.141972
>>> monte_carlo(10000000, in_circle)*4
3.141894
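One possible version of these pieces, as a sketch. The definitions of in_circle and in_triangle are assumptions (the real montecarlo.py may use a different triangle or signature), chosen so the exact areas are 1/2 for the triangle and π/4 ≈ 0.785 for the quarter circle of radius 1. That π/4 is why the circle estimate is multiplied by 4 to approximate π.

```python
import random

def in_circle(x, y):
    """True if (x, y) is inside the quarter circle of radius 1 centered at (0, 0)."""
    return x * x + y * y <= 1

def in_triangle(x, y):
    """True if (x, y) is on or below the diagonal from (0, 0) to (1, 1)."""
    return y <= x

def monte_carlo(trials, shape):
    """Estimate the area of `shape` inside the 1 by 1 box by random sampling."""
    count = 0
    for _ in range(trials):
        x = random.random()   # uniform float in [0.0, 1.0)
        y = random.random()
        if shape(x, y):       # shape is a function passed in as a parameter
            count += 1
    return count / trials     # proportion of points that landed inside

random.seed(0)  # fixed seed so repeated runs print the same estimates
print(monte_carlo(100000, in_triangle))    # close to 0.5
print(monte_carlo(100000, in_circle) * 4)  # close to pi
# A lambda works as a shape too, e.g. the bottom-left quarter of the box (area 0.25)
print(monte_carlo(100000, lambda x, y: x < 0.5 and y < 0.5))
```

The estimate's error shrinks roughly like 1/√trials, which matches the outputs above: each tenfold increase in trials adds only about half a digit of accuracy.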
Analysis of Algorithms
- What is an algorithm?
- A method for solving a problem
- Which is implementable on a computer
- Examples
- Sorting a list of numbers
- Finding a route from one place to another
- Solving Sudoku
- Encrypting a message
- Wiring a microchip
- Compressing a video file
- Playing back a compressed video file
- You've been implementing algorithms all semester!
- Algorithms vs Code
- Functions like next_states implement algorithms
- In this case, finding successors of a Sudoku board
- Depth-first search is an algorithm
- def dfs(...) is some code implementing it
- For the purposes of 62, 101, 140, … we use "algorithm" to mean:
- Programming language independent…
- Specifications of a problem solving method
- Algorithms are solution methods for problems
- Developing algorithms
- Understand a problem really well
- Think about the steps involved in solving it
- Get something that works
- Analyze it
- Improve it
- Evaluating algorithms
- Imagine we have five algorithms for solving a problem
- How do we choose between them?
- Ideas?
- Correct results
- … (within some approximation bounds)
- Speed
- Memory use
- Easy to read/understand/debug
- Some of these characteristics have an important problem…
- They're really machine dependent!
- Not just faster/slower computers, but more/less RAM, faster/slower interconnects between components, 32/64-bit, etc…
- Different implementations might have different characteristics too! E.g. Python vs C
- We would like a tool to analyze algorithms in a machine-independent way
- And hopefully a programming language-independent way
- Worst-case analysis
- Or "Big O" analysis
- \(O(f(x))\)
Example: member
Let's implement this algorithm now: "To find an element in some list, compare the element against each element of the list. If it compares equal to any element return true, otherwise return false."
def member(element, some_list):
    for elt in some_list:
        if elt == element:
            return True
    return False
- We want to evaluate this algorithm's performance
- In a way that doesn't depend on:
- The particular length of the list (we want an answer that covers every length)
- Speed of my computer
- Amount of RAM in my computer
- Types of items being compared
- …
- What's the worst situation for this algorithm, in terms of performance?
- The item isn't in the list!
- So let's assume it's not in the list
- How many trips through the loop do I make if the list has one item?
- Two items? Four items? Eight?
- What's the pattern here?
- Let's plot it on a chart.
- The x-axis could be the number of items in the list
- What's the y-axis?
- "Number of times through the loop"; or even better, "Number of comparisons between items"
Asymptotic Complexity
- As our member function is called with longer and longer lists…
- the number of comparisons we make grows linearly!
- We write functions like this as \(O(n)\)
- "We can upper-bound the worst case performance with some line"
- Out of curiosity: what's the best case for member?
- If I had a graph with an upward curve instead of a straight line, could I possibly find a line that bounds the worst case?
- What kind of function would I need to describe that curve?
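For reference, the "upper-bound with some line" idea is the standard formal definition of big O: f is O(g) if some constant multiple of g eventually stays above f.

```latex
f(n) \in O(g(n)) \iff \exists\, c > 0,\ \exists\, n_0 \ \text{such that}\ f(n) \le c \cdot g(n) \ \text{for all}\ n \ge n_0
```

For member in the worst case, f(n) = n comparisons, so g(n) = n with c = 1 and n_0 = 1 satisfies the definition, giving \(O(n)\).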
Exercise
- Prove we can't beat the simple implementation of member for worst-case performance
- Or find a counter-example (it might be a more specialized algorithm!)
- What if I only care about numbers 1-100 and only care whether the item is present, not where?
- I can get this down to grabbing that number entry out of this other list I'm maintaining, which I can do in the same amount of time no matter how long my list is! \(O(1)\)!
- What if the data are numbers sorted in ascending order?
- If the first number is bigger than my target I can stop right away! But if my number is bigger than the last one I'm still hosed.
- Can you find a trick that lets me look at fewer than half of the numbers?
- Since the number of values we still have to consider goes down by half at each step, this is even better than \(O(n)\): it's \(O(\log n)\)!
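Sketches of the two specialized ideas above, each under its stated assumption. The first assumes every value is an integer from 1 to 100 and keeps a separate table of flags (build_presence_table and member_1_to_100 are hypothetical names); building the table is a one-time \(O(n)\) pass, after which each lookup is \(O(1)\). The second assumes the list is sorted in ascending order and uses binary search, the usual name for the halving trick.

```python
def build_presence_table(numbers):
    """Assumes every value is an int in 1..100; present[v] is True if v was seen."""
    present = [False] * 101          # index 0 is unused
    for v in numbers:
        present[v] = True
    return present

def member_1_to_100(element, present):
    return present[element]          # one lookup, however many numbers were stored

def member_sorted(element, sorted_list):
    """Binary search; assumes sorted_list is in ascending order."""
    low, high = 0, len(sorted_list) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_list[mid] == element:
            return True
        elif sorted_list[mid] < element:
            low = mid + 1            # target can only be in the upper half
        else:
            high = mid - 1           # target can only be in the lower half
    return False

table = build_presence_table([3, 42, 99])
print(member_1_to_100(42, table), member_1_to_100(7, table))  # True False
data = [2, 5, 8, 13, 21, 34]
print(member_sorted(13, data), member_sorted(14, data))       # True False
```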
Exercise: Map and Filter
- Big-O of map and filter?
- How many calls of the given function?
- exactly one call per element, so n calls for a list of length n
- Whoa, doesn't depend on the input function at all!
- Hopefully this motivates the use of map and filter a little bit more
- We can know properties about them and automatically know those properties for any application of them.
- Cool!
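One way to check the call count empirically, as a sketch: counting is a hypothetical helper (itself a higher-order function) that wraps a function and records how many times it is called. Python 3's map and filter are lazy, so the calls only happen when list(...) consumes the results. The count is n regardless of the function; the total time is \(O(n)\) when each call takes constant time.

```python
def counting(function):
    """Wrap a one-argument function so we can count how many times it is called."""
    def wrapper(x):
        wrapper.calls += 1
        return function(x)
    wrapper.calls = 0
    return wrapper

nums = list(range(10))
square = counting(lambda x: x * x)
is_even = counting(lambda x: x % 2 == 0)

squares = list(map(square, nums))
evens = list(filter(is_even, nums))
print(square.calls, is_even.calls)   # 10 10: one call per element
```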
Sorting
- Sorting problems:
Input: A list of orderable things (usually numbers for now)
Output: The list in sorted order, i.e. l[i] <= l[j] for all i < j
- Consider if you were sorting playing cards
- sort cards: all cards in view. How do you do it?
- sort cards: only view one card at a time. How do you do it?
- Many different ways to sort a list!
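The output condition can be checked directly. This is_sorted helper is a hypothetical sketch, handy for testing any sorting function; it only compares neighbors, which is enough because <= is transitive.

```python
def is_sorted(lst):
    """True if lst[i] <= lst[j] for all i < j.

    Checking each adjacent pair suffices: lst[0] <= lst[1] <= ... <= lst[-1]
    implies the condition for every pair i < j by transitivity.
    """
    return all(lst[i] <= lst[i + 1] for i in range(len(lst) - 1))

print(is_sorted([1, 2, 2, 5]))  # True
print(is_sorted([3, 1, 2]))     # False
print(is_sorted([]))            # True, an empty list is trivially sorted
```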
Selection sort
- starting from the beginning of the list and working to the back, find the smallest element in the remaining list
- in the first position, put the smallest item in the list
- in the second position, put the next smallest item in the list
- …
- to find the smallest item in the remaining list, simply traverse it, keeping track of the smallest value
- look at selection_sort in sorting.py
- What is the running time of selection sort?
- We'll use the variable n to describe the length of the array/input
- How many times do we go through the for loop in selection_sort?
- n times
- Each time through the for loop in selection_sort, we find the smallest element. How much work is this?
- first time, n-1, second, n-2, third, n-3 …
- \(O(n)\)
- what is the overall cost for selection_sort?
- we go through the for loop n times
- each time we go through the for loop we incur a cost of roughly n
- \(O(n^2)\)
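A sketch of selection sort following the description above; it sorts in place, and the real selection_sort in sorting.py may be organized differently. The inner loop makes n-1, then n-2, …, then 0 comparisons, for a total of \((n-1) + (n-2) + \dots + 1 = \frac{n(n-1)}{2}\), which is \(O(n^2)\) no matter how the input is ordered.

```python
def selection_sort(lst):
    """Sort lst in place in ascending order using selection sort."""
    n = len(lst)
    for i in range(n):
        # Find the index of the smallest item in the unsorted part lst[i:]
        smallest = i
        for j in range(i + 1, n):
            if lst[j] < lst[smallest]:
                smallest = j
        # Swap it into position i; lst[:i + 1] is now sorted and final
        lst[i], lst[smallest] = lst[smallest], lst[i]

data = [5, 2, 9, 1, 5, 6]
selection_sort(data)
print(data)  # [1, 2, 5, 5, 6, 9]
```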
Insertion sort
- starting from the beginning of the list and working towards the end, keep the list items we've seen so far in sorted order.
- For each new item, traverse the list of things sorted already and insert it in the correct place.
- look at insertion_sort in sorting.py
- what is the running time?
- How many times do we iterate through the while loop?
- in the best case: no times
- when does this happen?
- when the list is sorted already
- what is the running time? linear, \(O(n)\)
- in the worst case: j - 1 times
- when does this happen?
- when the list is sorted in reverse order
- what is the running time?
- \(\sum_{j=1}^{n-1} j = \frac{(n-1)n}{2}\)
- \(O(n^2)\)
- average case: (j-1)/2 times
- \(O(n^2)\)
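A sketch of insertion sort that also counts how many times the while loop body runs, so the best and worst cases above can be observed directly; the real insertion_sort in sorting.py may differ. Here j is 0-indexed, so in the worst case the while loop runs j times (rather than j - 1 with 1-based positions), and the total is again \(\sum_{j=1}^{n-1} j = \frac{(n-1)n}{2}\).

```python
def insertion_sort(lst):
    """Sort lst in place; return how many times the while loop body ran."""
    shifts = 0
    for j in range(1, len(lst)):
        current = lst[j]
        i = j - 1
        # Shift larger items in the sorted prefix one slot right to make room
        while i >= 0 and lst[i] > current:
            lst[i + 1] = lst[i]
            i -= 1
            shifts += 1
        lst[i + 1] = current
    return shifts

print(insertion_sort([1, 2, 3, 4, 5]))  # 0: already sorted, the best case
print(insertion_sort([5, 4, 3, 2, 1]))  # 10: reversed, the worst case, (5-1)*5/2
```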