Week 13: Higher Order Functions, Sorting, and Runtime Analysis

Readings

Key Questions

  • In my own words, what is a higher-order function?
  • What is the meaning of the lambda keyword in Python?
  • What are the tradeoffs between for...in and a higher-order function like map or filter?
  • How can I sort a Python list by some external criterion?
  • How are higher order functions similar to and different from list comprehensions?
  • What are some key differences between selection sort and insertion sort?
  • What are some tradeoffs to be considered between sorting algorithms?
  • What are some things that "performance" can mean when considering computer programs?
  • How can I compare the performance of two implementations of the same function?
  • How can I compare the performance of two algorithms?
  • When we compare the performance of algorithms, do we tend to focus on the best case, the worst case, or the average case? Why?

Topics

Higher-Order Functions

  • Have you ever typed a function into the shell, but forgotten the parentheses?

    def my_function(x):
       return x+1
    >>> my_function(2)
    3
    >>> my_function
    <function my_function at 0x108e962f0>
    >>> abs
    <built-in function abs>
    
    • Notice that it does NOT give an error

      • Instead, it echoes the value, just like any other expression
      • In this case, the value is a function!
      >>> type(my_function)
      <class 'function'>
      
  • Functions in Python are values, just like everything else!

    >>> y = my_function
    >>> y
    <function my_function at 0x108e962f0>
    >>> y(2)
    3
    >>> my_abs = abs
    >>> my_abs(-10)
    10
    
    • we can pass them as parameters
    • we can return them from functions
    • we can even create them on the fly!
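  • A minimal sketch of those three points. The names twice and make_adder are hypothetical, chosen only for illustration:

```python
def twice(f, x):
    """Takes a function as a parameter and applies it to x two times."""
    return f(f(x))

def make_adder(n):
    """Returns a new function that adds n to its argument."""
    def adder(x):
        return x + n
    return adder

add_three = make_adder(3)
print(twice(add_three, 10))        # 16
print(twice(lambda x: x * 2, 5))   # 20; lambda creates a function on the fly
```

    • lambda builds a small unnamed function out of a single expression, which is handy when the function is only needed as an argument.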
  • Motivating example: oo_calculator.py

    • Look at the use of a dictionary of operators
    • The objects seem a little redundant to the functions inside
    • It's annoying to make a new class every time we need a new operator
    • We can make this nicer with higher order functions:
    def add(a,b):
       return a+b
    def sub(a,b):
       return a-b
    def hamburger(a,b):
       return a**b / 7
    operators = {
       "+": add,
       "-": sub,
    }
    # ...
    operators["🍔"] = hamburger
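    a, op, b = "5 🍔 7".split(" ")
    # split returns strings, so convert the operands to numbers before applying the operator
    print(operators[op](int(a), int(b)))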
    
  • what do the first four functions in higher_order_functions.py do?
    • take two arguments and do standard mathematical calculations
  • what does apply_function do in higher_order_functions.py?
    • takes three arguments
      • the first is a function!
    • applies the function passed as the first argument to the second and third argument and returns the result
  • We can call our apply_function function:

    >>> apply_function(add, 2, 3)
    5
    >>> apply_function(subtract, 2, 3)
    -1
    
    • to pass a function as a parameter you just give the name of the function as the argument
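  • higher_order_functions.py is not reproduced in these notes, so the following is only a sketch of what apply_function might look like, assuming add and subtract are plain two-argument functions:

```python
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

def apply_function(function, a, b):
    # Call whichever function was passed in on the other two arguments.
    return function(a, b)

print(apply_function(add, 2, 3))       # 5
print(apply_function(subtract, 2, 3))  # -1
```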
  • def
    • what the keyword def actually does is
      1. create a new function
      2. assign that function to a variable with the name of the function
  • what does the apply_function_to_list function do?
    • takes a function and a list as parameters
      • you can tell that the parameter "function" is a function because we apply it in the line with the append in it
    • iterates through each value in the list
      • applies the function
      • appends the result of the function to a list that is then returned
      • You have written this function so many times, just changing the function you call
        • Sorry! You can use this from now on.
    • High-level: applies the function to each element in the list and returns a new list containing the result from each of those applications
    • For example:

      >>> apply_function_to_list(double, [1, 2, 3, 4])
      [2, 4, 6, 8]
      
    • usually spelled "map"
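  • A sketch of apply_function_to_list as described above, next to the built-in map and the equivalent list comprehension. The body is an assumption based on the description, and double is assumed to multiply its argument by 2:

```python
def double(x):
    return 2 * x

def apply_function_to_list(function, some_list):
    result = []
    for value in some_list:
        # Apply the function and collect the result in a new list.
        result.append(function(value))
    return result

print(apply_function_to_list(double, [1, 2, 3, 4]))  # [2, 4, 6, 8]
print(list(map(double, [1, 2, 3, 4])))               # [2, 4, 6, 8]
print([double(x) for x in [1, 2, 3, 4]])             # [2, 4, 6, 8]
```

    • In Python 3 the built-in map returns a lazy iterator, which is why it is wrapped in list(...) here.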
  • filter
    • what does the filter_list function do?
      • also takes a function and a list as parameters
      • are there any expectations on what the function should do/return?
        • it's used in an if statement
        • it should return a bool, i.e. True or False
    • The filter function is like map in that it applies the function to every element in the list
      • BUT, it only keeps those where the function returns True for that element
    • For example,

      >>> list(map(is_even, [1, 2, 3, 4]))
      [False, True, False, True]
      >>> filter_list(is_even, [1, 2, 3, 4])
      [2, 4]
      
    • You have written this function so many times, just changing the function you call
      • Sorry! You can use this from now on.
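  • A sketch of filter_list matching the description above. The body is an assumption, since the course file is not shown here, and is_even is assumed to test divisibility by 2:

```python
def is_even(x):
    return x % 2 == 0

def filter_list(function, some_list):
    result = []
    for value in some_list:
        # Keep the value itself, but only when the function says True.
        if function(value):
            result.append(value)
    return result

print(filter_list(is_even, [1, 2, 3, 4]))          # [2, 4]
print(list(filter(is_even, [1, 2, 3, 4])))         # [2, 4]
print([x for x in [1, 2, 3, 4] if is_even(x)])     # [2, 4]
```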
  • Example: Monte Carlo sampling
    • Monte Carlo methods are a way of determining the answer to numerical problems via random sampling
    • general idea:
      • generate random samples
      • look at the outcome of those random samples
      • use those outcomes to estimate the answer
    • An example: calculating the area
      • we want to calculate the area of a shape. Specifically, if I draw an arbitrary shape within a 1 by 1 box, can you tell me its area?
        • kind of hard!
      • what if I put a bunch of points uniformly in the box and could tell how many were inside the shape?
        • e.g., if I put 1000 points in the box with a triangle shape, how many would you expect in the triangle?
          • about 500
          • what would be the area of the triangle?
            • 500/1000 = 0.5
      • key idea: use the proportion of points that fall inside the shape to estimate the area
    • what are the areas of these two shapes:
      • triangle
      • quarter circle
    • look at in_circle and in_triangle functions in montecarlo.py
      • what do these functions do?
    • Challenge: write a function monte_carlo that takes two parameters: number of trials and a shape function
      • generate "trials" random points (x, y points between 0 and 1)
      • count how many are "inside" the shape
      • return the proportion, i.e., count/trials
      • random.random() will be helpful
    • look at monte_carlo function in montecarlo.py
    • We can use this to estimate the area of different shapes:

      >>> monte_carlo(1000, in_triangle)
      0.484
      >>> monte_carlo(10000, in_triangle)
      0.5005
      >>> monte_carlo(100000, in_triangle)
      0.49756
      >>> monte_carlo(100000, in_circle)
      0.7854
      >>> monte_carlo(100000, in_circle)*4
      3.14896
      >>> monte_carlo(1000000, in_circle)*4
      3.141972
      >>> monte_carlo(10000000, in_circle)*4
      3.141894
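
  • One possible solution to the challenge, as a sketch. montecarlo.py is not shown in these notes, so the (x, y) signature of the shape functions and the particular triangle used here (the half of the box below the diagonal) are assumptions:

```python
import random

def in_triangle(x, y):
    # Assumed shape: the triangle below the diagonal of the unit box.
    return y <= x

def in_circle(x, y):
    # Quarter circle of radius 1 centered at the origin.
    return x * x + y * y <= 1

def monte_carlo(trials, shape):
    count = 0
    for _ in range(trials):
        # Pick a random point in the 1 by 1 box.
        x, y = random.random(), random.random()
        if shape(x, y):
            count += 1
    return count / trials

print(monte_carlo(100000, in_triangle))    # close to 0.5
print(monte_carlo(100000, in_circle) * 4)  # close to pi
```

    • The results change from run to run because the points are random; more trials tend to give a closer estimate.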
      

Analysis of Algorithms

  • What is an algorithm?
    • A method for solving a problem
    • Which is implementable on a computer
  • Examples
    • Sorting a list of numbers
    • Finding a route from one place to another
    • Solving Sudoku
    • Encrypting a message
    • Wiring a microchip
    • Compressing a video file
    • Playing back a compressed video file
    • You've been implementing algorithms all semester!
  • Algorithms vs Code
    • Functions like next_states implement algorithms
      • In this case, finding successors of a Sudoku board
    • Depth-first search is an algorithm
      • def dfs(...) is some code implementing it
    • For the purposes of 62, 101, 140, … we use "algorithm" to mean:
      • Programming language independent…
      • Specifications of a problem solving method
    • Algorithms are solution methods for problems
  • Developing algorithms
    • Understand a problem really well
    • Think about the steps involved in solving it
    • Get something that works
    • Analyze it
    • Improve it
  • Evaluating algorithms
    • Imagine we have five algorithms for solving a problem
    • How do we choose between them?
    • Ideas?
      • Correct results
        • … (within some approximation bounds)
      • Speed
      • Memory use
      • Easy to read/understand/debug
    • Some of these characteristics have an important problem…
      • They're really machine dependent!
      • Not just faster/slower computers, but more/less RAM, faster/slower interconnects between components, 32/64-bit, etc…
      • Different implementations might have different characteristics too! E.g. Python vs C
    • We would like a tool to analyze algorithms in a machine-independent way
      • And hopefully a programming language-independent way
  • Worst-case analysis
    • Or "Big O" analysis
    • \(O(f(x))\)

Example: member

Let's implement this algorithm now: "To find an element in some list, compare the element against each element of the list. If it compares equal to any element return true, otherwise return false."

def member(element, some_list):
  for elt in some_list:
    if elt == element:
      return True
  return False
  • We want to evaluate this algorithm's performance
  • In a way that doesn't depend on:
    • Any one particular length of the list
    • Speed of my computer
    • Amount of RAM in my computer
    • Types of items being compared
  • What's the worst situation for this algorithm, in terms of performance?
    • The item isn't in the list!
    • So let's assume it's not in the list
  • How many trips through the loop do I make if the list has one item?
    • Two items? Four items? Eight?
    • What's the pattern here?
    • Let's plot it on a chart.
      • The x-axis could be the number of items in the list
      • What's the y-axis?
        • "Number of times through the loop"; or even better, "Number of comparisons between items"

Asymptotic Complexity

  • As our member function is called with longer and longer lists…
    • the number of comparisons we make grows linearly!
    • We write functions like this as \(O(n)\)
    • "We can upper-bound the worst case performance with some line"
    • Out of curiosity: What's the best case for member?
  • If I had a graph with an upward curve instead of a straight line, could I possibly find a line that bounds the worst case?
    • What kind of function would I need to describe that curve?

Exercise

  • Prove we can't beat the simple implementation of member for worst-case performance
    • Or find a counter-example (it might be a more specialized algorithm!)
    • What if I only care about numbers 1-100 and only care whether the item is present, not where?
      • I can get this down to grabbing that number entry out of this other list I'm maintaining, which I can do in the same amount of time no matter how long my list is! \(O(1)\)!
    • What if the data are numbers sorted in ascending order?
      • If the first number is bigger than my target I can stop right away! But if my number is bigger than the last one I'm still hosed.
      • Can you find a trick that lets me look at fewer than half of the numbers?
        • Since the number of elements we still have to consider goes down by half at each step, this is even better than \(O(n)\): it's \(O(\log n)\)!
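
  • Sketches of the two specialized ideas above. The names make_presence_table and sorted_member are hypothetical, and the halving trick shown is the standard binary search:

```python
def make_presence_table(numbers):
    # table[k] is True when k appears in numbers; assumes every k is in 1..100.
    table = [False] * 101
    for k in numbers:
        table[k] = True
    return table

def sorted_member(element, sorted_list):
    # Binary search: each step discards half of the remaining range.
    low, high = 0, len(sorted_list) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_list[mid] == element:
            return True
        elif sorted_list[mid] < element:
            low = mid + 1
        else:
            high = mid - 1
    return False

table = make_presence_table([3, 17, 42])
print(table[17], table[18])                   # True False
print(sorted_member(7, [1, 3, 5, 7, 9, 11]))  # True
print(sorted_member(8, [1, 3, 5, 7, 9, 11]))  # False
```

    • Building the table still takes one pass over the data; only each lookup afterwards is constant time.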

Exercise: Map and Filter

  • Big-O of map and filter?
    • How many calls of the given function?
      • Whoa, doesn't depend on the input function at all!
  • Hopefully this motivates the use of map and filter a little bit more
    • We can know properties about them and automatically know those properties for any application of them.
    • Cool!
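
  • One way to check the number of calls empirically. The wrapper count_calls is hypothetical, written only for this experiment:

```python
def count_calls(function):
    """Wrap function so that wrapper.calls records how many times it ran."""
    def wrapper(x):
        wrapper.calls += 1
        return function(x)
    wrapper.calls = 0
    return wrapper

counted_abs = count_calls(abs)
list(map(counted_abs, range(-5, 5)))
print(counted_abs.calls)    # 10

counted_even = count_calls(lambda x: x % 2 == 0)
list(filter(counted_even, range(10)))
print(counted_even.calls)   # 10
```

    • Both make exactly one call per element. The total running time is \(O(n)\) only if each individual call takes constant time.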

Sorting

  • Sorting problems:
    • Input: A list of orderable things (usually numbers for now)
    • Output: The list in sorted order, i.e. l[i] <= l[j] for all i < j
  • Consider if you were sorting playing cards
    • sort cards: all cards in view. How do you do it?
    • sort cards: only view one card at a time. How do you do it?
  • Many different ways to sort a list!

Selection sort

  • starting from the beginning of the list and working to the back, find the smallest element in the remaining list
    • in the first position, put the smallest item in the list
    • in the second position, put the next smallest item in the list
  • to find the smallest item in the remaining list, simply traverse it, keeping track of the smallest value
  • look at selection_sort in sorting.py
    • What is the running time of selection sort?
      • We'll use the variable n to describe the length of the array/input
        • How many times do we go through the for loop in selection_sort?
          • n times
        • Each time through the for loop in selection_sort, we find the smallest element. How much work is this?
          • first time, n-1, second, n-2, third, n-3 …
          • O(n)
        • what is the overall cost for selection_sort?
          • we go through the for loop n times
          • each time we go through the for loop we incur a cost of roughly n
          • O(n^2)
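
  • sorting.py is not reproduced here, so this is a sketch of selection sort following the description above rather than the course's exact code:

```python
def selection_sort(some_list):
    """Sort some_list in place and return it."""
    n = len(some_list)
    for i in range(n):
        # Find the index of the smallest element in the remaining list.
        smallest = i
        for j in range(i + 1, n):
            if some_list[j] < some_list[smallest]:
                smallest = j
        # Put that element into position i.
        some_list[i], some_list[smallest] = some_list[smallest], some_list[i]
    return some_list

print(selection_sort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]
```

    • The inner loop runs n-1, n-2, … times no matter what the input looks like, so the best case is also \(O(n^2)\).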

Insertion sort

  • starting from the beginning of the list and working towards the end, keep the list items we've seen so far in sorted order.
    • For each new item, traverse the list of things sorted already and insert it in the correct place.
  • look at insertion_sort in sorting.py
    • what is the running time?
      • How many times do we iterate through the while loop?
        • in the best case: no times
          • when does this happen?
            • when the list is sorted already
          • what is the running time? linear, O(n)
        • in the worst case: j - 1 times
          • when does this happen?
            • when the list is in reverse sorted order
          • what is the running time?
            • \(\sum_{j=1}^{n-1} j = \frac{(n-1)n}{2}\)
            • O(n^2)
        • average case: (j-1)/2 times
          • O(n^2)
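
  • A sketch of insertion sort that also counts trips through the while loop, so the best and worst cases can be checked. This is not the code from sorting.py; it uses 0-based indices, so its loop counts may be offset by one from the j - 1 above:

```python
def insertion_sort(some_list):
    """Sort some_list in place; return how many times the while loop body ran."""
    shifts = 0
    for j in range(1, len(some_list)):
        current = some_list[j]
        i = j - 1
        # Move larger items one slot right until current fits.
        while i >= 0 and some_list[i] > current:
            some_list[i + 1] = some_list[i]
            i -= 1
            shifts += 1
        some_list[i + 1] = current
    return shifts

print(insertion_sort([1, 2, 3, 4, 5]))  # 0: already sorted, best case
print(insertion_sort([5, 4, 3, 2, 1]))  # 10 = (5 * 4) / 2: reverse order, worst case
```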

Author: Joseph C. Osborn

Created: 2020-04-21 Tue 10:44
