# Search Performance In this assignment, you will implement a few search algorithms, including linear search and binary search, and evaluate their performance. The goal is to introduce you with the basic concepts of algorithmic complexity. Note that there are many working code for both search algorithms online. You should not search for that when working on this assignment. | Part | Section | |---------------|-----------------------------------------------| | 1 (in-lab) | [Part 1: Search Algorithms](#part1) | | 1 (in-lab) | [Check-in Instructions](#checkin) | | 2 (lab/home) | [Part 2: Comparing the functions](#part2) | | 2 (lab/home) | [Submission Instructions](#submission) | ## Getting started This assignment can be done in pair programming. Working with a partner is strongly encouraged. If you have not already done pair programming in the previous assignment, then you need to read the [instructions on pair programming](../../pair_programming.html), and decide whether you want to work with a partner on this assignment. Create a new project named `SearchPerformance` in the `CSCI051p-Workspace` you created on your Desktop. *Double check that you are creating the project in the right place, or you will likely have trouble finding your files later.* Then download the [starter code](searchperf.zip), and unzip the files. You hould see a folder named `starter` that contains two files (`search_performance.py` and `search_performance_tester.py`). Copy both of these files into the (recently created) `SearchPerformance` folder and refresh the view in PyCharm. <!-- **Note:** You might need to manually install the matplotlib package to generate the comparison charts using `plot_comparison` in Part 2 of this assignment. You can do so following the same method as you used to [install the Arcade package](../A3/install.html). --> <a name="part1"></a> ## Part 1: Search Algorithms Start by implementing the two search functions: `linear_search` and `binary_search`. Both functions take two arguments. The first argument is `alist`, which is a list of `int`. The second argument is `value`, which is an `int`. Both functions return a non-negative integer `index` such that `alist[index] = value`. If no such integer `index` exists, the functions return `-1`. #### Function 1: linear_search(alist, value) This function goes through the elements of alist from first to last, checking each one to see if it equals `value`. If an equaling `value` is found in the list, it returns the index of `value` in the list. Otherwise, the function returns `-1`. #### Function 2: binary_search(alist, value) The binary search algorithm, and therefore this function, requires `alist` to already be sorted (make sure you understand why!). If `alist` has length `n` then the function first checks the element in position `n/2` (rounding if necessary) to see if it equals `value`, is greater than `value`, or is less than `value`. If it is equal, then it returns the index immediately. If the element is greater than `value`, then the function repeats the search on the first half of the list. If the element is less than `value`, then the function repeats the search on the second half of the list. If there is no way to further divide the list and the element value has not been found, then the function returns `-1`. Note that binary search can be implemented either iteratively or recursively. - The iterative version uses a `while loop` and uses two variables for keeping track of the indices of the start/end points of the portion of the list in which you're currently searching for value. - The recursive version is best implemented using a helper function `binary_search_helper(alist,value, start, end)`, which searches for value but only for indices in the range from `start` to `end-1` (inclusive). If you do this, the `binary_search` function will contain the single line of code: ``` binary_search_helper(alist, value, 0, len(alist)) ``` Either version is fine, but make sure you think through your approach before you start to code! #### Test cases: linear_search_tester() and binary_search_tester() The `search_performance_tester.py` in the starter includes a few simple tests to ensure that your functions work at all. You should extend those test cases to thoroughly test your functions. #### Function 3: list_of_integers(size) To do more thorough testing of Part 1, and to generate data for Part 2, you will need a function that generates a list of random integers with a specified size. The random integers should be in the range from `1` to `2*size`. If you want to do a search for a number that is guaranteed not to be in the list, you can search for `0` or `2*size + 1`. <a name="checkin"></a> #### Checking In Once you have finished implementing your two search functions, demo your functions to an instructor or TA to collect your lab points. Before finding a TA or professor, make sure your functions have: - appropriate docstrings - good algorithm comments - mnemonic variable names - good use of horizontal and vertical white space We will double check your code, ask you a few questions about it, answer any questions you have, and award your points for Part 1. This must be completed before leaving the lab. After that you should start working on Part 2. <a name="part2"></a> ## Part 2: Comparing the functions You'll do two experiments comparing the running times of the two functions. To get the running time of a function, you need to import the `time` package: ``` import time ``` Now, if you want to time a function called func which takes no arguments, you do so as follows: ``` start = time.time() func() end = time.time() elapsed_time_in_seconds = end-start ``` Note that if something runs fast, the running time might be reported using scientific notation (e.g., `2.1457672119140625e-06`, which is equal to `2.1457672119140625 * 10^-6 = .0000021457672119140625`). If something runs very fast, the running time might be reported as `0.0`, since there's limited precision in the timer. #### sorted_comparison(min_size, max_size) Create a series of lists, with starting with `min_size` and doubling the list size until it reaches `max_size`. For each list, time `linear_search` and `binary_search` for a number that is not in the list. You'll always search for an integer that is __not__ in the list, because that's the worst case (why?). As you know, binary search requires the list to already be sorted. So, you'll create lists of different lengths that are already in sorted order. You can do that by using either of the following methods: - create a list of non-random numbers that is already sorted - use the `list_of_integers` method, and then use `list.sort()` to sort them `sorted_comparision` should return a list of three-element tuples. The first element of each tuple should be a list size. The second element of the tuple should be be the running time of a linear search (for something that does not exist) on a sorted list of that size. The third element of the tuple should be the running time of a binary search (for something that does not exist) on a sorted list of that size. For example, when I ran `sorted_comparison(2,8)` it returned the list ``` [(2, 2.1457672119140625e-06, 9.5367431640625e-07), (4, 9.5367431640625e-07, 2.1457672119140625e-06), (8, 1.1920928955078125e-06, 9.5367431640625e-07)] ``` #### unsorted_comparison(min_size, max_size) Create a series of lists, with starting with `min_size` and doubling the list size until it reaches `max_size`. For each list, time a `linear_search` directly. When timing `binary_search`, you need to add the time for sorting the list and the time for searching a number that is not in the list using `binary_search`. You also know, however, that lists aren't always sorted. So, for this second part you'll create lists of different lengths that contain random numbers. You can create that list however you want. Again, you'll always search for an integer that is not in the list. `unsorted_comparison` should return a list of three-element lists. The first element of the inner list should be a list size. The second element of the inner list should be be the running time of a linear search (for something that does not exist) on a sorted list of that size. The third element of the inner list should be the total running time to: - first sort a list of that size, and then - run a binary search (for something that does not exist) on the sorted list. The list sizes should start at `min_size` and double each time until it reaches `max_size`. For example, when I ran `unsorted_comparison(2,8)` it returned the list: ``` [(2, 2.1457672119140625e-06, 2.86102294922e-06), (4, 1.1920928955078125e-06, 1.66893005371e-06), (8, 2.1457672119140625e-06, 1.66893005371e-06)] ``` #### main() In your main function, you should generate two sets of data: one set that compares the running time of linear and binary search on sorted lists with lengths in the range 2 to 1048576, and another set that compares the running time of linear and binary search on unsorted lists with lengths in the range 2 to 1048576. We have provided a helper function `plot_comparison` that generates these charts. Please read the code and use it to generate two charts, one for running time comparison on sorted lists, the other one for running time comparison on unsorted lists. #### Coding Style Make sure that your program is properly commented: * You should have comments at the very beginning of the file stating your name, course, assignment number and the date. * Each function should have an appropriate docstring, describing: - the purpose of the function - the types and meanings of each parameter - the type and meaning of the return value(s) * Include other comments as necessary to make your code clear In addition, make sure that you have used good style. This includes: * Following naming conventions, e.g. all variables and functions should be lowercase. * Using good (mnemonic) variable names. * Proper use of whitespace, including indenting and use of blank lines to separate chunks of code that belong together. For more detailed descriptions, please review the [Python Coding Style Guidelines](python_style.html). ## Part 3: Ethical Reflection We discussed the different metrics to optimize algorithms for. One of the metrics mentioned is energy. While speed often is considered the highest priority metric when optimizing algorithms, should energy be the main metric for optimizations. Consider energy intense applications such as bitcoin mining (block-chain) or neural network training. Here are sources of information: https://www.whitehouse.gov/ostp/news-updates/2022/09/08/fact-sheet-climate-and-energy-implications-of-crypto-assets-in-the-united-states/#:~:text=As%20of%20August%202022%2C%20published,such%20as%20Argentina%20or%20Australia. https://numenta.com/blog/2022/05/24/ai-is-harming-our-planet ## Part 4: Feedback Create a file named `feedback.txt` that answers the usual questions: 1. How long did you spend on this assignment? 2. Any comments or feedback? Things you found interesting? Things you found challenging? Things you found boring? <a name="submission"></a> ## Submission You will turn in four files for this assignment: * `search_performance.py` which contains your code * `search_performance_tester.py` which contains your tests cases * `search_performance.pdf` which contains graphs and summaries of your results. The pdf file should include: - the graph generated by `plot_comparison` when given the results of `sorted_comparison` as the first argument. - the graph generated by `plot_comparison` when given the results of `unsorted_comparison` as the first argument. - a comparison between your experimental results and a big-O analysis of the algorithms you implemented. - a few sentences, grounded in the data you collected, addressing how you would advise someone choosing between using your linear search and your binary search codes. * `ethics.txt` * `feedback.txt`. These should be submitted using [Gradescope](https://www.gradescope.com). Note that we reserve the right to give you no more than half credit if your files are named incorrectly and/or your function headers do not match the specifications (including names, parameter order, etc). Please double and triple check this before submitting! ## Grade Point Allocations | Part | Feature | Value | |-----------|-------------------------------------------|-----| | Lab | Check-in | 3 | | | | | | Execution | `linear_search` | 5 | | Execution | `binary_search` | 5 | | Execution | `sorted_comparison` | 5 | | Execution | `unsorted_comparison` | 5 | | Execution | `main` | 2 | | Execution | writeup | 4 | | | | | | Testing | Thoroughly tests `linear_search` | 2 | | Testing | Thoroughly tests `binary_search` | 2 | | | | | | Style | Docstrings accurate, relevant | 2 | | Style | Comments in code accurate, relevant | 2 | | Style | Good use of loops and conditionals | 2 | | Style | Good use of variable names and whitespaces | 2 | | Style | Misc | 2 | | | | | | Ethics | Completed file submitted | 2 | | Feedback | Completed feedback file submitted | 2 |