Subsections of Computer Science Theory

Subsections of Algorithms

Introduction to Algorithms

What is an algorithm, and why should you care?

In computer science, an algorithm is a set of setups for a computer to accomplish a task.

Algorithm are reason why there is a science in a computer science.

Examples:

  • YouTube use compression algorithms to store and deliver videos efficiently in less cost
  • Google Maps use routing finding algorithms to find the shortest possible route between point A and point B

Why to use algorithms?

  • To perform the task faster
  • To reduce cost by eliminating the unnecessary steps

Computer scientists have written an algorithm for a checker game, where the computer never lose.

What makes a good algorithm?

  • Correctness
  • Efficiency

Sometimes we need the algorithm to give us efficient but not necessarily the 100% accurate answer. For example, a truck needs to find a route between two locations, algorithm may take a lot of time to calculate the correct and the most efficient route. We will be okay for the program to calculate the good but maybe not the best route in the matter of seconds.

How to measure the efficiency?

Computer Scientists use Asymptotic Analysis to find out the efficiency of an algorithm.

Asymptotic analysis is a method used in mathematical analysis and computer science to describe the limiting behavior of functions, particularly focusing on the performance of algorithms. It helps in understanding how an algorithm’s resource requirements, such as time and space, grow as the input size increases.

Guessing Game

If we have to guess the number between 1 and 15, how and every time we guess, we are told, if our guessed number is lower or higher the actual number.

How to approach?

We will start from either 1 to keep increasing one digit until we reach the correct number, or start from 15 and keep decreasing 1 until the guess is right.

The method we use here is called a linear search.

Linear search, also known as sequential search, is a simple searching algorithm used to find an element within a list. It sequentially checks each element of the list until it finds a match or reaches the end of the list.

— Wikipedia

This is the inefficient way of finding the right number. If computer has selection 15, we will need to 15 guesses to reach the correct digit. If we are lucky and computer has selected 1, we can reach it in a single guess.

Another approach we can use is by taking average before each. First guess will be 8, if the guess is lower, we can eliminate all the numbers before 8, if the guess is higher, we can eliminate all the numbers from 8 to 15 and so on.

This approach is called Halving method. And in computer terms, it’s called Binary Search.

Using this technique maximum number of guesses needed can be found:

$$ \text{Maximum number of guesses} = \log_{2}(n) $$

Where n = Maximum Possible Guess

Binary search is a fast search algorithm used in computer science to find a specific element in a sorted array. It works on the principle of divide and conquer, reducing the search space by half with each step. The algorithm starts by comparing the target value with the middle element of the array. If the target value matches the middle element, the search is complete. If the target value is less or greater than the middle element, the search continues in the lower or upper half of the array, respectively. This process repeats until the target value is found, or the search space is exhausted.

— Wikipedia

Binary Search

Binary search is an algorithm for finding an item inside a sorted list. It finds it, by dividing the portion of the list in half repeatedly, which can possibly contain the item. The process goes on until the list is reduced to the last location.

Example

If we want to find a particular star in a Tycho-2 star catalog which contains information about the brightest 2,539,913 stars, in our galaxy.

Linear search would have to go through million of stars until the desired star is found. But through binary search algorithm, we can greatly reduce these guesses. For binary search to work, we need these start array to be sorted alphabetically.

Using this formula:

$$ \text{Maximum number of guesses} = \log_{2}(n) $$

where n = 2,539,913

$$ \text{Maximum number of guessess} \approx 22 $$

So, using binary search, the number of guesses are reduced to merely 22, to reach the desired name of the star.

When describing a computer algorithm to a fellow human, an incomplete description is often good enough. While describing a recipe, some details are intentionally left out, considering the reader/listener knows that anyway. For example, for a cake recipe, we don’t need to tell how to open a refrigerator to get ingredients out, or how to crack an egg. People might know to fill in the missing pieces, but the computer doesn’t. That’s why while giving instructions, we need to tell everything.

You need to provide answers to the following questions while writing an algorithm for a computer:

  • Inputs of the problem?
  • The outputs?
  • What variables to create?
  • Intermediary steps to reach the output?
  • For repeated instructions, how to make use of loops?

Here is the step-by-step guide of using binary search to play the guessing game:

  1. Let min = 1 and max = n.
  2. Guess the avg of max and min, rounded it, so that it’s an integer.
  3. If your guess is right, stop.
  4. If the guess is too low, set min to be one larger than the guess.
  5. If the guess was too high, set max to be one smaller than the guess.
  6. Repeat the step-2.

Implementing binary search of an array

JavaScript and many other programming languages, already provide a way to find out if a given element is in the array or not. But to understand the logic behind it, we need to implement it ourselves.

var primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97];

Let’s suppose we want to know if 67 is a prime number or not. If 67 is in the array, then it’s a prime.

We might also want to know how many primes are smaller than 67, we can do this by finding its index (position) in the array.

The position of an element in an array is known as its index.

Using binary search, $\text min = 2 , max = 97, guess = 41$

As $[ 41 < 67 ]$ so the elements less 41 would be discarded, and now

The next guess would be:

The binary search algorithm will stop here, as it has reached correct integer.

The binary search took only 2 guesses instead of 19 for linear search, to reach the right answer.

Pseudocode

Here’s the pseudocode for binary search, modified for searching in an array. The inputs are the array, which we call array; the number n of elements in array; and target, the number being searched for. The output is the index in array of target:

  1. Let min = 0 and max = n-1.
  2. Compute guess as the average of max and min, rounded down (so that it is an integer).
  3. If array[guess] equals target, then stop. You found it! Return guess.
  4. If the guess was too low, that is, array[guess] < target, then set min = guess + 1.
  5. Otherwise, the guess was too high. Set max = guess - 1.
  6. Go back to step 2.

Implementing Pseudocode

To turn pseudocode intro a program, we should create a function, as we’re writing a code that accepts an input and returns an output, and we want that code to be reusable for different inputs.

Then let’s go into the body of the function, and decide how to implement that. Step-6 says go back to step-2. That sound like a loop. Both for and while loops can be used here. But due to non-sequential guessing of the indexes, while loop will be more suitable.

  1. Let min = 0 and max = n-1.
  2. If max < min, then stop: target is not present in array. Return -1.
  3. Compute guess as the average of max and min, rounded down (so that it is an integer).
  4. If array[guess] equals target, then stop. You found it! Return guess.
  5. If the guess was too low, that is, array[guess] < target, then set min = guess + 1.
  6. Otherwise, the guess was too high. Set max = guess - 1.
  7. Go back to step-2.

Challenge

Implementing binary search...

(If you don’t know JavaScript, you can skip the code challenges, or you can do the Intro to JS course and come back to them.)

Complete the doSearch function so that it implements a binary search, following the pseudo-code below (this pseudo-code was described in the previous article):

  1. Let min = 0 and max = n-1.
  2. If max < min, then stop: target is not present in array. Return -1.
  3. Compute guess as the average of max and min, rounded down (so that it is an integer).
  4. If array[guess] equals target, then stop. You found it! Return guess.
  5. If the guess was too low, that is, array[guess] < target, then set min = guess + 1.
  6. Otherwise, the guess was too high. Set max = guess - 1.
  7. Go back to step 2.

Once implemented, uncomment the Program.assertEqual() statement at the bottom to verify that the test assertion passes.

TBD

Linear search on an array of n elements might have to make as many as n guesses. We know, binary search need a lot less guesses. We also learned that as the length of an array increases, the efficiency of binary search goes up.

The idea is, when binary search makes an incorrect guess, number of reasonable guess left, are at least cut half. Binary search halves the size of the reasonable portion upon every incorrect guess.

Every time we double the size of an array, we require at most one more guess.

Let’s look at the general case of an array of length n, We can express the number of guesses, in the worst case, as “the number of time we can repeatedly halve, starting at n, until we get the value 1, plus one.” But this is inconvenient to write out.

Luckily, there’s a mathematical function that means the same thing as the base-2 logarithm of n. That’s the most often written as $\log_{2}(n)$.

n $\log_{2}(n)$
1 0
2 1
4 2
8 3
16 4
32 5
64 6
128 7
256 8
512 9
1024 10
1,048,576 20
2,097,152 21

Graph of the same table:

Zooming in on smaller values of n:

The logarithm function grows very slowly. Logarithms are the inverse of exponentials, which grow very rapidly, so that if $\log_{2}(n) = x$, then $\ n = 2^{x}$. For example, $\ log_2 128 = 7$, we know that $\ 2^7 = 128$.

That makes it easy to calculate the runtime of a binary search algorithm on an $n$ that’s exactly a power of $2$. If $n$ is $128$, binary search will require at most $8 (log_2 128 + 1)$ guesses.

What if $n$ isn’t a power of $2$? In that case, we can look at the closest lower power of $2$. For an array whose length is 1000, the closest lower power of $2$ is $512$, which equals $2^9$. We can thus estimate that ‍$log_2 1000$ is a number greater than $9$ and less than $10$, or use a calculator to see that its about $9.97$. Adding one to that yields about $10.97$. In the case of a decimal number, we round down to find the actual number of guesses. Therefore, for a 1000-element array, binary search would require at most 10 guesses.

For the Tycho-2 star catalog with 2,539,913 stars, the closest lower power of 2 is ‍ $2^{21}$ (which is 2,097,152), so we would need at most 22 guesses. Much better than linear search!

Compare $n$ vs $log_{2} {n}$ below:

Asymptotic Notation

So far, we analyzed linear search and binary search by counting the max number of guesses we need to make. But what we really want to know is how long these algorithms take. We are interested in time not just guesses. The running time of both include the time needed to make and check guesses.

The running time an algorithm depends on:

  • The time it takes to run the lines of code by the computer
  • Speed of computer
  • programming language
  • The compiler that translates program into machine code

Let’s think more carefully about the running time. We can use a combination of two ideas.

  • First, we need to determine how long the algorithm takes, in terms of the size of its input. This idea makes intuitive sense, doesn’t it? We’ve already seen that the maximum number of guesses in linear search and binary search increases as the length of the array increases. Or think about a GPS. If it knew about only the interstate highway system, and not about every little road, it should be able to find routes more quickly, right? So we think about the running time of the algorithm as a function of the size of its input.
  • Second, we must focus on how fast a function grows with the input size. We call this the rate of growth of the running time. To keep things simple, we need to distill the most important part and cast aside the less important parts. For example, suppose that an algorithm, running on an input of size ‍$n$, takes $6n^2+100n+300$ machine instructions. The $6n^2$ term becomes larger than the remaining terms, $100n+300$, once $n$ becomes large enough, $20$ in this case. Here’s a chart showing values of $6n^2$ and $100n+300$ for values of $n$ from $0$ to $100$:

We should say that running time of this algorithm grows as $n^2$, dropping the coefficient 6 and the remaining terms $100n+300$. It doesn’t really matter what coefficients we use; as long as the running time is $an^2+bn+c$, for some numbers a > 0, b, and c, there will always be a value of $n$ for which $an^2$ is greater than $bn+c$, and this difference increases as $n$ increases. For example, here’s a chart showing values of $0.6n^2$ and $1000n+3000$ so that we’ve reduced the coefficient of $n^2$ by a factor of 10 and increased the other two constants by a factor of 10:

The value of $n$ at which $0.6n^2$ becomes greater than $1000n+3000$ has increased, but there will always be such a crossover point, no matter what the constants.

By dropping the less significant terms and the constant coefficients, we can focus on the important part of an algorithm’s running time—its rate of growth—without getting mired in details that complicate our understanding. When we drop the constant coefficients and the less significant terms, we use asymptotic notation. We’ll see three forms of it: big-$\Theta$ (theta) notation, big-O notation, and big-‍$\Omega$ (omega) notation.

TBD