This presentation is an adaptation of material from Antoni Lozano and Conrado Martinez
Which algorithm makes you travel the shorter distance?
Measure the computational resources an algorithm consumes: execution time, memory space, I/O, randomness, etc.
Investigate the properties of the complexity of algorithms
Given an algorithm A with input set I, its complexity or cost (in time, in memory space, in I/Os, randomness, etc.) is a function T : I \to \mathbb N (or \mathbb Q or \mathbb R, depending on what we want to study).
Warning
Characterizing such a function T is usually too complex and impractical due to the huge amount of information it yields. It is more useful to group the inputs according to their size and study T on each group.
Definition 1 (Size) The size of an input x (denoted |x|) is the number of symbols needed to encode it.
Let I_n be the set of inputs of size n for the algorithm A and let T : I \to \mathbb N be a cost function.
the best-case cost of A is the function T_{best}: \mathbb N\to \mathbb N T_{best}(n)=\min\{T(i) \mid i\in I_n\} (The best-case cost is not very useful…)
the worst-case cost of A is the function T_{worst}: \mathbb N\to \mathbb N T_{worst}(n)=\max\{T(i) \mid i\in I_n\}
For every input i\in I_n, \qquad T_{best}(n)\leq T(i)\leq T_{worst}(n).
For instance, the average-case cost under the uniform distribution over the set I_n is \sum_{i\in I_n}\frac{T(i)}{|I_n|}.
For every probability distribution \mathcal D_n, T_{best}(n)\leq \mathbb E_{X\sim \mathcal D_n} [T(X)]\leq T_{worst}(n)\ .
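As a small illustration, here is a Python sketch that takes linear search for the value 1 in a permutation of \{1,\dots,n\} as the algorithm A, with I_n the set of all n! permutations and T(i) the number of comparisons made; the helper name linear_search_cost is just invented for this sketch.

```python
from itertools import permutations

def linear_search_cost(perm, target=1):
    # Number of comparisons a linear search needs to find `target` in `perm`.
    for cost, value in enumerate(perm, start=1):
        if value == target:
            return cost
    return len(perm)

n = 6
I_n = list(permutations(range(1, n + 1)))   # all inputs of size n
costs = [linear_search_cost(p) for p in I_n]

T_best = min(costs)                 # T_best(n)  = 1
T_worst = max(costs)                # T_worst(n) = n
T_avg = sum(costs) / len(I_n)       # uniform average = (n + 1) / 2

print(T_best, T_worst, T_avg)       # 1 6 3.5
```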
In this course we focus almost exclusively on the worst-case complexity.
Pro
Contra
A fundamental feature of functions \mathbb R^+\to \mathbb R^+ is their growth rate.
Example 1
Linear and quadratic functions have different rates of growth. We can also say that they are of different orders of magnitude.
cost | size solvable with current tech | with tech \times 100 | with tech \times 1000 |
---|---|---|---|
\log_2 n | N_1 | N_1^{100} | N_1^{1000} |
n | N_2 | 100N_2 | 1000N_2 |
n^2 | N_3 | 10N_3 | 31.6N_3 |
n^3 | N_4 | 4.64N_4 | 10N_4 |
2^n | N_5 | N_5 + 6.64 | N_5 + 9.97 |
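To see where these factors come from, consider the n^3 row: if size N_4 is solvable now and technology becomes 100 times faster (100 times more operations in the same time budget), the new solvable size N satisfies N^3=100\,N_4^3, i.e. N=100^{1/3}N_4\approx 4.64\,N_4. A minimal Python sketch of this computation (the helper new_size_factor is invented here):

```python
def new_size_factor(cost_exponent, speedup):
    # For cost n**e, a machine `speedup` times faster handles inputs
    # larger by a factor of speedup**(1/e) in the same time budget.
    return speedup ** (1.0 / cost_exponent)

for e in (1, 2, 3):
    print(f"n^{e}:", round(new_size_factor(e, 100), 2),
          round(new_size_factor(e, 1000), 2))
# n^1: 100.0 1000.0
# n^2: 10.0 31.62
# n^3: 4.64 10.0
```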
We want a notation that expresses the growth rate of the cost of algorithms and is independent of multiplicative constants.
We do not want the cost of an algorithm to depend on implementation details.
In computer science, we use asymptotic notation to classify algorithms according to how their running time (or space) requirements grow as the input size grows:
Warning
The \mathcal O, \Omega, \Theta notations do not refer directly to algorithms, but to the growth rate of functions.
\mathcal O(f) (big-Oh of f) is the set of functions from \mathbb R^+ to \mathbb R^+ that eventually grow no faster than a constant times f. That is, a function g is in \mathcal O(f) if there exists a positive constant c such that g(x) \le cf(x) for all x from some value x_0 onwards.
Definition 2 Given a function f:\mathbb R^{+}\to\mathbb R^{+} the set \mathcal O(f) is \mathcal O(f) = \{g: \mathbb R^{+}\to\mathbb R^{+}\,\vert\,\exists x_{0}\, \exists c>0\,\forall x\ge x_{0}\ \ g(x) \le c f(x)\}\ .
Example 2 3x^3+4x^2-5x+6\in \mathcal O(x^3)
We need to find constants c>0 and x_0 such that for every x\geq x_0, 3x^3+4x^2-5x+6\leq cx^3.
We have that 3x^3+4x^2-5x+6\leq (3+4+5+6)x^3=18x^3 whenever x\geq 1. Hence we can take x_0=1 and c=18 (for example).

Exercise 1 Show that a_kx^k+a_{k-1}x^{k-1}+\cdots +a_1x+a_0\in \mathcal O(x^k) whenever the a_i are constants and a_k>0.
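A quick numerical sanity check of the witnesses found in Example 2 (c=18 and x_0=1); this is only an illustration of the inequality, not a proof:

```python
# Check 3x^3 + 4x^2 - 5x + 6 <= 18 * x^3 on a sample of points x >= 1.
c, x0 = 18, 1.0
xs = [x0 + 0.01 * i for i in range(100_000)]
assert all(3*x**3 + 4*x**2 - 5*x + 6 <= c * x**3 for x in xs)
print("inequality holds on all sampled points")
```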
Definition 3 Given a function f:\mathbb R^{+}\to\mathbb R^{+} the set \Omega(f) is \Omega(f) = \{g: \mathbb R^+\to \mathbb R^+ \mid f\in \mathcal O(g)\}
In other words, \Omega(f) (big-Omega of f) is the set of functions that eventually grow at least as fast as a positive constant times f.
Definition 4 Given a function f:\mathbb R^{+}\to\mathbb R^{+} the set \Theta(f) is \Theta(f) = \mathcal O(f)\cap \Omega(f)
Important
The \Theta notation is more informative than the \mathcal O notation. For example, we will see in a couple of classes that binary search has cost \Theta(\log n). Saying that it has cost \mathcal O(2^n) is also technically correct, but not very informative.
Let us show that \sum_{i=1}^n i^k\in \Theta(n^{k+1}). For every integer i\leq n it always holds that i^k\leq n^k, therefore \sum_{i=1}^n i^k\leq \underbrace{n^k+\cdots+n^k}_n=n\cdot n^k=n^{k+1}\ .
That is \sum_{i=1}^n i^k\in \mathcal O(n^{k+1}).
To show that \sum_{i=1}^n i^k\in \Omega(n^{k+1}), notice that for every i\geq n/2 we have i^k\geq \left(\frac{n}{2}\right)^k. Therefore, \sum_{i=1}^n i^k=\sum_{i=1}^{n/2} i^k+\sum_{i=n/2+1}^{n} i^k\geq \sum_{i=n/2+1}^{n} i^k \geq \frac{n}{2}\left(\frac{n}{2}\right)^k=\frac{1}{2^{k+1}}n^{k+1}\ . That is, \sum_{i=1}^n i^k\in \Omega(n^{k+1}).

As an aside, \sum_{i=1}^n i^k is actually a polynomial in n of degree k+1 whose coefficients are related to the Bernoulli numbers. These are the numbers that Ada Lovelace's program was meant to compute.
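A small Python sketch checking the two bounds just derived: the ratio \sum_{i=1}^n i^k / n^{k+1} stays between 1/2^{k+1} and 1 (and, in fact, approaches 1/(k+1)); the helper power_sum is an invented name.

```python
def power_sum(n, k):
    # Exact value of sum_{i=1}^{n} i**k.
    return sum(i**k for i in range(1, n + 1))

k = 3
for n in (10, 100, 1000, 10000):
    ratio = power_sum(n, k) / n**(k + 1)
    print(n, ratio)   # always in [1/2**(k+1), 1], tends to 1/(k+1) = 0.25
```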
Although \mathcal O(f), \Omega(f) and \Theta(f) are sets of functions, people usually use “=” instead of “\in”, for instance writing g = \mathcal O(f) instead of g\in\mathcal O(f), or g=\Theta(f) instead of g\in \Theta(f).
Caution
This use is not symmetric: \mathcal O(f) = g, \Omega(f)=g, and \Theta(f)=g are nonsensical.
Assume f,g,h:\mathbb R^{+}\to\mathbb R^{+}.
For every positive real number c, \mathcal O(c\cdot f)=\mathcal O(f).
That is, we talk about \mathcal O(n) and not about \mathcal O(2n) (they are the same).
Since \mathcal O(\log_b(x))=\mathcal O(\log_c(x)), we also omit the base of logarithms and talk about \mathcal O(\log x).
Because \log_c(x)=\frac{\log_b(x)}{\log_b(c)} and \frac{1}{\log_b(c)} is a constant.
The constants are not always irrelevant, for instance in the exponent they are super relevant!
Exercise 2 The sets \mathcal O(2^{n}) and \mathcal O(2^{2n}) are such that one is included in the other. Which is the larger one? Same question for \mathcal O(2^{\log_2(n)}) and \mathcal O(2^{\log_3(n)}). How do they relate to \mathcal O(n)?
Use the definition or, if you are lucky and \displaystyle\lim_{n\to+\infty}\frac{f(n)}{g(n)} exists, this will tell you the answer. \small \lim_{n\to+\infty}\frac{f(n)}{g(n)} = \begin{cases} 0 & \Rightarrow f\in\mathcal O(g)\ (\text{and}\ f\notin \Omega(g))\\ c\qquad (0<c<+\infty)& \Rightarrow f\in\Theta(g)\\ +\infty & \Rightarrow f\in \Omega(g)\ (\text{and}\ f\notin \mathcal O(g)) \end{cases}
Warning
Computing \displaystyle\lim_{n\to+\infty}\frac{f(n)}{g(n)} does not always give an answer; for instance, \sin(n)+2=\Theta(1), but clearly \displaystyle\lim_{n\to+\infty}\,(\sin(n)+2) does not exist.
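When the limit does exist, a computer algebra system can evaluate it for you. A minimal sketch using SymPy:

```python
from sympy import symbols, limit, log, oo

n = symbols('n', positive=True)

print(limit(log(n) / n, n, oo))                 # 0  -> log n in O(n) but not Omega(n)
print(limit((3*n**3 + 4*n**2) / n**3, n, oo))   # 3  -> 3n^3 + 4n^2 in Theta(n^3)
print(limit(2**n / n**5, n, oo))                # oo -> 2^n in Omega(n^5) but not O(n^5)
```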
Assume f is an increasing function, and a,b two constants
We extend conventional operations on functions such as sums, subtractions, divisions, powers, etc. to classes of functions: A\otimes B =\{h \mid \exists f\in A\ \exists g\in B\ h=f\otimes g\}\ , where A and B are two sets of functions and \otimes a generic operation (+, -, \cdot, /, …).
When f is a function and A a set of functions we use f \otimes A as an alias for \{f\}\otimes A.
With these conventions we can now make sense of expressions such as n+ \mathcal O(\log n), n^{\mathcal O(1)}, or \Theta(1) + \mathcal O(1/n).
\mathcal O(f)+\mathcal O(g)=\mathcal O(f+g)=\mathcal O(\max\{f,g\}) (same for \Omega and \Theta)
\mathcal O(f)\cdot \mathcal O(g)=\mathcal O(f\cdot g) (same for \Omega and \Theta)
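For instance, an algorithm doing \Theta(n) work followed by \mathcal O(\log n) extra work has total cost \Theta(n)+\mathcal O(\log n), which is \Theta(n); a loop executed \Theta(n) times whose body costs \mathcal O(\log n) has cost \Theta(n)\cdot\mathcal O(\log n), which is \mathcal O(n\log n).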
Exercise 3 Prove the above equalities from the definition of \mathcal O.
Which algorithm makes you travel the shorter distance?
Assume the door is at distance n (and for simplicity n=2^k)
Algorithm 1: T_{worst}(n)= 2\sum_{i=1}^n i + n = 2\frac{n(n+1)}{2} +n=\Theta(n^2)
Algorithm 2: T_{worst}(n)= 2\sum_{i=0}^{k}2^i+2^k=2(2^{k+1}-1)+2^k=5n-2=\Theta(n)
Algorithm 2 is asymptotically better
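A small Python sketch evaluating the two worst-case formulas above (assuming, as in the analysis, that n=2^k); the helpers t_worst_alg1 and t_worst_alg2 are invented names for this comparison:

```python
def t_worst_alg1(n):
    # 2 * sum_{i=1}^{n} i + n  =  n*(n+1) + n
    return 2 * sum(range(1, n + 1)) + n

def t_worst_alg2(k):
    # 2 * sum_{i=0}^{k} 2^i + 2^k  =  5 * 2^k - 2
    return 2 * sum(2**i for i in range(k + 1)) + 2**k

for k in (4, 8, 12):
    n = 2**k
    print(n, t_worst_alg1(n), t_worst_alg2(k))   # quadratic vs. 5n - 2 growth
```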