11 Randomized Algorithm

Table of Content

<aside> <img src="/icons/map-pin_gray.svg" alt="/icons/map-pin_gray.svg" width="40px" /> For most of the topics — and there’s a lot of them — we will be covering, at least know the basics of how the algorithm is used and the probability/time complexity. You are not expected to know the proof, derivation, or theory behind it.

</aside>

Randomization

Lastly for this course, we will explore an interesting area of optimization — randomized algorithms. This approach involves the use of randomness to solve complex problems with large data sets that would otherwise be difficult to solve using traditional methods.

A First Application: Contention Resolution

Contention resolution is a problem that arises in distributed systems where multiple processes compete for access to a shared resource, such as a database — i.e. the Lost Update Problem.

There are $n$ processors that share a single database and the database serves at most $1$ processor at a time. The goal is to divide the rounds among the $n$ processors in an equitable fashion.

Figure 11.1 Processes competing.

Figure 11.1 Processes competing.

For example, in Figure 11.1, we have $n$ processes $P_1, P_2, \cdots, P_n$ that are competing for access to a shared database.

Keep in mind there is no communication between the processors of any sort to plan out when to access the database.
If every processor keeps repeatedly trying to access the $D$ at the same time then the database will be locked out, serving $0$ processors.
Not ideal, so the challenge is to design a protocol that can achieve this without relying on any centralized control.

By randomizing the sequence of access attempts from the processors we can smooth out the contention, avoiding lockouts.

Randomized Protocol

To begin, we need to specify some events and the probabilities associated with them.

Consider the event which the process succeeds.

$$ S(i,t) = \textrm{event that process $i$ succeeds in accessing the database at time $t$} $$

The probability of $S(i,t)$ is,

$$ \Pr[S(i,t)] = p(1-p)^{n-1} $$

Since there are $n$ processes in total, each process has an equal chance of requesting access at any given time.

$$ p = \frac{1}{n} $$

Therefore, we can rewrite the probability as,

$$ \Pr[S(i,t)] = \frac{1}{n}\bigg(1-\frac{1}{n}\bigg)^{n-1} $$

Using some calculus, we see there is an asymptotic bound we can use.

$$ \color{0047AB}\frac{1}{en} \leq \Pr[S(i,t)] \leq \frac{1}{2n} $$

Hence, $\Pr[S(i,t)]$ is asymptotically equal to $O(1/n)$.
Consider the event which the process fails.

$$ F(i,t) = \textrm{event that process $i$ fails to access database in rounds $1$ through $t$} $$

We can use the bounds from earlier and set $t = \lceil en \rceil$ to make sure it is an integer, the probability of $F(i,t)$ is,

$$ \Pr[F(i,t)] \leq \bigg(1 - \frac{1}{en}\bigg)^{t} $$

If we set $t = \lceil en \rceil$, then we have,

$$ \Pr[F(i,t)] \leq \bigg(1 - \frac{1}{en}\bigg)^{\lceil en \rceil} \leq \bigg(1 - \frac{1}{en}\bigg)^{en} \leq \frac{1}{e} $$

Now, if we increase $t$ by some fairly small factors, $t = \lceil en \rceil \cdot (c \ln{n})$, then we have,

$$ \color{0047AB}\Pr[F(i,t)] \leq \bigg(\frac{1}{e}\bigg)^{c\ln{n}} = \frac{1}{n^c} $$

Adjusting the constant $c$ balances the running time and probability of success. Higher accuracy can be achieved by increasing $c$, but it may increase required rounds.
Consider the event of the protocol failing after $t$ rounds, then we would like to minimize $t$ here to maximize the number of rounds it succeeds.

$$ \begin{align*} F(t) = \textrm{event that at least one of the $n$ processes fails to} \\ \textrm{access database in any of the rounds $1$ through $t$} \end{align*} $$

Therefore, $F(t)$ can be written as a union of events $F(i.t)$ for processes $i = 1, \cdots, n$.

$$ F(t) = \bigcup_{i = 1}^{n} F(i,t) $$

Using a simple union bound is typically enough to compute the probabilities of union.

$$ \Pr[F(t)] = \Pr\bigg[\bigcup_{i = 1}^{n} F(i,t)\bigg] \leq \sum_{i = 1}^{n} \Pr[F(i,t)] $$

If we set $t = 2\lceil en \rceil \cdot (c \ln{n})$, then we have,

$$ \color{0047AB}\Pr[F(t)] \leq n \cdot n^{-2} = \frac{1}{n} $$

In conclusion, we have seen that a randomized protocol can be effective in achieving contention resolution without relying on any centralized control.