
Tuesday, December 25, 2018

Sampling methods

Inverse transform sampling


Inverse transform sampling, or inversion sampling, is a method to sample numbers at random from any probability distribution, given its cumulative distribution function (CDF).
Let's understand this with an example: you are given a random variable $X$ which is exponentially distributed. More formally,
\begin{align} f(x;\lambda) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases} \end{align} And your goal is to sample from $X$.
Given this PDF, we can easily come up with the CDF (the probability that $X$ takes a value less than or equal to $x$).
Because $f$ is a probability density, the total area under its curve must be $1$. The CDF gives us the area under the PDF curve at and to the left of a given $x$ value.
What if we map a CDF value back to a point $x$? It is very easy to sample a point uniformly in $(0, 1)$, and if we have the inverse CDF mapping, then we can easily turn that uniform sample into a sample from $X$. For the above example, we have

\begin{align} CDF(x) &= 1-e^{-\lambda x} \\ x &= -\frac{1}{\lambda}\log(1 - CDF(x)) \end{align}

We can now use this inverse CDF function for sampling points in X as shown in the code below.
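Below is a minimal Octave/Matlab sketch of this; the rate $\lambda = 2$ and the sample size are illustrative choices.

  % Inverse transform sampling from an exponential distribution.
  lambda = 2;                   % rate parameter (illustrative choice)
  n = 10000;                    % number of samples (illustrative choice)
  u = rand(n, 1);               % uniform samples in (0, 1)
  x = -log(1 - u) / lambda;     % apply the inverse CDF
  hist(x, 50);                  % the histogram should follow the exponential PDF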
  

Consider another problem: sample points uniformly from a unit circle.
This problem is tricky: the naive mapping from uniform random numbers to polar coordinates does not spread points uniformly over the disk. Let's start with a simple solution and check if it works.
  1. Sample two numbers uniformly at random, $u_1, u_2 \sim U(0, 1)$
  2. $r = u_1$ and $\theta = 2 \pi u_2$
  3. Then $(x, y) = (r\cos\theta, r\sin\theta)$
Here is a Matlab/Octave snippet which performs these steps.
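This is a minimal sketch of the steps above; the sample size is an illustrative choice.

  % Naive (non-uniform) sampling inside the unit circle.
  n = 5000;                     % number of samples (illustrative choice)
  r = rand(n, 1);               % radius sampled uniformly in (0, 1)
  theta = 2 * pi * rand(n, 1);  % angle sampled uniformly in (0, 2*pi)
  x = r .* cos(theta);
  y = r .* sin(theta);
  plot(x, y, '.');              % the points crowd towards the center
  axis equal;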
  
As is evident from this plot, there are more points towards the center. Why is this the case?
Because area grows with the square of the radius: as we move further from the center, we need more points to maintain the same density, but the naive scheme places points at every radius equally often. In other words, points close to the center end up with smaller distances between them.
How do we rectify this?

We need to sample the radius in proportion to area. Let's start with $CDF(x) = \frac{\pi x^2}{\pi \cdot 1^2}$: given a radius $x$, this is the ratio of the area of the circle of radius $x$ to the area of the unit circle, i.e. the fraction of uniformly spread points that lie within radius $x$.
Therefore, $CDF(x) = x^2$
In other words, $x = \sqrt{CDF(x)}$
With this correction, our steps become:
  1. Sample two numbers uniformly at random, $u_1, u_2 \sim U(0, 1)$
  2. $r = \sqrt{u_1}$ and $\theta = 2 \pi u_2$
  3. Then $(x, y) = (r\cos\theta, r\sin\theta)$
The code with this small change now looks like:
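This is the same sketch as before; only the radius line changes.

  % Corrected sampling: the radius is the square root of a uniform sample.
  n = 5000;
  r = sqrt(rand(n, 1));         % r = sqrt(u), so area is covered uniformly
  theta = 2 * pi * rand(n, 1);
  x = r .* cos(theta);
  y = r .* sin(theta);
  plot(x, y, '.');              % the points are now spread uniformly
  axis equal;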
  
And now the plot shows a uniform sample of points. Below, we will solve this very problem using an entirely different technique called rejection sampling.

Essentially, inverse transform sampling consists of the following steps:
  1. Given the CDF of a random variable $X$, compute its inverse $CDF^{-1}$
  2. Sample a point $u_i$ uniformly in $[0, 1]$
  3. Compute $x_i = CDF^{-1}(u_i)$ and include it in the sample.
Pros
Every point drawn is accepted as a sample, unlike in rejection sampling, discussed below.

Cons
The method assumes the existence of an inverse CDF function.

Rejection Sampling


As we saw above, inverse transform sampling requires a CDF. What if we do not have a CDF, or the underlying distribution is very complex? Let's start with the same example as above.

Problem: Sample points uniformly from a unit circle.
  1. Imagine a square $[-1, 1] \times [-1, 1]$ encompassing the circle.
  2. Pick a point $P = (x_i, y_i)$ in this square uniformly at random. This is easy: pick $x_i \sim Rand(-1, 1)$ and, independently, $y_i \sim Rand(-1, 1)$.
  3. Now check whether $P$ lies within the circle or outside it. This is not hard, we can simply check if $x_i^2 + y_i^2 \le 1$
  4. If the point lies outside the circle, reject the point. Else include it in our sample.
The Matlab/Octave code below shows this in action.
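This is a minimal sketch of the four steps above; the number of proposals is an illustrative choice.

  % Rejection sampling of points inside the unit circle.
  n = 5000;                     % number of proposals (illustrative choice)
  x = 2 * rand(n, 1) - 1;       % x ~ Uniform(-1, 1)
  y = 2 * rand(n, 1) - 1;       % y ~ Uniform(-1, 1), drawn independently
  inside = (x.^2 + y.^2 <= 1);  % acceptance test
  plot(x(inside), y(inside), '.');
  axis equal;

Roughly a fraction $\pi/4 \approx 78.5\%$ of the proposals is accepted, which is the ratio of the circle's area to the square's.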
  

Rejection sampling in general can be described as follows (a code sketch with an assumed target follows this list):
  1. Let's say you have a complex distribution $f$ which is difficult to sample from.
  2. Propose a distribution $q$, together with a constant $M$ such that $M \cdot q(x) \ge f(x)$ everywhere, so that the scaled proposal envelops $f$ completely and is easy to sample from.
  3. Sample $x$ from $q$, and sample a height $u$ uniformly at random between $0$ and $M \cdot q(x)$.
  4. Retain the point if it lies under the curve of $f$, i.e. if $u \le f(x)$; otherwise reject it.
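Here is a minimal Octave/Matlab sketch with an assumed target density $f(x) = 6x(1-x)$ on $[0, 1]$ (a Beta(2, 2) density), a $Uniform(0, 1)$ proposal $q$, and envelope constant $M = 1.5$, so that $f(x) \le M \cdot q(x)$ everywhere; the target and the sample size are illustrative choices.

  % Generic rejection sampling sketch with an assumed target density.
  f = @(x) 6 .* x .* (1 - x);   % target density on [0, 1] (Beta(2, 2))
  M = 1.5;                      % envelope constant: f(x) <= M * q(x), q = Uniform(0, 1)
  n = 10000;                    % number of proposals (illustrative choice)
  x = rand(n, 1);               % proposals drawn from q
  u = rand(n, 1);               % uniform heights for the acceptance test
  accepted = x(u <= f(x) ./ M); % accept x with probability f(x) / (M * q(x))
  hist(accepted, 30);           % the histogram should follow f

On average a fraction $1/M$ of the proposals is accepted, so a tighter envelope wastes fewer samples.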
Cons of Rejection Sampling:
  1. It might be difficult to come up with a good proposal distribution.
  2. Lots of points get rejected, unlike in inversion sampling, so it might take a long time to gather enough samples.

I will be updating this blog with other sampling methods like importance sampling and MCMC in a short while. Stay tuned!

Friday, December 21, 2018

Reservoir Sampling


Goal: choose a sample of $k$ items from a list $S$ containing $n$ items, where $n$ is either unknown or very large; for instance, a stream of incoming numbers.

Let's start with the procedure and then prove why it is correct.
  1. Keep the first $k$ items in an array (we call this the reservoir)
  2. When the $i^{th}$ item arrives, for $i > k$:
    1. With probability $k/i$, keep the new item, discarding an old one selected uniformly at random from the reservoir.
    2. With probability $1-k/i$, reject the new item.

Now, let's take an example and work out the math. Let's take $k = 10$.
  1. For the first $10$ items, add them to the reservoir with probability $1$
  2. When the $11^{th}$ item comes, it is
    • added to the reservoir with probability $10/11$
    • Now, what is the probability that an item $I$, which was already present in the reservoir, stays there after the $11^{th}$ item arrives?
$p_{required} = $ P(the $11^{th}$ element is chosen and it replaces an element other than $I$) $+$ P(the $11^{th}$ element is rejected) \begin{align} &= \frac{10}{11} \cdot \frac{9}{10} + \left(1-\frac{10}{11}\right) \\ &= \frac{9}{11}+\frac{1}{11} \\ &= \frac{10}{11} \end{align}

So, all the elements in the reservoir stay with probability $10/11$. Let's work out what happens when the $12^{th}$ element comes.
  1. It is added to the reservoir with probability $10/12$
  2. What is the probability that an element $I$ (which was already present in the reservoir) will stay?
    • $= \frac{10}{11} \cdot \left( \frac{10}{12} \cdot \frac{9}{10} + \left(1-\frac{10}{12}\right) \right)$

    • $= \frac{10}{11} \cdot \left( \frac{9}{12} + \frac{2}{12} \right) = \frac{10}{11} \cdot \frac{11}{12} = \frac{10}{12}$

The $10/11$ factor in the above equation comes from the previous round. Thus, all elements in the reservoir now stay with probability $10/12$.

How do we code this up? Here is the pseudocode, followed by a runnable Octave sketch.
Input: elements $S[1..n]$; we want to sample $k$ elements, where $k \ll n$
  1. Allocate a reservoir $R$ of size $k$
  2. For $i = 1$ to $k$
    $R[i] = S[i]$
  3. For $i = k + 1$ to $n$
    $j = random(1, i)$ // inclusive
    if $j \le k$
      $R[j] = S[i]$
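
Here is a minimal Octave/Matlab sketch of the same procedure; the stream is simulated with an ordinary array and `randi` provides the inclusive random index.

  % Reservoir sampling: keep a uniform sample of k items from S.
  function R = reservoir_sample(S, k)
    R = S(1:k);                 % fill the reservoir with the first k items
    for i = (k + 1):length(S)
      j = randi(i);             % uniform index in 1..i (inclusive)
      if j <= k
        R(j) = S(i);            % the new item replaces slot j with probability k/i
      end
    end
  end

For example, `reservoir_sample(1:1000, 10)` returns $10$ items sampled uniformly from $1..1000$.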

Let's see, in an inductive way, how this code is equivalent to what we computed earlier.
  1. Assume that after round $i - 1$, each element in the reservoir stays with probability $k/(i-1)$
  2. For round $i$, we have
    1. The new element stays with probability $k/i$
    2. An old element stays with probability:
      $= \frac{k}{i - 1} \left(\frac{k}{i} \cdot \frac{k - 1}{k} + \left(1 - \frac{k}{i}\right)\right)$
      $= \frac{k}{i-1} \cdot \frac{i-1}{i}$
      $= \frac{k}{i}$, as required

Reservoir sampling with random sort
  1. Assign a random number as a key to each item
  2. Sort these items and maintain the $k$ items with the minimum key values
This can be easily implemented using a heap.
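A non-streaming sketch of this idea, where a full sort stands in for the size-$k$ heap a streaming implementation would maintain (the data and $k$ are illustrative choices):

  % Random-sort reservoir sampling: keep the k items with the smallest keys.
  S = 1:1000;                   % illustrative data
  k = 10;
  keys = rand(size(S));         % one random key per item
  [~, order] = sort(keys);      % sort ascending by key
  R = S(order(1:k));            % the k items with the minimum keys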

Distributed Reservoir Sampling
  • What if we have huge amounts of data? It is then desirable to distribute the sampling task among many machines working in parallel.
  • A simple approach: do a distributed sort with a random key, as above, and pick the top $k$ elements.
But this is too costly.

Instead, we

  1. Distribute the data among $m$ machines
  2. Each machine $c$ does its own sampling using a random key and produces a sample of at most $k$ items
  3. Collect all the samples, $n' \le mk$ items in total
  4. Sample $k$ items from these $n'$ items using the same random keys (a small simulation of this scheme is sketched below)
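
Here is a small single-machine simulation of this scheme; the $m$ machines are simulated by splitting the data into equal chunks, and the random keys are generated once and reused in the final step (the data, $m$ and $k$ are illustrative choices).

  % Distributed reservoir sampling, simulated on one machine.
  m = 4;  k = 10;
  S = (1:1000)';                       % illustrative data, length divisible by m
  keys = rand(numel(S), 1);            % one random key per item, reused throughout
  chunk = numel(S) / m;                % items handled per machine
  cand = [];                           % candidate indices gathered from all machines
  for c = 1:m
    lo = (c - 1) * chunk + 1;          % machine c works on S(lo:hi)
    hi = c * chunk;
    [~, ord] = sort(keys(lo:hi));
    cand = [cand; lo - 1 + ord(1:k)];  % machine c keeps its k smallest keys
  end
  [~, ord] = sort(keys(cand));
  R = S(cand(ord(1:k)));               % final sample: the overall k smallest keys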

If we want to associate a weight $w_i$ with each item, then the random key can be scaled as $r^{1/w_i}$. That is, the higher the weight, the bigger the key, and the more likely the item is to be kept.
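
For example, here is a non-streaming sketch of this weighted variant; the keys are $u^{1/w_i}$ with $u \sim Uniform(0, 1)$, and since a larger key is now better, we keep the $k$ items with the largest keys (the data and weights are illustrative choices).

  % Weighted sampling via random keys: key_i = u_i ^ (1 / w_i).
  S = 1:100;                    % illustrative items
  w = 1 + 9 * rand(1, 100);     % illustrative positive weights in (1, 10)
  k = 10;
  u = rand(size(S));            % one uniform draw per item
  keys = u .^ (1 ./ w);         % higher weight => key closer to 1
  [~, order] = sort(keys, 'descend');
  R = S(order(1:k));            % keep the k items with the largest keys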