Distribution of the Modulus of a Random Variable

We know that the distribution of the sum of n uniform random variables approaches a normal distribution as n gets larger. For right now, we’re referring to the discrete uniform distribution, and not the continuous one.

What is the distribution of Y mod b, for any base b?

Alright, rewind. Earlier this year, I had a technical interview and the following was an actual interview question: mimic rand7, given only rand5. For those of us lucky enough to work with a statistical language like R, we’ve got:

> rand7 = sample(1:7, 1)
> rand5 = sample(1:5, 1)

Boom. Problem done. Well, not exactly. The goal of this interview question is to figure out a way to expand the range of rand5 while preserving the uniformity of each output value in rand7. I think it’s only a disarming question in an interview because we don’t often work with the intricacies of random number generation in our coursework. Even at a school like CMU, we’re usually dealing with the analysis of the distribution, not its transformations. The most important task in this problem is to ensure that all 7 outputs have exactly equal probability of being output, even though you’re only being given enough “randomness” for 5 outputs. In more modern parlance, we have to pull some entropy out of our ass.

I’ve researched a couple of solutions to this problem, and they range from simple to esoteric. One of the more complex options is to use our smaller RNG to uniformly build a larger generator than we actually need, then sample back down to the range we want. The second step, sampling back down, is guaranteed to be uniform as long as we only accept samples within a particular range, as seen below.

Let’s generalize: suppose our problem was “given rand2, compute rand3.”

Really, this is the base case of our interview question, if you will.

And the beautiful thing is that if we consider the outputs of rand2 as the paths along a binary tree, we can make a pretty powerful metaphor for this problem. So let’s say we take the left branch of our binary tree if rand2 == 0 and the right branch if rand2 == 1. Every time we decide to call rand2, we have to add a level to the tree. At the bottom, we label each leaf node with a number: 1, 2, or 3. These labels are our RNG output.

If our tree has $j$ levels at the end, then we’ll make $j$ calls to rand2, take those paths, then return the label at the bottom-most leaf.

Without even considering which leaf nodes should be labeled what, we note that the probability of reaching any of the nodes labeled 3 is the sum of the probabilities of the paths leading to each such node. Since the probability of reaching one particular node is $\frac{1}{2^j}$ (where $j$ is the level in which the node is found), the total probability of reaching any of our nodes labeled 3 is a sum of such terms. This makes the form of our probability statement something like

$$P(\text{output} = 3) = \sum_i \frac{1}{2^{j_i}}$$

where $j_i$ is the depth of the $i$-th leaf labeled 3.

The question is: is there a finite series of reciprocal powers of 2 such that $\sum_i \frac{1}{2^{j_i}} = \frac{1}{3}$?

And the answer is no: any finite sum of reciprocal powers of 2 is a fraction whose lowest-terms denominator is a power of 2, and since 2 and 3 are relatively prime, 1/3 can never be written that way. Well that’s simple. Math is cool.
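That fact is easy to check with exact arithmetic. Here’s a minimal sketch in Python (the helper names `leaf_probability` and `is_power_of_two` are mine, not from the interview): any finite sum of leaf probabilities reduces to a fraction with a power-of-two denominator, and 1/3’s denominator is not one.

```python
from fractions import Fraction

def leaf_probability(depths):
    """Exact probability of landing on any leaf in `depths`,
    where a leaf at depth d is reached with probability 1/2^d."""
    return sum(Fraction(1, 2 ** d) for d in depths)

def is_power_of_two(n):
    return n & (n - 1) == 0

# Example labeling: leaves labeled "3" at depths 2, 3, 5, and 5.
p = leaf_probability([2, 3, 5, 5])
print(p)  # 7/16

# In lowest terms, the denominator is always a power of 2...
print(is_power_of_two(p.denominator))  # True

# ...but 1/3's denominator is 3, so no finite labeling hits it exactly.
print(is_power_of_two(Fraction(1, 3).denominator))  # False
```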

So what do we do?

We need to use our smaller RNG to generate a larger range than either RNG (think kind of like a least-common-multiple) and then reduce back down to the target RNG. Generation of the larger range can be done easily enough by “building” a large number, in a very literal sense. Reduction can also be done easily with repeated sampling.
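Here’s that recipe as a sketch in Python (`randn` and `rand_target` are my hypothetical names; the post itself only deals with rand5 and rand7): build a two-digit base-`small` number, then reject everything at or above the largest multiple of the target.

```python
import random

def randn(n):
    """Stand-in for a given uniform RNG on 1..n."""
    return random.randint(1, n)

def rand_target(small, target):
    """Mimic rand{target} using only rand{small},
    assuming target <= small * small."""
    # Largest multiple of `target` that fits in 0 .. small^2 - 1.
    limit = (small * small // target) * target
    while True:
        # A two-digit base-`small` number: uniform on 0 .. small^2 - 1.
        num = small * (randn(small) - 1) + (randn(small) - 1)
        # Accept only 0 .. limit - 1, so each residue class mod `target`
        # contains exactly limit // target accepted values.
        if num < limit:
            return num % target + 1

print(rand_target(5, 7))  # a uniform draw from 1..7
```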

Let’s go back to our 5 & 7 case. What we’ll do is “create” a random two-digit base-5 number with num = 5 * (rand5() - 1) + (rand5() - 1), which is uniform on 0 through 24. As long as ‘num’ is less than 21 (in base 10; 41 in base 5), its modulus by 7 will be uniform, because 21 is a multiple of 7 (in this case 3 * 7). Any multiple of 7 that fits in the range would work as the cutoff, so 7, 14, or 21; 21 is the best choice because it rejects the fewest samples.

Here’s the solution my interviewer emphatically claimed was the only solution:

Solution by Gayle Laakmann
public static int rand7() {
  while (true) {
    // A two-digit base-5 number: uniform on 0..24.
    int num = 5 * (rand5() - 1) + (rand5() - 1);
    // Accept only 0..20, a multiple of 7, so each residue 0..6
    // appears exactly three times among the accepted values.
    if (num < 21) return (num % 7 + 1);
  }
}

Let’s get a couple criticisms out of the way.

  • The solution could in principle run forever, but each pass of the loop succeeds with probability 21/25, so it terminates with probability 1 and you’ll get something quickly in practice.
  • It’s not intuitive at all.

It certainly wasn’t the first thing I thought of, yet it’s the “by-the-book” solution.
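On that first criticism: the loop is cheap in expectation. Each pass makes two rand5 calls and accepts with probability 21/25, so the number of passes is geometric with mean 25/21, and one rand7 call costs about 2 · 25/21 ≈ 2.38 rand5 calls. A quick simulation in Python (my own re-implementation, not the interview code) agrees:

```python
import random

calls = 0

def rand5():
    """Counted stand-in for the given rand5."""
    global calls
    calls += 1
    return random.randint(1, 5)

def rand7():
    """The rejection-sampling solution, ported to Python."""
    while True:
        num = 5 * (rand5() - 1) + (rand5() - 1)
        if num < 21:
            return num % 7 + 1

random.seed(0)
n = 100_000
for _ in range(n):
    rand7()
print(calls / n)  # close to 2 * 25 / 21 ~= 2.38
```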

So let’s look at a different solution to this problem, one which bothered me the moment I saw it work. You’d think we’d want things to work, but this one surprised me because it feels like it shouldn’t work. It feels like this solution is taking the easy way out, and to top it all off, there are no Google results suggesting it’s even viable.

We’ll take seven random samples of 1:5, add them up, and take their modulus.

> rand5 = function() sample(1:5, 1)
> rand7 = function() {
    x = replicate(7, rand5())
    Y = sum(x)
    return(Y %% 7)
  }
> plot(density(replicate(1e6, rand7())), main = "Uniform Density")

Wait… What the hell?

It looks totally uniform!

Let’s look at it this way: the sum of n uniform random variables approximates a normal distribution as n gets larger. Everybody agrees, yes?

Then right before that modulus, our hero should be something really close to a normal distribution with mean 21 and standard deviation $\sqrt{14} \approx 3.74$. How close?

Pretty close.

The mean is 21 because $E[Y] = \sum_{i=1}^{7} E[X_i]$ by linearity of expectation, and it just so happens that for a uniform random variable on 1 to 5, $E[X_i] = \frac{1+5}{2} = 3$, so $E[Y] = 7 \cdot 3 = 21$. The variance of a discrete uniform distribution is (the really weird) $\frac{n^2 - 1}{12}$, where $n$ is the number of discrete possibilities. Since we’re looking at the discrete uniform distribution from 1 to 5, that gives us $n = 5$ total choices and $\mathrm{Var}(X_i) = \frac{25 - 1}{12} = 2$. Then by linearity $\mathrm{Var}(Y) = 7 \cdot 2 = 14$. Thus, our standard deviation is $\sqrt{14} \approx 3.74$.
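Both numbers can be verified by exhaustively enumerating all $5^7 = 78125$ equally likely outcomes. A quick Python check (my own verification, not part of the original post):

```python
from itertools import product

# All 5^7 equally likely outcomes of seven draws from {1, ..., 5}.
sums = [sum(draw) for draw in product(range(1, 6), repeat=7)]

n = len(sums)
mean = sum(sums) / n
var = sum((s - mean) ** 2 for s in sums) / n
print(mean, var)  # 21.0 14.0
```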

So right before that modulus, we’re definitely working with some value that has been randomly drawn from (approximately) $\mathcal{N}(21, \sqrt{14})$. The normal distribution, a unimodal density curve, indicates that we should expect to get the sum 21 more often than any other number. Ok, it’s not a huge amount more, just about 4,000 more occurrences than its neighbors 20 and 22 in the million-draw simulation below.

> tmp = replicate(1e6, sum(replicate(7, rand5())))
> cbind(table(tmp), 7:35 %% 7)
     [,1] [,2]
7       9    0
8      79    1
9     345    2
10   1044    3
11   2638    4
12   5961    5
13  11351    6
14  19368    0
15  30733    1
16  45068    2
17  61191    3
18  77753    4
19  91031    5
20 100546    6
21 104512    0
22 100739    1
23  91845    2
24  77630    3
25  61064    4
26  45485    5
27  30779    6
28  19544    0
29  11350    1
30   5680    2
31   2633    3
32   1144    4
33    370    5
34     91    6
35     17    0

Each row is one of our potential sum values of Y, the first column is the number of times that sum appeared in a simulation of one million iterations, and the second column is the corresponding modulus of that sum. Now, I figured the peak in the center should inflate the number of sums congruent to 21 mod 7 = 0, making 0 the most common remainder. As it turns out, while the sum 21 is the most common, the sums 7 and 35, which are also congruent to 0 mod 7, are the two least common. And the proportions work out so precisely that this cascades all along the normal distribution: every remainder ends up with approximately the same total count of congruent sums.

The normal distribution “folds” on itself in such a perfectly uniform way.

It’s like it was destined to be this way.

I couldn’t shake the feeling that the “proper” answer would be a truncated normal distribution. On the other hand, the modulus can only return one of the integer values below its base, and with 29 possible sums, we wrap around those 7 integers four times over. My presumption, again, was that we’d see the most modulus values at 21 mod 7 = 0. But where would 20 and 22 go? Those fall into 20 mod 7 = 6 and 22 mod 7 = 1. The next pair lands at 19 mod 7 = 5 and 23 mod 7 = 2, then 4 & 3, then 3 & 4. The wide tail support of our normal distribution folds over to smooth everything out.
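You can watch that fold happen exactly, with no simulation noise. This Python sketch (the variable names are mine) builds the exact pmf of the sum by repeated convolution and then collapses it mod 7, in exact rational arithmetic:

```python
from collections import defaultdict
from fractions import Fraction

# Exact pmf of the sum of seven discrete-uniform(1..5) draws,
# built by convolving one draw at a time.
pmf = {0: Fraction(1)}
for _ in range(7):
    nxt = defaultdict(Fraction)
    for s, p in pmf.items():
        for face in range(1, 6):
            nxt[s + face] += p * Fraction(1, 5)
    pmf = dict(nxt)

# Fold the pmf mod 7.
residues = defaultdict(Fraction)
for s, p in pmf.items():
    residues[s % 7] += p

# Not exactly uniform (5^7 isn't divisible by 7), but very close.
max_dev = max(abs(residues[k] - Fraction(1, 7)) for k in range(7))
print(float(max_dev))
```

The maximum deviation from 1/7 comes out well under 1/1000: the tails really do fill in the valleys left by the peak, almost perfectly.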

So I did the only logical thing and turned to Cosma Shalizi to ask him why the modulus of a non-uniform distribution would resolve to a uniform distribution in this way, emphasizing the way that the tails fold over each other.

Equidistribution

Cosma confirmed that it is more a consequence of the modulus than of the normal distribution.

Here, roughly speaking, is the trick: take a non-negative but discrete random variable $Y$, with probability mass function $p_n = P(Y = n)$. We want to know the probability that Y mod 7 = k, for each $k \in \{0, 1, \ldots, 6\}$. Fix $k$. Then

$$P(Y \bmod 7 = k) = \sum_{j=0}^{\infty} p_{7j + k}$$

Now do this for all the other possible values of Y mod 7:

$$P(Y \bmod 7 = k') = \sum_{j=0}^{\infty} p_{7j + k'}, \qquad k' \in \{0, 1, \ldots, 6\}$$

Assume $p_{7j + k}$ is about the same as $p_{7j + k'}$ for every $j$, which holds when the pmf changes slowly, specifically if $p_{n+7} / p_n \to 1$. If we think of $Y = \sum_{i=1}^{n} X_i$ and let n get very big, this makes sense: the typical scale over which the distribution is going to change (its standard deviation, which grows like $\sqrt{n}$) is going to get larger and larger, so that a difference of just 7 becomes negligible. But now we can say that

  • $p_0, p_1, \ldots, p_6$ are all about equal
  • $p_7, p_8, \ldots, p_{13}$ are all about equal, etc.

so

$$P(Y \bmod 7 = k) = \sum_{j=0}^{\infty} p_{7j + k} \approx \sum_{j=0}^{\infty} p_{7j + k'} = P(Y \bmod 7 = k')$$

Thus, we find that the distribution of Y mod 7 is approximately uniform, regardless of whatever distribution $Y$ itself is drawn from, so long as its pmf varies slowly enough.
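To see that the normal shape really is incidental, here’s a small Python check (my example, not Cosma’s) computing the exact distribution of Binomial(n, 1/2) mod 7: as n grows and the pmf spreads out, the maximum deviation from 1/7 collapses.

```python
from math import comb

def mod7_distribution(n, p=0.5):
    """Exact distribution of (Binomial(n, p) mod 7)."""
    residues = [0.0] * 7
    for k in range(n + 1):
        residues[k % 7] += comb(n, k) * p ** k * (1 - p) ** (n - k)
    return residues

for n in (10, 50, 500):
    dist = mod7_distribution(n)
    print(n, max(abs(q - 1 / 7) for q in dist))
```

By n = 500 the deviation is far below anything a simulation could detect.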
