I am writing a C++ library for very large integers (very large meaning hundreds of digits or more).

I want to add a function

$Y = \operatorname{randBig}(X)$

to generate a very large, uniformly distributed pseudo-random integer $Y \in [0, X]$.

I have at my disposal a function

$y = \operatorname{rand}(x)$

that generates uniformly distributed pseudo-random integers $y \in [0, x]$.
But the largest allowed value for $x$, called $x_{max}$, is far smaller than $X$:

$x_{max} \lll X$

My first attempt:

To generate the random number $Y \in [0, X]$:

  1. Write $X$ in base $b = x_{max} + 1$.
  2. Generate each digit of $Y$ as $Y_i = \operatorname{rand}(X_i)$ (where $X_i$ and $Y_i$ are the $i$-th digits of $X$ and $Y$, respectively).

This doesn't work: e.g. when $x_{max} = 9$, the base-$10$ digits of $100$ are $1,0,0$, so $\operatorname{rand}(X_i)$ can only produce $0$ or $1$ for the leading digit and $0$ for the other two; $\operatorname{randBig}(100)$ would always return either $0$ or $100$, but nothing in between.

My second attempt:

To generate the random number $Y \in [0, X]$:

  1. Write $X$ in base $b = x_{max} + 1$.
  2. Generate each digit of $Y$ as $Y_i = \operatorname{rand}(x_{max})$, so that $Y$ has as many digits as $X$ (where $Y_i$ is the $i$-th digit of $Y$).
  3. If $Y \le X$ then return $Y$, otherwise go back to step 2.

This does work, but in the worst case (e.g. when the leading digit of $X$ is small) almost every candidate is rejected, and it gets slower as the base grows.
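For concreteness, here is a minimal C++ sketch of this second attempt. The names `rand_upto` (a stand-in for the $\operatorname{rand}(x)$ primitive, backed here by `std::mt19937_64`) and `rand_big` are illustrative, and a toy `uint64_t` plays the role of the big-integer type so the structure stays visible:

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Toy illustration of the second attempt: view X in base b = x_max + 1,
// generate each digit of Y uniformly in [0, x_max], retry until Y <= X.
// A real big-integer type is assumed in the actual library; here X and Y
// fit in uint64_t so the idea stays visible.

static std::mt19937_64 gen(12345);  // fixed seed for reproducibility

// Stand-in for the library's rand(x): uniform integer in [0, x].
uint64_t rand_upto(uint64_t x) {
    return std::uniform_int_distribution<uint64_t>(0, x)(gen);
}

// Second attempt: rejection sampling, digit by digit in base b.
uint64_t rand_big(uint64_t X, uint64_t x_max) {
    const uint64_t b = x_max + 1;
    // Count the base-b digits of X.
    int m = 0;
    for (uint64_t t = X; t > 0; t /= b) ++m;
    if (m == 0) return 0;  // X == 0
    for (;;) {
        uint64_t Y = 0;
        for (int i = 0; i < m; ++i)
            Y = Y * b + rand_upto(x_max);  // each digit uniform in [0, x_max]
        if (Y <= X) return Y;              // otherwise reject and retry
    }
}
```

Note that `rand_big(100, 9)` already shows the slowdown: only $101$ of the $1000$ three-digit candidates are accepted, so roughly $90\%$ of the work is thrown away.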

My Question

Are there faster algorithms?

  • Your second method seems to be the most reasonable one. But make sure to use the most "natural" base, one that does not involve any additional scaling operations. On a binary computer, it may be faster to generate and combine byte- or word-sized chunks, even if $x_{max}$ could save a few calls to rand(). A valid exception is the leading digit: to generate (for readability, in base ten) randBig(314159), do not generate a six-digit number from $0$ to $999999$ but only up to $399999$; this is faster because you reject less often. Also be aware of how much entropy your PRNG can sustain. Commented Jun 19 at 9:17
  • @HagenvonEitzen In my case the "natural" base is already base $2^{64}$. Commented Jun 19 at 9:49

2 Answers


This method works as long as $b=2^p$ for some integer $p$.

Suppose you have a function rand() which generates integers uniformly between $0$ and $2^p-1$, which is equivalent to generating $p$ random bits. You can leverage this to generate, for any integer $n\le p$, a random integer uniformly between $0$ and $2^n-1$. Simply take rand() % (2^n), where % is the modulo operation.
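A minimal sketch of that step, using `std::mt19937_64` as the source of $p=64$ random bits; masking the low $n$ bits is the same as taking the value modulo $2^n$, and each residue class is hit equally often, so the result stays uniform:

```cpp
#include <cassert>
#include <cstdint>
#include <random>

static std::mt19937_64 gen(42);  // fixed seed for reproducibility

// Uniform n-bit value (0 <= n <= 64), carved out of one 64-bit draw.
uint64_t rand_bits(int n) {
    uint64_t r = gen();                   // uniform in [0, 2^64 - 1]
    if (n >= 64) return r;
    return r & ((uint64_t{1} << n) - 1);  // same as r % 2^n
}
```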

Here is how you generate a random number between $0$ and $X$. First, write $X$ in base-$b$: $$ X=d_md_{m-1}\dots d_1d_0\qquad (0\le d_i\le 2^p-1,\quad 0\le i\le m) $$ Note that $d_m$ is the most significant digit, while $d_0$ is the least significant, so $X=\sum_{i=0}^{m} d_ib^i$.

  1. Find the smallest $n$ such that $2^n> d_m$.

  2. Generate $Y_m$ uniformly between $0$ and $2^{\color{red}n}-1$.

  3. For each $0\le i\le m-1$, generate $Y_i$ uniformly between $0$ and $2^p-1$.

  4. Let $Y$ be the integer whose base-$b$ digits are $Y_mY_{m-1}\dots Y_1Y_0$, so $Y=\sum_{i=0}^m Y_ib^i$. If $Y\le X$, output $Y$. Otherwise, return to step $2$.

The only difference between this method and your method is that the most significant digit $Y_m$ of $Y$ is generated slightly differently, in a way that is less wasteful. This method is fast for all $X$: since $2^{n-1}\le d_m$, each attempt satisfies $Y\le X$ with probability at least $50\%$, so the expected number of attempts in step 4 is at most $2$.
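The four steps above can be sketched as follows, using $p=8$ (base $b=256$) and a toy `uint64_t` in place of a real big integer so the digit bookkeeping stays readable; all names are illustrative, not from any particular library:

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

static std::mt19937_64 gen(7);  // fixed seed for reproducibility

// Uniform n-bit value (0 <= n <= 8 here).
uint64_t rand_bits(int n) {
    return gen() & ((uint64_t{1} << n) - 1);
}

// Uniform integer in [0, X], base b = 2^p with p = 8.
uint64_t rand_upto(uint64_t X) {
    constexpr int p = 8;
    constexpr uint64_t b = uint64_t{1} << p;
    // Split X into base-b digits d[0] (least significant) .. d[m] (most).
    std::vector<uint64_t> d;
    for (uint64_t t = X; ; t /= b) { d.push_back(t % b); if (t < b) break; }
    const int m = static_cast<int>(d.size()) - 1;
    // Step 1: smallest n with 2^n > d[m].
    int n = 0;
    while ((uint64_t{1} << n) <= d[m]) ++n;
    for (;;) {
        uint64_t Y = rand_bits(n);        // step 2: leading digit, n bits only
        for (int i = 0; i < m; ++i)
            Y = (Y << p) | rand_bits(p);  // step 3: remaining digits, full p bits
        if (Y <= X) return Y;             // step 4: accept, else retry
    }
}
```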


Now, let $b$ be a number which is not a power of $2$. Given the ability to generate numbers in the range $[0,b-1]$, how do you generate numbers in a large range $[0,X]$? Using the previous section, we need only a way to choose random numbers in a range $[0,2^p-1]$ for some $p$.

Here is how you efficiently convert the $[0,b-1]$ random number to a $[0,2^p-1]$ random number.

  1. Find the largest integer $p$ such that $2^p\le b$.

  2. Generate $Z$ in the range $[0,b-1]$.

  3. If $Z\le 2^p-1$, then output $Z$. Otherwise, return to step $2$.

This method is efficient, because the probability of $Z\le 2^p-1$ is at least $50\%$.
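A small sketch of this conversion, with $b=10$ purely for illustration and `rand_base` standing in for the $[0,b-1]$ source:

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Rejection step: given draws uniform in [0, b-1] for a base b that is not
// a power of two, keep only draws below the largest power of two <= b.
// What survives is uniform in [0, 2^p - 1], and since 2^p > b/2, at least
// half of all draws survive.

static std::mt19937_64 gen(99);  // fixed seed for reproducibility

// Stand-in for the [0, b-1] source.
uint64_t rand_base(uint64_t b) {
    return std::uniform_int_distribution<uint64_t>(0, b - 1)(gen);
}

uint64_t rand_pow2_from_base(uint64_t b) {
    // Step 1: largest p with 2^p <= b.
    int p = 0;
    while ((uint64_t{1} << (p + 1)) <= b) ++p;
    const uint64_t limit = uint64_t{1} << p;  // 2^p
    for (;;) {
        uint64_t z = rand_base(b);  // step 2: draw in [0, b-1]
        if (z < limit) return z;    // step 3: accept (uniform in [0, 2^p - 1])
    }                               //         or reject, with probability < 1/2
}
```

For $b=10$ this gives $p=3$, so each accepted draw is a uniform value in $[0,7]$ and the rejection probability is only $20\%$.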


There is a simple optimization possible for your second approach: generate the digits from most significant digit to least significant. Let's say the first digit of $X$ in base $b$ is $x_1$, the second is $x_2$, etc:

  • Generate $Y_1$ randomly from $0$ up to and including $x_1$
  • If $Y_1<x_1$, generate all other digits without checking anything. It is guaranteed that $Y<X$.
  • If $Y_1=x_1$, then generate $Y_2$
  • If $Y_2>x_2$, start the whole procedure over (this is necessary to maintain a uniform distribution)
  • If $Y_2<x_2$, generate all other digits without checking.
  • If $Y_2=x_2$, generate $Y_3$ and compare with $x_3$
  • etcetera

In this way you still guarantee uniformity, it is very likely that you only need to restart a handful of times, and the restarts are very quick.
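A sketch of this most-significant-digit-first procedure, in base $10$ for readability and with `uint64_t` standing in for a real big integer (names are illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

static std::mt19937_64 gen(2024);  // fixed seed for reproducibility

// Stand-in for rand(x) on small values: uniform in [0, x].
uint64_t rand_upto_small(uint64_t x) {
    return std::uniform_int_distribution<uint64_t>(0, x)(gen);
}

// Generate Y uniform in [0, X], most significant digit first. While the
// generated prefix equals the prefix of X ("tight"), a digit above the
// matching digit of X forces a restart; once a digit falls strictly below,
// the remaining digits are unconstrained and Y < X is guaranteed.
uint64_t rand_big_msd(uint64_t X) {
    constexpr uint64_t b = 10;
    // Digits of X, most significant first.
    std::vector<uint64_t> d;
    for (uint64_t t = X; ; t /= b) { d.insert(d.begin(), t % b); if (t < b) break; }
    for (;;) {                            // restart loop
        uint64_t Y = 0;
        bool tight = true, restart = false;
        for (size_t i = 0; i < d.size(); ++i) {
            // Leading digit is drawn from [0, d[0]]; later digits from [0, b-1].
            uint64_t yi = (i == 0) ? rand_upto_small(d[0])
                                   : rand_upto_small(b - 1);
            if (tight && yi > d[i]) { restart = true; break; }
            if (yi < d[i]) tight = false;
            Y = Y * b + yi;
        }
        if (!restart) return Y;
    }
}
```

Every accepted value has the same probability per attempt, $\frac{1}{x_1+1}\cdot\frac{1}{b^{m-1}}$ for an $m$-digit $X$, which is why the conditional distribution is uniform.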

  • I thought about this, but I don't think this produces a uniform distribution: (for readability, in base ten) randBig(10) has this distribution: $10$: $50\%$, $0$–$9$: $5\%$ each. Commented Jun 19 at 9:56
  • @J.Coenen it would have a uniform distribution in that case: the first digit is $1$ with probability $0.5$, after which there is a $0.1$ probability that the second digit is $0$ and thus $10$ is picked, and a $0.9$ probability that we restart. So there is also a $5\%$ probability of generating $10$; it is just that there is a $45\%$ probability of having to restart. Commented Jul 1 at 9:20
