Numerical ComputationPseudorandom Number Generation
When random numbers are needed in a scientific computing application, we generally use deterministic processes which mimic the behavior of random processes. These are called pseudo-random number generators (PRNG). A PRNG takes an initial value, called the seed, and uses it to produce a sequence of numbers which are supposed to "look random". The seed determines the sequence, so you can make the random number generation in a program reproducible by providing an explicit seed. If no seed is provided, a different one will be used each time the program runs.
A simple PRNG is the linear congruential generator: fix positive integers , , and , and consider a seed . We return the sequence , where for .
Exercise
A sequence of numbers is periodic with period if is the smallest number such that for all . We say that a linear congruential generator (LCG) with is full-cyle if the generated sequence has a period . For what values of is the LCG with , , and full-cycle?
Solution. The LCG is full-cycle only if The period is when and it is if
Since the sequence of numbers produced by a PRNG is determined by the initial seed, we cannot say that the sequence of numbers is random. The pseudo part of the term pseudorandom is meant to emphasize this distinction. However, we can subject the sequence of numbers to a battery of statistical tests to check whether it can be readily distinguished from a random sequence.
For example, we can check that each number appears with approximately equal frequency. For example, a sequence purporting to sample uniformly from which begins
is probably not a good pseudorandom number generator. However, some clearly non-random sequences pass the basic frequency test:
To detect such failures, we need more advanced tests. For example, we can split up the sequence into pairs of consecutive terms and ensure that these pairs are also approximately equally represented. Since appears often and never appears in , the pair frequency test is sufficient to distinguish this sequence from a random one. We can apply the same idea with blocks of length 3 or 4 or more.
Exercise
Consider the sequence . Use Julia to show that each number from 1 to 10 appears exactly 10 times in this sequence. Also, use Julia to show that is smaller than for far more than half the values of from 1 to 50. Hint: countmap(a)
tells you how many times each element in the collection a
appears. To use this function, do using StatsBase
first.
Repeat these tests on the sequence whose th term is the th digit in the decimal representation of : reverse(digits(floor(BigInt,big(10)^99*π)))
.
Solution. Only 10 of the 50 pairs have the even-indexed term larger than the odd-indexed term:
using StatsBase a = [6] for i=1:99 push!(a,mod(2a[end],11)) end countmap(a) # every number appears 10 times sum(a[2i] < a[2i-1] for i=1:50) # returns 40
Repeating with the digits of , we find that 27 of the first hundred blocks of 2 have even-indexed digit smaller than the one before it.
A PRNG is cryptographically secure if an agent who knows the algorithm used to generate the numbers (but who does not know the value of the seed) cannot feasibly infer the the number generated based on observation of the first numbers generated.
Most PRNGs are not cryptographically secure. In other words, they hold up well under the scrutiny of general statistical tests, but not to tests which exploit knowledge of the specific algorithm used.
As a simple example of a PRNG that is not cryptographically secure, consider the digits of starting from some unknown position. This sequence does behave statistically as though it were random (as far as we know), but an adversary who knew you were using successive digits of would be able to use several values output by your PRNG to find your position and start anticipating subsequent values.