Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
481 views
in Technique[技术] by (71.8m points)

gcc - C++ uniform_int_distribution always returning min() on first invocation

In at least one implementation of the standard library, the first invocation of a std::uniform_int_distribution<> does not return a random value, but rather the distribution's min value. That is, given the code:

default_random_engine engine( any_seed() );
uniform_int_distribution< int > distribution( smaller, larger );
auto x = distribution( engine );
assert( x == smaller );

...x will in fact be smaller for any values of any_seed(), smaller, or larger.

To play along at home, you can try a code sample that demonstrates this problem in gcc 4.8.1.

I trust this is not correct behavior? If it is correct behavior, why would a random distribution return this clearly non-random value?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Explanation for the observed behavior

This is how uniform_int_distribution maps the random bits to numbers if the range of possible outcomes is smaller than the range of number the rng produces:

const __uctype __uerange = __urange + 1; // __urange can be zero
const __uctype __scaling = __urngrange / __uerange;
const __uctype __past = __uerange * __scaling;
do
  __ret = __uctype(__urng()) - __urngmin;
while (__ret >= __past);
__ret /= __scaling;

where __urange is larger - smaller and __urngrange is the difference between the maximum and the minimum value the rng can return. (Code from bits/uniform_int_dist.h in libstdc++ 6.1)

In our case, the rng default_random_engine is a minstd_rand0, which yields __scaling == 195225785 for the range [0,10] you tested with. Thus, if rng() < 195225785, the distribution will return 0.

The first number a minstd_rand0 returns is

(16807 * seed) % 2147483647

(where seed == 0 gets adjusted to 1 btw). We can thus see that the first value produced by a minstd_rand0 seeded with a number smaller than 11615 will yield 0 with the uniform_int_distribution< int > distribution( 0, 10 ); you used. (mod off-by-one-errors on my part. ;) )

You mentioned the problem going away for bigger seeds: As soon as the seeds get big enough to actually make the mod operation do something, we cannot simply assign a whole range of values to the same output by division, so the results will look better.

Does that mean (libstdc++'s impl of) <random> is broken?

No. You introduced significant bias in what is supposed to be a random 32 bit seed by always choosing it small. That bias showing up in the results is not surprising or evil. For random seeds, even your minstd_rand0 will yield a fairly uniformly random first value. (Though the sequence of numbers after that will not be of great statistical quality.)

What can we do about this?

Case 1: You want random number of high statistical quality.

For that, you use a better rng like mt19937 and seed its entire state space. For the Mersenne Twister, that's 624 32-bit integers. (For reference, here is my attempt to do this properly with some helpful suggestions in the answer.)

Case 2: You really want to use those small seeds only.

We can still get decent results out of this. The problem is that pseudo random number generators commonly depend "somewhat continuously" on their seed. To ship around this, we discard enough numbers to let the initially similar sequences of output diverge. So if your seed must be small, you can initialize your rng like this:

std::mt19937 rng(smallSeed);
rng.discard(700000);

It is vital that you use a good rng like the Mersenne Twister for this. I do not know of any method to get even decent values out of a poorly seeded minstd_rand0, for example see this train-wreck. Even if seeded properly, the statistical properties of a mt19937 are superior by far.

Concerns about the large state space or slow generation you sometimes hear about are usually of no concern outside the embedded world. According to boost and cacert.at, the MT is even way faster than minstd_rand0.

You still need to do the discard trick though, even if your results look good to the naked eye without. It takes less than a millisecond on my system, and you don't seed rngs very often, so there is no reason not to.

Note that I am not able to give you a sharp estimate for the number of discards we need, I took that value from this answer, it links this paper for a rational. I don't have the time to work through that right now.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...