Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


multithreading - Canonical way to generate random numbers in Cython

What is the best way to generate uniform pseudorandom numbers (doubles in [0, 1)) that is:

  1. Cross platform (ideally with the same sample sequence)
  2. Thread safe (explicit passing of the mutated state of the PRNG or using a thread-local state internally)
  3. Usable without holding the GIL
  4. Easily wrappable in Cython

There was a similar post over 3 years ago about this but a lot of the answers don't meet all criteria. For example, drand48 is POSIX-specific.

The only method I'm aware of that seems (though I'm not sure) to meet at least some of the criteria is:

from libc.stdlib cimport rand, RAND_MAX

random = rand() / (RAND_MAX + 1.0)

Note @ogrisel asked the same question about 3 years ago.

Edit

Calling rand is not thread safe. Thanks for pointing that out @DavidW.


1 Reply


Big pre-answer caveat: this answer recommends C++ because the question specifically asks for a solution that runs without the GIL. If you don't have this requirement (and you probably don't...) then NumPy is the simplest and easiest solution. Provided that you generate large batches of numbers at a time, you will find NumPy perfectly quick. Don't be misled into a complicated exercise in wrapping C++ because someone asked for a no-GIL solution.


Original answer:

I think the easiest way to do this is to use the C++11 standard library, which provides nicely encapsulated random number generators and ways to use them. This is of course not the only option, and you could wrap pretty much any suitable C/C++ library (one good option might be to use whatever library NumPy uses, since that's most likely already installed).

My general advice is to only wrap the bits you need and not bother with the full hierarchy and all the optional template parameters. By way of example I've shown one of the default generators, fed into a uniform float distribution.

# distutils: language = c++
# distutils: extra_compile_args = -std=c++11

cdef extern from "<random>" namespace "std":
    cdef cppclass mt19937:
        mt19937() # we need to define this constructor to stack allocate classes in Cython
        mt19937(unsigned int seed) # not worrying about matching the exact int type for seed
    
    cdef cppclass uniform_real_distribution[T]:
        uniform_real_distribution()
        uniform_real_distribution(T a, T b)
        T operator()(mt19937& gen) # gen is taken by reference, as in C++; ignore the possibility of using other classes for "gen"
        
def test():
    cdef:
        mt19937 gen = mt19937(5)
        uniform_real_distribution[double] dist = uniform_real_distribution[double](0.0,1.0)
    return dist(gen)

(The -std=c++11 at the start is for GCC. For other compilers you may need to tweak this. Increasingly, C++11 or later is the default anyway, so you can often drop it.)
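For comparison, here is roughly what the wrapper above calls when written directly in C++. It's only a sketch: the helper name `draw_uniform` is made up for illustration and isn't part of the Cython code.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (name invented for this sketch): draw n doubles in
// [0, 1) from a Mersenne Twister seeded with `seed`.
std::vector<double> draw_uniform(unsigned int seed, std::size_t n) {
    std::mt19937 gen(seed);                                 // all PRNG state lives in this object
    std::uniform_real_distribution<double> dist(0.0, 1.0);  // half-open range [0, 1)
    std::vector<double> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(dist(gen));  // gen is passed by reference and advanced on each call
    }
    return out;
}
```

Calling `draw_uniform(5, 3)` twice in the same build should give the same three values, since nothing is shared between calls.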

With reference to your criteria:

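  1. Cross platform on anything that supports C++11. The raw mt19937 output for a given seed is fixed by the standard, so it's repeatable. The distribution classes are specified less tightly, so the resulting doubles may differ between standard library implementations.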
  2. Thread safe, since the state is stored entirely within the mt19937 object (each thread should have its own mt19937).
  3. No GIL - it's C++, with no Python parts
  4. Reasonably easy.
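Points 1 and 2 can be checked in plain C++. The sketch below uses two made-up helpers, `nth_output` and `interleaved_streams_match`. The first relies on the standard's requirement that the 10000th output of a default-seeded mt19937 is 4123659995. The second shows that two generator objects share no hidden state, which is why one generator per thread is enough.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: value returned by the n-th call (1-based, n >= 1)
// to a std::mt19937 seeded with `seed`.
std::uint_fast32_t nth_output(std::uint_fast32_t seed, unsigned long long n) {
    std::mt19937 gen(seed);
    gen.discard(n - 1);  // skip the first n-1 outputs
    return gen();
}

// Hypothetical helper: two generators with the same seed produce the same
// stream even when their calls are interleaved, because each object owns
// its entire state.
bool interleaved_streams_match(std::uint_fast32_t seed, int n) {
    std::mt19937 a(seed), b(seed), reference(seed);
    for (int i = 0; i < n; ++i) {
        const auto expected = reference();
        if (a() != expected) return false;
        if (b() != expected) return false;
    }
    return true;
}
```

In a multithreaded program each worker would construct its own `std::mt19937` (with a distinct seed if the streams should differ) and never share it.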

Edit: about using discrete_distribution.

This is a bit harder because it's less obvious how to wrap the constructors for discrete_distribution (they involve iterators). I think the easiest thing to do is to go via a C++ vector, since support for that is built into Cython and it is readily convertible to/from a Python list.

# use Cython's built in wrapping of std::vector
from libcpp.vector cimport vector

cdef extern from "<random>" namespace "std":
    # mt19937 as before
    
    cdef cppclass discrete_distribution[T]:
        discrete_distribution()
        # The following constructor is really a more generic constructor template
        # but tell Cython it only accepts vector[double] iterators
        discrete_distribution(vector[double].iterator first, vector[double].iterator last)
        T operator()(mt19937& gen)

# an example function
def test2():
    cdef:
        mt19937 gen = mt19937(5)
        vector[double] values = [1,3,3,1] # autoconvert vector from Python list
        discrete_distribution[int] dd = discrete_distribution[int](values.begin(),values.end())
    return dd(gen)

Obviously that's a bit more involved than the uniform distribution, but it's not impossibly complicated (and the nasty bits could be hidden inside a Cython function).
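The same thing in plain C++ might look like the sketch below. The helper names `draw_index` and `normalized_weights` are invented for illustration. The weights are normalised internally, so [1,3,3,1] corresponds to probabilities 1/8, 3/8, 3/8, 1/8.

```cpp
#include <random>
#include <vector>

// Hypothetical helper: return an index in [0, weights.size()) chosen with
// probability proportional to its weight.
int draw_index(std::mt19937& gen, const std::vector<double>& weights) {
    // Iterator-pair constructor, the one the Cython declaration wraps.
    std::discrete_distribution<int> dd(weights.begin(), weights.end());
    return dd(gen);
}

// Hypothetical helper: the probabilities the distribution derives from the
// weights (each weight divided by the sum of all weights).
std::vector<double> normalized_weights(const std::vector<double>& weights) {
    std::discrete_distribution<int> dd(weights.begin(), weights.end());
    return dd.probabilities();
}
```

If you draw many values, it is cheaper to construct the distribution once and reuse it than to rebuild it on every call as `draw_index` does.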

