Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c - Generating random numbers within a CUDA kernel in a varying range

I am trying to generate random numbers within a CUDA kernel. I want them drawn from a uniform distribution, as integers from 1 up to 8. The random numbers should be different for each thread. The upper bound of the range also varies from one thread to another: it might be as low as 2 in one thread and as high as 8 in another, but never higher. Here is an example of how I want the numbers to be generated:

In thread#1 --> maximum of the range is 2 and so the random number should be between 1 and 2
In thread#2 --> maximum of the range is 6  and so the random number should be between 1 and 6
In thread#3 --> maximum of the range is 5 and so the random number should be between 1 and 5

and so on...


1 Reply


EDIT: I've edited my answer to fix some of the deficiencies pointed out in the other answers (@tudorturcu) and comments.

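  1. Use CURAND to generate a uniformly distributed float between 0.0 and 1.0. Note: 1.0 is included and 0.0 is excluded.
  2. Multiply this by the desired range (largest value - smallest value + 0.999999). The factor stops just short of the full width (largest value - smallest value + 1) because a draw of exactly 1.0 would otherwise land on largest value + 1.
  3. Add the offset (+ smallest value).
  4. Truncate to an integer.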

Something like this in your device code:

int idx = threadIdx.x+blockDim.x*blockIdx.x;
// assume have already set up curand and generated state for each thread...
// assume ranges vary by thread index
float myrandf = curand_uniform(&(my_curandstate[idx]));
myrandf *= (max_rand_int[idx] - min_rand_int[idx] + 0.999999);
myrandf += min_rand_int[idx];
int myrand = (int)truncf(myrandf);

You should #include <math.h> for truncf.
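
The scaling in steps 2 to 4 can be checked without a GPU. The following is a minimal sketch in plain host C, not device code: the function names uniform_to_int and uniform_to_int_full_width are hypothetical, and the input u stands in for a value returned by curand_uniform, assumed to lie in (0.0, 1.0]. The second function scales by the full width instead, only to show what goes wrong at u == 1.0.

```c
#include <assert.h>

/* Host-side mirror of the device arithmetic: map u in (0.0, 1.0] to an
   integer in [lo, hi]. The name uniform_to_int is hypothetical. */
int uniform_to_int(float u, unsigned lo, unsigned hi)
{
    assert(hi >= lo);                 /* the range must not be empty */
    assert(u > 0.0f && u <= 1.0f);    /* assumed output range of curand_uniform */

    /* Scale by slightly less than (hi - lo + 1), so u == 1.0 stays below hi + 1. */
    float r = (float)((double)u * ((double)(hi - lo) + 0.999999));
    r += (float)lo;                   /* shift so the range starts at lo */

    /* Float-to-int conversion truncates toward zero, like (int)truncf(r). */
    return (int)r;
}

/* Variant that scales by the full width (hi - lo + 1). Shown only to
   illustrate the overshoot at u == 1.0; not what the answer uses. */
int uniform_to_int_full_width(float u, unsigned lo, unsigned hi)
{
    float r = u * (float)(hi - lo + 1u) + (float)lo;
    return (int)r;
}
```

With the ranges from the question (minimum 1, maximum 2 in the first thread), uniform_to_int(1.0f, 1, 2) returns 2, whereas uniform_to_int_full_width(1.0f, 1, 2) returns 3, which is outside the requested range.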

Here's a fully worked example:

$ cat t527.cu
#include <stdio.h>
#include <stdlib.h>
#include <curand.h>
#include <curand_kernel.h>
#include <math.h>
#include <assert.h>
#define MIN 2
#define MAX 7
#define ITER 10000000

__global__ void setup_kernel(curandState *state){

  int idx = threadIdx.x+blockDim.x*blockIdx.x;
  curand_init(1234, idx, 0, &state[idx]);
}

__global__ void generate_kernel(curandState *my_curandstate, const unsigned int n, const unsigned *max_rand_int, const unsigned *min_rand_int,  unsigned int *result){

  int idx = threadIdx.x + blockDim.x*blockIdx.x;

  int count = 0;
  while (count < n){
    float myrandf = curand_uniform(my_curandstate+idx);
    myrandf *= (max_rand_int[idx] - min_rand_int[idx]+0.999999);
    myrandf += min_rand_int[idx];
    int myrand = (int)truncf(myrandf);

    assert(myrand <= max_rand_int[idx]);
    assert(myrand >= min_rand_int[idx]);
    result[myrand-min_rand_int[idx]]++;
    count++;
  }
}

int main(){

  curandState *d_state;
  cudaMalloc(&d_state, sizeof(curandState));
  unsigned *d_result, *h_result;
  unsigned *d_max_rand_int, *h_max_rand_int, *d_min_rand_int, *h_min_rand_int;
  cudaMalloc(&d_result, (MAX-MIN+1) * sizeof(unsigned));
  h_result = (unsigned *)malloc((MAX-MIN+1)*sizeof(unsigned));
  cudaMalloc(&d_max_rand_int, sizeof(unsigned));
  h_max_rand_int = (unsigned *)malloc(sizeof(unsigned));
  cudaMalloc(&d_min_rand_int, sizeof(unsigned));
  h_min_rand_int = (unsigned *)malloc(sizeof(unsigned));
  cudaMemset(d_result, 0, (MAX-MIN+1)*sizeof(unsigned));
  setup_kernel<<<1,1>>>(d_state);

  *h_max_rand_int = MAX;
  *h_min_rand_int = MIN;
  cudaMemcpy(d_max_rand_int, h_max_rand_int, sizeof(unsigned), cudaMemcpyHostToDevice);
  cudaMemcpy(d_min_rand_int, h_min_rand_int, sizeof(unsigned), cudaMemcpyHostToDevice);
  generate_kernel<<<1,1>>>(d_state, ITER, d_max_rand_int, d_min_rand_int, d_result);
  cudaMemcpy(h_result, d_result, (MAX-MIN+1) * sizeof(unsigned), cudaMemcpyDeviceToHost);
  printf("Bin:    Count: \n");
  for (int i = MIN; i <= MAX; i++)
    printf("%d    %u\n", i, h_result[i-MIN]);

  return 0;
}


$ nvcc -arch=sm_20 -o t527 t527.cu -lcurand
$ cuda-memcheck ./t527
========= CUDA-MEMCHECK
Bin:    Count:
2    1665496
3    1668130
4    1667644
5    1667435
6    1665026
7    1666269
========= ERROR SUMMARY: 0 errors
$
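
One way to judge whether the counts above look uniform is a Pearson chi-square statistic against equal expected counts per bin. The following host-side C sketch is an illustration only: the function names and the array t527_counts are hypothetical, and the numbers are copied from the run shown above.

```c
#include <assert.h>

/* Sum of all bin counts. */
unsigned long total_count(const unsigned long *counts, int nbins)
{
    unsigned long total = 0;
    for (int i = 0; i < nbins; i++)
        total += counts[i];
    return total;
}

/* Pearson chi-square statistic against equal expected counts per bin. */
double chi_square_uniform(const unsigned long *counts, int nbins)
{
    assert(nbins > 0);
    double expected = (double)total_count(counts, nbins) / nbins;
    double chi2 = 0.0;
    for (int i = 0; i < nbins; i++) {
        double d = (double)counts[i] - expected;  /* deviation from the mean */
        chi2 += d * d / expected;
    }
    return chi2;
}

/* Bin counts for the values 2..7, copied from the output above. */
static const unsigned long t527_counts[6] = {
    1665496UL, 1668130UL, 1667644UL, 1667435UL, 1665026UL, 1666269UL
};
```

For these counts, total_count(t527_counts, 6) is 10000000, matching ITER, and chi_square_uniform(t527_counts, 6) comes out at roughly 4.7. That is below the 5% critical value of about 11.07 for 5 degrees of freedom, so this run gives no sign of non-uniformity.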
