Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
386 views
in Technique[技术] by (71.8m points)

php - Generate Random Numbers with Probabilistic Distribution

Ok, so here's my problem. We are looking at purchasing a data set from a company to augment our existing data set. For the purposes of this question, let's say that this data set ranks places with an organic number (meaning that the number assigned to one place has no bearing on the number assigned to another). The technical range is 0 to infinity, but from sample sets that I've seen, it's 0 to 70. Based on the sample, it's most definitely not a uniform distribution (out of 10,000 there are maybe 5 places with a score over 40, 50 with a score over 10, and 1000 with a score over 1). Before we decide to purchase this set, we would like to simulate it so that we can see how useful it may be.

So, to simulate it, I've been thinking about generating a random number for each place (about 150,000 random numbers). But, I also want to keep to the spirit of the data, and keep the distribution relatively the same (or at least reasonably close). I've been racking my brain all day trying to think of a way to do it, and have come up empty.

One thought I had was to square the random number (between 0 and sqrt(70)). But that would favor both less than 1 and larger numbers.

I'm thinking that he real distribution should be hyperbolic in the first quadrant... I'm just blanking on how to turn a linear, even distribution of random numbers into a hyperbolic distribution (If hyperbolic is even what I want in the first place).

Any thoughts?

So, to sum, here's the distribution I would like (approximately):

  • 40 - 70: 0.02% - 0.05%
  • 10 - 40: 0.5% - 1%
  • 1 - 10: 10% - 20%
  • 0 - 1 : Remainder (78.95% - 89.48%)
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Look at distributions used in reliability analysis - they tend to have these long tails. A relatively simply possibility is the Weibull distribution with P(X>x)=exp[-(x/b)^a].

Fitting your values as P(X>1)=0.1 and P(X>10)=0.005, I get a=0.36 and b=0.1. This would imply that P(X>40)*10000=1.6, which is a bit too low, but P(X>70)*10000=0.2 which is reasonable.

EDIT Oh, and to generate a Weibull-distributed random variable from a uniform(0,1) value U, just calculate b*[-log(1-u)]^(1/a). This is the inverse function of 1-P(X>x) in case I miscalculated something.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...