Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


How can I understand this R code for two distributions?

I have this code:

library(tibble)

df1 <- tibble(x = sort(rnorm(1e5)),
              cumulative = cumsum(abs(10 - x) / sum(abs(10 - x))) / 2.5)
df2 <- tibble(x1 = sort(rbinom(1e5, 1e5, 0.001) / 1e5))

which was posted in my previous question. After some research I still can't understand several things, and I would be grateful if someone could explain them to me:

  1. What are the parameters of the distribution in df1?
  2. What are the parameters of the distribution in df2? Why do we have to divide by 1e5, and which formula is this based on?
  3. Why do we have to use 10 in (10-x), and why 2.5?

I would be happy if someone could explain these points to me.


1 Reply


I think the answer to your previous question is wrong and misleading, although, to be fair, you did not ask the question very clearly.

I think what you are perhaps trying to do is compare the binomial distribution to the Normal approximation to it. A binomial variate is the number of successes you get if you do something N times, where the chance of each attempt being a success is p. Its mean is Np and its standard deviation is sqrt(Np(1-p)), so it can be approximated by a Normal distribution with that same mean and standard deviation.
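As a quick check of those two formulas, here is a minimal sketch in base R. The values of N and p are the same ones used in the code further down (100 and 0.1); the seed and the variable names are arbitrary choices made for illustration only.

```r
# Illustrative check of the binomial mean and standard deviation formulas.
N <- 100    # number of trials
p <- 0.1    # probability of success on each trial

mu    <- N * p                    # theoretical mean, N * p
sigma <- sqrt(N * p * (1 - p))    # theoretical standard deviation

set.seed(42)                      # arbitrary seed, only for reproducibility
draws <- rbinom(1e5, N, p)        # 100,000 simulated binomial counts

# The simulated values should be close to the theoretical ones.
c(theory_mean = mu,    sim_mean = mean(draws))
c(theory_sd   = sigma, sim_sd   = sd(draws))

# Dividing the counts by N turns them into proportions of successes,
# which have mean p and standard deviation sqrt(p * (1 - p) / N).
props <- draws / N
c(theory_mean = p, sim_mean = mean(props))
```

The last step may be what the division by 1e5 in the question's df2 is doing, since there the number of trials is also 1e5, but that is a guess about the intent of that code.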

One way to compare them using ggplot would be like this...

library(tidyverse)

trials <- 100   #i.e. N in the explanation above
prob <- 0.1     #i.e. p in the explanation above
sims <- 100000  #the number of simulations you want (1e5 in your previous question)

df <- tibble(n = 1:sims,
             normal = sort(rnorm(sims,                               #no of variates
                                 trials * prob,                      #mean
                                 sqrt(trials * prob * (1-prob)))),   #standard deviation
             binomial = sort(rbinom(sims, 
                                    trials, 
                                    prob))) 

Then, to compare the (discrete) histogram of the binomial distribution (in red) with the (continuous) density of the Normal approximation (in blue), you can do

df %>% ggplot() + 
  geom_density(aes(x = normal), 
               alpha = 0.5, 
               fill = "blue") +
  geom_histogram(aes(x = binomial, 
                     y = after_stat(density)), #rescales the bars so their total area is 1 (stat() is deprecated)
                 alpha = 0.5, 
                 fill = "red", 
                 binwidth = 1)

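If you would rather see numbers than a plot, the same comparison can be sketched with the exact binomial probabilities and the Normal density evaluated at the same integer points. This is only an illustration: the range 0:25 and the names k, exact, approx and comparison are my own choices, not part of the code above.

```r
# Exact binomial probabilities versus the Normal density at the same points.
trials <- 100
prob   <- 0.1

k <- 0:25                                    # counts to compare (arbitrary range)
exact  <- dbinom(k, trials, prob)            # P(X = k) under the binomial
approx <- dnorm(k,
                mean = trials * prob,
                sd   = sqrt(trials * prob * (1 - prob)))  # Normal density at k

comparison <- data.frame(k, exact, approx, diff = exact - approx)
print(round(comparison, 4))

max(abs(comparison$diff))                    # largest pointwise discrepancy
```

Because the histogram uses a bin width of 1, each bar height in the density plot estimates P(X = k), which is why the two columns are directly comparable.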

And to compare the cumulative distributions (taking advantage of the fact that we have sorted the variates in our dataframe)...

df %>% ggplot(aes(y = n/sims)) + 
  geom_line(aes(x = normal), 
            colour = "blue") +
  geom_line(aes(x = binomial), 
            colour = "red")

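The n/sims trick works because, once the variates are sorted, the rank of a value divided by the number of simulations is its empirical cumulative probability. A small sketch in base R, assuming the same trials, prob and sims as above, compares that empirical curve with the theoretical ones from pbinom() and pnorm(); the seed, the grid q and the column names are illustrative choices.

```r
# Empirical cumulative distribution of simulated binomial counts,
# compared with the exact binomial CDF and the Normal CDF.
trials <- 100
prob   <- 0.1
sims   <- 100000

set.seed(1)                                  # arbitrary seed for reproducibility
binomial <- sort(rbinom(sims, trials, prob))

q <- 0:20                                    # points at which to compare (arbitrary)
empirical  <- ecdf(binomial)(q)              # share of simulated values <= q
exact_cdf  <- pbinom(q, trials, prob)        # exact P(X <= q)
normal_cdf <- pnorm(q,
                    mean = trials * prob,
                    sd   = sqrt(trials * prob * (1 - prob)))

cdfs <- data.frame(q, empirical, exact_cdf, normal_cdf)
print(round(cdfs, 4))
```

The empirical and exact binomial columns should agree closely. The Normal column is a smooth curve passing through a step function, so it differs noticeably at individual integers even though the overall shape matches.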

I hope this helps!

