Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Simulating correlated Bernoulli data

I want to simulate 100 observations of 5 binary (Bernoulli) variables, with a correlation of 0.5 between every pair of columns. To do this, I first set up the following correlation matrix:

F1 <- matrix( c(1, .5, .5, .5,.5,
                   .5, 1, .5, .5,.5,
                   .5, .5, 1, .5,.5,
                   .5, .5, .5, 1,.5,
                   .5, .5, .5, .5,1
), 5,5)

To simulate the intended data frame I tried the following, but it does not work properly:

 df2 <- as.data.frame (rbinom(100, 1,.5),ncol(5), F1)


1 Reply


I'm surprised this isn't a duplicate (the closest existing question deals specifically with non-binary responses, i.e. binomial with N>1). The bindata package does what you want.

library(bindata)
## set up correlation matrix (compound-symmetric with rho=0.5)
m <- matrix(0.5,5,5)
diag(m) <- 1

Simulate with a mean of 0.5 (as in your example):

set.seed(101)
## this simulates 10 rather than 100 realizations
## (I didn't read your question carefully enough)
## but it's easy to change
r <- rmvbin(n=10, margprob=rep(0.5,5), bincorr=m)
round(cor(r),2)

Results

 1.00 0.22  0.80  0.05 0.22
 0.22 1.00  0.00  0.65 1.00
 0.80 0.00  1.00 -0.09 0.00
 0.05 0.65 -0.09  1.00 0.65
 0.22 1.00  0.00  0.65 1.00
  • this looks wrong - the correlations aren't exactly 0.5 - but on average they will be (when I sampled 10,000 vectors rather than 10, the values ranged from about 0.48 to 0.51). Equivalently, if you simulated many samples of 10 and computed the correlation matrix for each, you should find that the expected (average) correlation matrix is correct.
  • simulating values with correlation exactly equal to the specified value is much harder (and not necessarily what you want to do anyway, depending on the application)
  • note that there are limits on which mean vectors and correlation matrices are feasible. For example, the off-diagonal elements of an n-by-n compound-symmetric (equal-correlation) matrix can't be less than -1/(n-1). Similarly, there may be limits on which correlations are possible for a given set of means (this may be discussed in the technical reference; I haven't checked).
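The sampling points above can be checked directly. The following is a minimal sketch, assuming the bindata package is installed; the number of replicates (`nsim = 1000`) and the object names `big`, `cors`, `avg_cor` and `df2` are my own choices for illustration. It draws one large sample, then averages the correlation matrices of many samples of 10, and finally produces the 100-row data frame that the question asked for.

```r
library(bindata)  # assumed to be installed from CRAN

## compound-symmetric correlation matrix with rho = 0.5, as above
m <- matrix(0.5, 5, 5)
diag(m) <- 1

set.seed(101)

## (a) one large sample: the sample correlations should be close to 0.5
big <- rmvbin(n = 10000, margprob = rep(0.5, 5), bincorr = m)
range(cor(big)[upper.tri(m)])

## (b) many small samples of 10: average the correlation matrices.
## A column that happens to be constant gives NA correlations (with a
## warning); those entries are dropped from the average via na.rm = TRUE.
nsim <- 1000  # illustrative choice
cors <- replicate(
  nsim,
  suppressWarnings(cor(rmvbin(n = 10, margprob = rep(0.5, 5), bincorr = m)))
)
avg_cor <- apply(cors, c(1, 2), mean, na.rm = TRUE)
round(avg_cor, 2)

## (c) the 100 x 5 data frame from the question
df2 <- as.data.frame(rmvbin(n = 100, margprob = rep(0.5, 5), bincorr = m))
dim(df2)
```

With only 10 rows per sample, the averaged off-diagonal values should land near 0.5 but need not match it exactly, since the sample correlation is itself a slightly biased estimate in small samples.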
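The feasibility limits in the last point can be illustrated in base R. This is a sketch rather than anything taken from bindata: `cs_matrix`, `min_eigen` and `binary_cor_range` are hypothetical helper names. The first part relies on the fact that a valid correlation matrix cannot have negative eigenvalues; for a compound-symmetric matrix the smallest eigenvalue is the lesser of 1 - rho and 1 + (n-1)*rho, which is where the -1/(n-1) bound comes from. The second part uses the standard bounds on the joint probability of two binary variables, max(0, p1 + p2 - 1) <= P(both = 1) <= min(p1, p2), to get the range of correlations attainable for a given pair of means.

```r
## build an n-by-n compound-symmetric correlation matrix (hypothetical helper)
cs_matrix <- function(n, rho) {
  m <- matrix(rho, n, n)
  diag(m) <- 1
  m
}

## smallest eigenvalue; a negative value means m is not a valid correlation matrix
min_eigen <- function(m) {
  min(eigen(m, symmetric = TRUE, only.values = TRUE)$values)
}

n <- 5
bound <- -1 / (n - 1)            # -0.25 for n = 5
min_eigen(cs_matrix(n, 0.5))     # 0.5: rho = 0.5 is fine
min_eigen(cs_matrix(n, bound))   # approximately 0: the boundary case
min_eigen(cs_matrix(n, -0.3))    # approximately -0.2: infeasible

## range of correlations attainable by two binary variables with means p1, p2
## (hypothetical helper, based on the bounds on the joint probability)
binary_cor_range <- function(p1, p2) {
  s <- sqrt(p1 * (1 - p1) * p2 * (1 - p2))
  lower <- (max(0, p1 + p2 - 1) - p1 * p2) / s
  upper <- (min(p1, p2) - p1 * p2) / s
  c(lower = lower, upper = upper)
}

binary_cor_range(0.5, 0.5)  # -1 to 1: equal means of 0.5 allow any correlation
binary_cor_range(0.1, 0.5)  # about -0.33 to 0.33: unequal means narrow the range
```

These pairwise ranges are necessary conditions only; a full correlation matrix can still be infeasible for binary data even when every pair passes this check.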

The reference for this method is

Leisch, Friedrich and Weingessel, Andreas and Hornik, Kurt (1998) On the generation of correlated artificial binary data. Working Papers SFB "Adaptive Information Systems and Modelling in Economics and Management Science", 13. SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business, Vienna. https://epub.wu.ac.at/286/

