Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
336 views
in Technique[技术] by (71.8m points)

r - Any suggestions for how I can plot mixEM type data using ggplot2

I have a sample of 1m records obtained from my original data. (For your reference, you may use this dummy data that may generate approximately similar distribution

b <- data.frame(matrix(rnorm(2000000, mean=c(8,17), sd=2)))
c <- b[sample(nrow(b), 1000000), ]

) I believed the histogram to be a mixture of two log-normal distributions and I tried to fit the summed distributions using EM algorithm using the following code:

install.packages("mixtools")
lib(mixtools)
#line below returns EM output of type mixEM[] for mixture of normal distributions
c1 <- normalmixEM(c, lambda=NULL, mu=NULL, sigma=NULL) 
plot(c1, density=TRUE)

The first plot is a log-likelihood plot and the second (if you hit return again), gives similar to the following density curves:

Mixture model density curves

As I mentioned c1 is of type mixEM[] and plot() function can accommodate that. I want to fill the density curves with colors. This is easy to do using ggplot2() but ggplot2() does not support data of type mixEM[] and throws this message:

ggplot doesn't know how to deal with data of class mixEM

Is there any other approach I can take for this problem?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Look at the structure of the returned object (this should be documented in the help):

> # simple mixture of normals:
> x=c(rnorm(10000,8,2),rnorm(10000,17,4))
> xMix = normalmixEM(x, lambda=NULL, mu=NULL, sigma=NULL)

Now what:

> str(xMix)
List of 9
 $ x         : num [1:20000] 6.18 9.92 9.07 8.84 9.93 ...
 $ lambda    : num [1:2] 0.502 0.498
 $ mu        : num [1:2] 7.99 17.05
 $ sigma     : num [1:2] 2.03 4.02
 $ loglik    : num -59877

The lambda, mu, and sigma components define the returned normal densities. You can plot these in ggplot using qplot and stat_function. But first make a function that returns scaled normal densities:

sdnorm =
function(x, mean=0, sd=1, lambda=1){lambda*dnorm(x, mean=mean, sd=sd)}

Then:

qplot(x,geom="density") + stat_function(fun=sdnorm,args=list(mean=xMix$mu[1],sd=xMix$sigma[1], lambda=xMix$lambda[1]),fill="blue",geom="polygon")  + stat_function(fun=sdnorm,args=list(mean=xMix$mu[2],sd=xMix$sigma[2], lambda=xMix$lambda[2]),fill="#FF0000",geom="polygon") 

enter image description here

Or whatever ggplot skills you have. Transparent colours on the densities might be nice.

ggplot(data.frame(x=x)) + 
 geom_histogram(aes(x=x,y=..density..),fill="white",color="black") +
 stat_function(fun=sdnorm,
    args=list(mean=xMix$mu[2],
             sd=xMix$sigma[2],
             lambda=xMix$lambda[2]),
             fill="#FF000080",geom="polygon") +
 stat_function(fun=sdnorm,
    args=list(mean=xMix$mu[1],
             sd=xMix$sigma[1],
             lambda=xMix$lambda[1]),
             fill="#00FF0080",geom="polygon")

producing:

enter image description here


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...