Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
643 views
in Technique[技术] by (71.8m points)

statistics - Efficient calculation of matrix cumulative standard deviation in r

I recently posted this question on the r-help mailing list but got no answers, so I thought I would post it here as well and see if there were any suggestions.

I am trying to calculate the cumulative standard deviation of a matrix. I want a function that accepts a matrix and returns a matrix of the same size where output cell (i,j) is set to the standard deviation of input column j between rows 1 and i. NAs should be ignored, unless cell (i,j) of the input matrix itself is NA, in which case cell (i,j) of the output matrix should also be NA.

I could not find a built-in function, so I implemented the following code. Unfortunately, this uses a loop that ends up being somewhat slow for large matrices. Is there a faster built-in function or can someone suggest a better approach?

cumsd <- function(mat)
{
    retval <- mat*NA
    for (i in 2:nrow(mat)) retval[i,] <- sd(mat[1:i,], na.rm=T)
    retval[is.na(mat)] <- NA
    retval
}

Thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You could use cumsum to compute necessary sums from direct formulas for variance/sd to vectorized operations on matrix:

cumsd_mod <- function(mat) {
    cum_var <- function(x) {
        ind_na <- !is.na(x)
        nn <- cumsum(ind_na)
        x[!ind_na] <- 0
        cumsum(x^2) / (nn-1) - (cumsum(x))^2/(nn-1)/nn
    }
    v <- sqrt(apply(mat,2,cum_var))
    v[is.na(mat) | is.infinite(v)] <- NA
    v
}

just for comparison:

set.seed(2765374)
X <- matrix(rnorm(1000),100,10)
X[cbind(1:10,1:10)] <- NA # to have some NA's

all.equal(cumsd(X),cumsd_mod(X))
# [1] TRUE

And about timing:

X <- matrix(rnorm(100000),1000,100)
system.time(cumsd(X))
# user  system elapsed 
# 7.94    0.00    7.97 
system.time(cumsd_mod(X))
# user  system elapsed 
# 0.03    0.00    0.03 

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...