Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
612 views
in Technique[技术] by (71.8m points)

performance - What is the fastest way to perform arithmetic operations on each element of a cell array?

Say I want to multiply each element of a cell array A with a coefficent k. I can do that by:

A = cellfun(@(x) k*x, A, 'UniformOutput', false)

But this is extremely slow. Is there a faster and better way? The cell array elements are variable length vectors, so cell2num doesn't apply.

Edit: Based on fpe's recommendation of a for loop, here is an example benchmark. Starting with this data

A = arrayfun(@(n) rand(n,1), randi(5,1000,1000), 'UniformOutput',false);

The cellfuncall above takes 9.45 seconds, while a for loop:

A2 = cell(size(A));
for i = 1:size(A,1), for j = 1:size(A,2), A2{i,j} = A{i,j}*k; end; end
A = A2;

takes 1.67 seconds, which is a significant improvement. I'd still prefer something a few orders of magnitude faster. (I also don't understand why the Matlab interpreter is unable to make the cellfun call as fast as the for loop. They are semantically identical.)

Edit 2: Amro's suggestion to make one single for loop is significantly faster:

for i = 1:numel(A), A{i} = A{i}*k; end

takes 1.11 seconds, and if I run pack prior to it to align the memory just 0.88 seconds.

Implementing a MEX function to do this is actually not much better: 0.73 seconds, (0.53 seconds after pack), which indicates that allocating many small matrices is slow in Matlab.

#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
    if (nrhs != 2)
        mexErrMsgTxt("need 2 arguments (Cell, Coefficient)");

    mwSize const* size = mxGetDimensions(prhs[0]);
    int N = mxGetNumberOfDimensions(prhs[0]);

    if (mxGetNumberOfElements(prhs[1]) != 1)
        mexErrMsgTxt("second argument to multcell must be a scalar");

    double coefficient = *mxGetPr(prhs[1]);

    plhs[0] = mxCreateCellArray(N, size);

    int M = mxGetNumberOfElements(prhs[0]);

    for (int i = 0; i < M; i++) {
        mxArray *r = mxGetCell(prhs[0], i);
        mxArray *l = mxCreateNumericArray(mxGetNumberOfDimensions(r),
                                          mxGetDimensions(r),
                                          mxDOUBLE_CLASS,
                                          mxREAL);
        double *rp = mxGetPr(r);
        double *lp = mxGetPr(l);
        int num_elements = mxGetNumberOfElements(r);
        for (int i = 0; i < num_elements; i++)
            lp[i] = rp[i] * coefficient;
        mxSetCell(plhs[0], i, l);
    }
}

Cheating a bit, however, and implementing a MEX function that actually edits the memory in place seems to be the only way to get reasonable performance out the operation: 0.030 seconds. This uses the undocumented mxUnshareArray as suggested by Amro.

#include "mex.h"

extern "C" bool mxUnshareArray(mxArray *array_ptr, bool noDeepCopy);

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
    if (nrhs != 2)
        mexErrMsgTxt("need 2 arguments (Cell, Coefficient)");

    mwSize const* size = mxGetDimensions(prhs[0]);
    int N = mxGetNumberOfDimensions(prhs[0]);

    if (mxGetNumberOfElements(prhs[1]) != 1)
        mexErrMsgTxt("second argument to multcell must be a scalar");

    double coefficient = *mxGetPr(prhs[1]);

    mxUnshareArray(const_cast<mxArray *>(prhs[0]), false);
    plhs[0] = const_cast<mxArray *>(prhs[0]);

    int M = mxGetNumberOfElements(prhs[0]);

    for (int i = 0; i < M; i++) {
        mxArray *r = mxGetCell(prhs[0], i);
        double *rp = mxGetPr(r);
        int num_elements = mxGetNumberOfElements(r);
        for (int i = 0; i < num_elements; i++)
            rp[i] = rp[i] * coefficient;
    }
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Not exactly an answer, but here is a way to see the affect of JIT compiler and accelerator in both approaches (cellfun vs. for-loop):

feature('jit', 'off'); feature('accel', 'off');
tic, A = cellfun(@(x) k*x, A, 'UniformOutput', false); toc
tic, for i=1:numel(A), A{i} = A{i}*k; end, toc

feature('jit', 'on'); feature('accel', 'on');
tic, A = cellfun(@(x) k*x, A, 'UniformOutput', false); toc
tic, for i=1:numel(A), A{i} = A{i}*k; end, toc

I get the following

Elapsed time is 25.913995 seconds.
Elapsed time is 13.050288 seconds.

vs.

Elapsed time is 10.053347 seconds.
Elapsed time is 1.978974 seconds.

with optimization turned on in the second.

By the way, parallel parfor performed much worse (at least on my local test machine with a pool size of 2 processes).

Seeing the results you posted, MEX-function is the way to go :)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...