Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
944 views
in Technique[技术] by (71.8m points)

matlab - Sending data to workers

I am trying to create a piece of parallel code to speed up the processing of a very large (couple of hundred million rows) array. In order to parallelise this, I chopped my data into 8 (my number of cores) pieces and tried sending each worker 1 piece. Looking at my RAM usage however, it seems each piece is send to each worker, effectively multiplying my RAM usage by 8. A minimum working example:

A = 1:16;
for ii = 1:8
    data{ii} = A(2*ii-1:2*ii);
end

Now, when I send this data to workers using parfor it seems to send the full cell instead of just the desired piece:

output = cell(1,8);
parfor ii = 1:8
    output{ii} = data{ii};
end

I actually use some function within the parfor loop, but this illustrates the case. Does MATLAB actually send the full cell data to each worker, and if so, how to make it send only the desired piece?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

In my personal experience, I found that using parfeval is better regarding memory usage than parfor. In addition, your problem seems to be more breakable, so you can use parfeval for submitting more smaller jobs to MATLAB workers.

Let's say that you have workerCnt MATLAB workers to which you are gonna handle jobCnt jobs. Let data be a cell array of size jobCnt x 1, and each of its elements corresponds to a data input for function getOutput which does the analysis on data. The results are then stored in cell array output of size jobCnt x 1.

in the following code, jobs are assigned in the first for loop and the results are retrieved in the second while loop. The boolean variable doneJobs indicates which job is done.

poolObj = parpool(workerCnt);
jobCnt = length(data); % number of jobs
output = cell(jobCnt,1);
for jobNo = 1:jobCnt
    future(jobNo) = parfeval(poolObj,@getOutput,...
        nargout('getOutput'),data{jobNo});
end
doneJobs = false(jobCnt,1);
while ~all(doneJobs)
    [idx,result] = fetchnext(future);
    output{idx} = result;
    doneJobs(idx) = true;
end

Also, you can take this approach one step further if you want to save up more memory. What you could do is that after fetching the results of a done job, you can delete the corresponding member of future. The reason is that this object stores all the input and output data of getOutput function which probably is going to be huge. But you need to be careful, as deleting members of future results index shift.

The following is the code I wrote for this porpuse.

poolObj = parpool(workerCnt);
jobCnt = length(data); % number of jobs
output = cell(jobCnt,1);
for jobNo = 1:jobCnt
    future(jobNo) = parfeval(poolObj,@getOutput,...
        nargout('getOutput'),data{jobNo});
end
doneJobs = false(jobCnt,1);
while ~all(doneJobs)
    [idx,result] = fetchnext(future);
    furure(idx) = []; % remove the done future object
    oldIdx = 0;
    % find the index offset and correct index accordingly
    while oldIdx ~= idx
        doneJobsInIdxRange = sum(doneJobs((oldIdx + 1):idx));
        oldIdx = idx
        idx = idx + doneJobsInIdxRange;
    end
    output{idx} = result;
    doneJobs(idx) = true;
end

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...