Slow Execution of Parfor Loops due to Communication Overhead: Load static data into worker workspace memory?

For my research, I need near-real-time execution of a large number (>1000) of matrix-vector multiplications of the form A*x, where A is a medium-scale matrix (e.g. 150x150). These matrices are constructed in an extremely expensive operation (it takes hours to complete) and saved in a static data structure (MatSet in the example below). This static data structure is used by all workers and is not modified after creation.
When I run my code, which is equivalent to the code below, I find that the PARFOR loop is more than 10 times slower than the FOR loop in MATLAB R2010b. This appears to be caused by the repeated transfer of data (MatSet in this case) to the workers. In my case, however, this data transfer is completely unnecessary, as MatSet is a read-only dataset!
My question is whether there is some way of loading a STATIC dataset into the workspace of the workers, so as to avoid this unnecessary communication overhead. Is it possible to do this without having to load the data from disk?
Here is the demo code:
matlabpool(2); % start a pool of 2 workers
Msize = 150; Nloop = 1000;
c1 = zeros(Msize, Nloop); c2 = zeros(Msize, Nloop);
% parallel initialization loop
MatSet = cell(Nloop, 1);
parfor i=1:Nloop
    MatSet{i} = rand(Msize); % simulates expensive code operation
end
% real-time parallel loop (SLOW!)
tic;
parfor i=1:Nloop
    c1(:,i) = MatSet{i} * rand(Msize, 1);
end
time1 = toc;
% real-time serial loop (for comparison)
tic;
for i=1:Nloop
    c2(:,i) = MatSet{i} * rand(Msize, 1);
end
time2 = toc;
fprintf('Parallel time: %2.4f ms, Serial time: %2.4f ms\n', 1000*time1, 1000*time2);
matlabpool close;
Any comments are appreciated,
Coen

Accepted Answer

Edric Ellis on 17 Nov 2011
You might be able to take advantage of my Worker Object Wrapper, which is designed to help set up this sort of static data for use on workers.
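For readers who want to see the underlying idea without any add-on, here is a minimal sketch of per-worker caching with a persistent variable. This is not the Worker Object Wrapper's implementation; getStaticMatSet is a hypothetical helper name, and rand(Msize) stands in for the expensive construction, as in the demo code.

```matlab
function M = getStaticMatSet(Nloop, Msize)
% getStaticMatSet  Hypothetical helper (not part of any toolbox).
% Builds the matrix set on the first call in a given MATLAB process and
% returns the cached copy on later calls. Each worker is a separate
% process, so each worker builds and keeps its own copy.
% Note: the cache is not rebuilt if Nloop or Msize change between calls.
persistent cache
if isempty(cache)
    cache = cell(Nloop, 1);
    for k = 1:Nloop
        cache{k} = rand(Msize);  % stands in for the expensive construction
    end
end
M = cache;
end
```

Saved as getStaticMatSet.m on the workers' path, it would be called inside the PARFOR body, for example M = getStaticMatSet(Nloop, Msize); followed by c1(:,i) = M{i} * rand(Msize, 1);. The construction cost is then paid once per worker rather than once overall, so this only pays off if the set is reused across many loops.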
Coen de Visser on 18 Nov 2011
Hi Edric,
Very nice piece of code! Thanks! Using your WOW, the demo code now runs 5 times faster, but it is still slower than the same code in an ordinary serial for loop.
This is how I now use the WOW:
w = WorkerObjWrapper(@generateMatrix, {Nloop, Msize});
parfor i=1:Nloop
    c3(:,i) = w.Value{i} * rand(Msize, 1);
end
with Nloop and Msize the constants defined in the demo code, and generateMatrix a very simple function that creates Nloop random matrices of size Msize and returns them in a cell array.
Is this syntax correct? If so, do you know what can be the cause of the remaining performance issue?
regards,
Coen
Edric Ellis on 21 Nov 2011
It's hard to say exactly why it's still slower. Your syntax is fine. Some points to note:
1. generateMatrix is evaluated once per worker.
2. The result "w.Value" is stored separately on each worker.
Either of those two factors could be important. Also, it's worth bearing in mind that some PARFOR loops do not speed up at all, because of the overhead of entering a PARFOR loop and the data transfer involved.
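To put rough numbers on the data transfer mentioned above, here is a back-of-the-envelope sketch using the sizes from the demo code (Msize = 150, Nloop = 1000, double precision). The flop count uses the usual 2*n^2 approximation for an n-by-n matrix-vector product; nothing here is measured.

```matlab
Msize = 150; Nloop = 1000;
bytesPerMatrix = Msize^2 * 8;           % 8 bytes per double element
totalBytes = Nloop * bytesPerMatrix;    % size of the whole MatSet
flopsPerProduct = 2 * Msize^2;          % approx. one multiply and one add per element of A
totalFlops = Nloop * flopsPerProduct;   % arithmetic in the whole loop
fprintf('MatSet size: %.0f MB\n', totalBytes / 1e6);         % prints "MatSet size: 180 MB"
fprintf('Loop work: %.0f MFLOP\n', totalFlops / 1e6);        % prints "Loop work: 45 MFLOP"
fprintf('Flops per byte: %.2f\n', totalFlops / totalBytes);  % prints "Flops per byte: 0.25"
```

So the loop does about a quarter of a floating-point operation per byte of matrix data. If MatSet has to be sent to the workers, moving 180 MB is likely to cost far more than the arithmetic itself, which would be consistent with the slowdown reported in the question. Once the data already lives on the workers, what remains is the fixed cost of starting the PARFOR loop and returning the results (150 x 1000 doubles, about 1.2 MB).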
