Parallel Computing Toolbox

Calculate Relative Area of US States

This example computes the relative areas of US states using a point-in-polygon method. The performance of a CPU vs. a Graphics Processing Unit (GPU) for this task is compared.


Check for the presence of GPU

Look for a supported GPU and put out an information message.

    device = gpuDevice;
    disp(['Running a ', device.Name, ' on ' computer('arch') ]);
    useGPU = true;
catch e
    disp 'No supported GPU found; for test purpose only'
    useGPU = false;
Running a GeForce GTX 285 on maci64

Load State Outlines

Use the Mapping Toolbox to load the outlines of US states. If the Mapping Toolbox is not available, the outlines are loaded from a .MAT file instead.

    [xFaces,yFaces,x1,x2,y1,y2,name] = tools.states2poly('usastatelo.shp');
catch e % Mapping Toolbox not available

nFaces = numel(name);
nodes  = zeros(nFaces,1);  % node count per state

Bounding Box and Grid

Create a cloud of points within the bounding box surrounding all of the states. There are several ways to construct this.

  1. Construct a simple monotonically spaced rectangular grid.
  2. Use a quasi-random number generator to create a random set of points with good spatial distribution.

Use the Halton Quasi-random point set. Quasi-random point sets are a feature of the Statistics Toolbox. If this toolbox is unavailable, a rectangular grid will be used instead.

numberOfPoints = 1e4;

    h = haltonset(2);
    h = scramble(h,'RR2');
    q = qrandstream(h);
    z = rand(q,numberOfPoints,2);

    xGrid = x1 + z(:,1) * (x2 - x1);
    yGrid = y1 + z(:,2) * (y2 - y1);
catch e % Stats toolbox not available, use a simple rectangular grid
    xvec = linspace(x1,x2,numberOfPoints);
    yvec = linspace(x1,x2,numberOfPoints);
    [xGrid,yGrid] = meshgrid(xvec,yvec);

Show Random Points selected

Thousands of points have now been created. Display a portion of the US map (in this case, the San Francisco Bay area of California) with the points overlaid. The X and Y axis are degrees of longitude and latitude.

hold on
for i = 1:numel(xFaces)

xlim ([-124,-120]);
title('Randomly selected points in the Bay Area');
hold off

Loop through each polygon face

Calculate the number of points in the point set that lie within the boundaries of each state. The relative area is computed by dividing the number of nodes within each state by the total number of nodes within all states.

tCPUTotal = 0;
tGPUTotal = 0;
tCPUCompute = 0;
tGPUCompute = 0;

for k = 1:nFaces

    xv = xFaces{k};
    yv = yFaces{k};

    th = tic;
    [in,t] = inpoly(xGrid,yGrid,xv,yv,false);  % CPU
    tCPUTotal   = tCPUTotal + toc(th);
    tCPUCompute = tCPUCompute + t;

    th = tic;
    [in,t] = inpoly(xGrid,yGrid,xv,yv,useGPU);   % GPU KEA
    tGPUTotal   = tGPUTotal + toc(th);
    tGPUCompute = tGPUCompute + t;

    nodes(k) = nnz(in);

    fprintf('Face %d processed\n',k);


area = nodes ./ sum(nodes);

disp(['Total CPU Time: ' num2str(tCPUTotal)]);
disp(['Total GPU Time: ' num2str(tGPUTotal)]);
disp(['CPU Time Excluding Vectorization Overhead: ' num2str(tCPUCompute)]);
disp(['GPU Time Excluding Vectorization Overhead: ' num2str(tGPUCompute)]);
Face 1 processed
Face 2 processed
Face 3 processed
Face 4 processed
Face 5 processed
Face 6 processed
Face 7 processed
Face 8 processed
Face 9 processed
Face 10 processed
Face 11 processed
Face 12 processed
Face 13 processed
Face 14 processed
Face 15 processed
Face 16 processed
Face 17 processed
Face 18 processed
Face 19 processed
Face 20 processed
Face 21 processed
Face 22 processed
Face 23 processed
Face 24 processed
Face 25 processed
Face 26 processed
Face 27 processed
Face 28 processed
Face 29 processed
Face 30 processed
Face 31 processed
Face 32 processed
Face 33 processed
Face 34 processed
Face 35 processed
Face 36 processed
Face 37 processed
Face 38 processed
Face 39 processed
Face 40 processed
Face 41 processed
Face 42 processed
Face 43 processed
Face 44 processed
Face 45 processed
Face 46 processed
Face 47 processed
Face 48 processed
Face 49 processed
Total CPU Time: 31.656
Total GPU Time: 7.9684
CPU Time Excluding Vectorization Overhead: 26.0426
GPU Time Excluding Vectorization Overhead: 4.3255

Plot States

The states are shaded by relative area. Since points were picked at random and evenly distributed, we expect larger states to have captured a greater number of points. The color of Texas bears this out.


Finally, display a chart of speed-ups seen on various CPUs and GPUs

The speed of a variety of CPU and GPU runs of this application were collected in a .MAT file. We load the results and display a relative speed-up chart, using a quad-core CPU as the baseline for a speed-up of "1".

Note the dip in speed-up at 32K. Below 32K, a one-kernel implementation (seen in this example) was used. At 32K and above, a two-kernel implementation was necessary to fit the data set within available GPU memory. As the problem size continues to increase, the overhead of a second kernel is quickly made up, as can be seen in the 64K performance numbers.

load('benchmarks.mat', 'results');

fig = figure;
ax = axes('parent', fig);
plot(ax, results.sizes, results.sCPU1, '-+', ...
    results.sizes, results.sCPU4, '-o', ...
    results.sizes, results.sC1060, '-v', ...
    results.sizes, results.sC2050, '-*');

legend(results.legendLabels, 'Location', 'NorthWest');
title(ax, 'Speedup of elementwise computations on GPU compared to CPU');
ylabel(ax, 'Speedup');
xlabel(ax, 'Problem size');
set(ax, 'YGrid', 'on');