November | 2013 | Frank Cleary

require(parallel) require(doParallel) library(foreach) library(iterators) parFun <- function(A, B){ A%*%B } nCores <- 2 n <- 4000 # for this toy code, make sure n is divisible by nCores a <- rnorm(n*n) A <- matrix(a, ncol=n) registerDoParallel(nCores) systime <- system.time( result <- foreach(i = 1:nCores, .combine = cbind) %dopar% { rows <- ((i - 1)*n/nCores + 1):(i*n/nCores) out <- parFun(A, A[, rows]) } ) print(paste("Cores = ", nCores)) print(systime)

I’m all about efficiency, and I’ve been excited to learn more about parallel processing. I really like seeing my cpu using all four cores at 100%. I paid for all those cores, and I want them computing!

If I have some calculations running in python it’s mildly annoying to check on task manager (in windows – ctrl-shift-esc, click on the “performance” tab), and see my cpu usage at 25%. Only one core of the four cores are being used. The reasons for this are somewhat complex, but this is python so there are a number of modules that make it easy to parallelize operations (post on similar code in R).

Do I have one hard working core, and three lazy ones?

To start learning how to parallelize calculations in python I used parallel python. To test the speed up gained by using parallel processing I wrote an inefficient matrix multiplication function using for loops (code below). Using numpy’s matrix class would be much better here, but numpy operations can do some parallelization on their own. Generating four 400 x 400 matrices and multiplying them by themselves took 76.2 s when run serially, and 21.8 s when distributed across the four cores, almost matching a full four-fold speedup. Also, I see I’m getting the full value from the four cores I paid for:

Using all the cpus to speed up the calculation.

This trivial sort of parallelization is ideal for speeding up simulations, since each simulation is independent of the others.

import pp
import time
import random

def randprod(n):
    """Waste time by inefficiently multiplying an n x n matrix"""
    random.seed(0)
    n = int(n)
    # Generate an n x n matrix of random numbers
    X = [[random.random() for _ in range(n)] for _ in range(n)]
    Y = [[None]*n]*n
    for i in range(n):
        for j in range(n):
            Y[i][j] = sum([X[i][k]*X[k][j] for k in range(n)])
    return Y

if __name__ == "__main__":
    test_n = [400]*4
    # Do 4 serial calculations:
    start_time = time.time()
    series_sums = [randprod(n) for n in test_n]
    elapsed = (time.time() - start_time)
    print "Time for serial processing: %.1f" % elapsed + " s"

    time.sleep(10) # provide a break in windows CPU usage graph    

    # Do the same 4 calculations in parallel, using parallel python
    par_start_time = time.time()
    n_cpus = 4
    job_server = pp.Server(n_cpus)
    jobs = [job_server.submit(randprod, (n,), (), ("random",)) for n in test_n]
    par_sums = [job() for job in jobs]
    par_elapsed = (time.time() - par_start_time)
    print "Time for parallel processing: %.1f" % par_elapsed + " s"

    print par_sums == series_sums

Frank Cleary

Monthly Archives: November 2013

Parallel processing in R

Parallel processing in python