Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
635 views
in Technique[技术] by (71.8m points)

parallel processing - Understanding the differences between mclapply and parLapply in R

I've recently started using parallel techniques in R for a project and have my program working on Linux systems using mclapply from the parallel package. However, I've hit a road block with my understanding of parLapply for Windows.

Using mclapply I can set the number of cores, iterations, and pass that to an existing function in my workspace.

mclapply(1:8, function(z) adder(z, 100), mc.cores=4)

I don't seem to be able to achieve the same in windows using parLapply. As I understand it, I need to pass all the variables through using clusterExport() and pass the actual function I want to apply into the argument.

Is this correct or is there something similar to the mclapply function that's applicable to Windows?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The beauty of mclapply is that the worker processes are all created as clones of the master right at the point that mclapply is called, so you don't have to worry about reproducing your environment on each of the cluster workers. Unfortunately, that isn't possible on Windows.

When using parLapply, you generally have to perform the following additional steps:

  • Create a PSOCK cluster
  • Register the cluster if desired
  • Load necessary packages on the cluster workers
  • Export necessary data and functions to the global environment of the cluster workers

Also, when you're done, it's good practice to shutdown the PSOCK cluster using stopCluster.

Here's a translation of your example to parLapply:

library(parallel)
cl <- makePSOCKcluster(4)
setDefaultCluster(cl)
adder <- function(a, b) a + b
clusterExport(NULL, c('adder'))
parLapply(NULL, 1:8, function(z) adder(z, 100))

If your adder function requires a package, you'll have to load that package on each of the workers before calling it with parLapply. You can do that quite easily with clusterEvalQ:

clusterEvalQ(NULL, library(MASS))

Note that the NULL first argument to clusterExport, clusterEval and parLapply indicates that they should use the cluster object registered via setDefaultCluster. That can be very useful if your program is using mclapply in many different functions, so that you don't have to pass the cluster object to every function that needs it when converting your program to use parLapply.

Of course, adder may call other functions in your global environment which call other functions, etc. In that case, you'll have to export them as well and load any packages that they need. Also note that if any variables that you've exported change during the course of your program, you will have to export them again in order to update them on the cluster workers. Again, that isn't necessary with mclapply because it always creates/clones/forks the workers whenever it is called, making that unnecessary.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...