Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


parallel processing - Run multiple R-scripts simultaneously

In my thesis I need to perform a lot of simulation studies, which all take quite a while. My computer has 4 cores, so I have been wondering whether it is possible to run, for example, two R scripts in RStudio at the same time by letting them use two different cores. If this could be done, I could save a lot of time by just leaving the computer running all these scripts overnight.




In RStudio

If you right click on RStudio, you should be able to open several separate "sessions" of RStudio (whether or not you use Projects). By default these will use 1 core each.

Update (July 2018): RStudio v1.2.830-1, which is available as a Preview Release, supports a "jobs" pane. This is dedicated to running R scripts in the background, separate from the interactive R session:

  • Run any R script as a background job in a clean R session
  • Monitor progress and see script output in real time
  • Optionally give jobs your global environment when started, and export values back when complete

This will be available in RStudio version 1.2.

Running Scripts in the Terminal

If you have several scripts that you know run without errors, I'd recommend running them with different parameters from the command line:

R CMD BATCH script.R
Rscript script.R
R --vanilla < script.R

Running in the background:

nohup Rscript script.R &

Here "&" runs the script in the background: it can be brought back to the foreground with fg, monitored with htop, and killed with kill <pid> (or pkill rsession for RStudio sessions). nohup keeps the script running if the terminal is closed and, unless the output is redirected, saves it in the file nohup.out.
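As a minimal sketch of the background workflow described above, the helper below starts a command under nohup, sends its output to a log file of your choosing instead of nohup.out, and prints the process ID so that it can be passed to kill later. The function name start_bg and the log file name are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# start_bg LOGFILE COMMAND [ARGS...]
# Hypothetical helper: run COMMAND in the background under nohup,
# write its stdout and stderr to LOGFILE, and print the job's PID.
start_bg() {
  local logfile="$1"
  shift
  nohup "$@" > "$logfile" 2>&1 &
  echo "$!"
}

# Example usage (assumes Rscript is on the PATH and script.R exists):
#   pid=$(start_bg script.log Rscript script.R)
#   kill "$pid"      # stop the script early if needed
```

Giving each script its own log file keeps the output of several simultaneous runs from ending up mixed together in a single nohup.out.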

Passing arguments to a script:

Rscript script.R 1 2 3

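This passes the arguments to the script, where commandArgs(trailingOnly = TRUE) returns them as the character vector c("1", "2", "3") (convert with as.numeric() if you need numbers), so a bash loop can start multiple instances of Rscript: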

for ii in 1 2 3
  do
  nohup Rscript script.R $ii &
  done
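The loop above starts every instance at once, which can oversubscribe a 4-core machine when there are many parameter values. The sketch below starts at most a fixed number of instances, waits for that batch to finish, and then starts the next one. The function name run_batches, the RUNNER and SCRIPT variables, and the run_<arg>.log file names are all assumptions made for illustration, not part of the original answer.

```shell
#!/usr/bin/env bash
# run_batches MAX_JOBS ARG [ARG...]
# Hypothetical helper: run "$RUNNER $SCRIPT ARG" once per ARG,
# with at most MAX_JOBS instances running at the same time.
RUNNER="${RUNNER:-Rscript}"    # assumed default interpreter
SCRIPT="${SCRIPT:-script.R}"   # assumed default script name

run_batches() {
  local max_jobs="$1"
  shift
  local running=0
  local arg
  for arg in "$@"; do
    # Each instance gets its own log file so outputs do not interleave.
    nohup "$RUNNER" "$SCRIPT" "$arg" > "run_${arg}.log" 2>&1 &
    running=$((running + 1))
    if [ "$running" -ge "$max_jobs" ]; then
      wait        # block until every job in this batch has finished
      running=0
    fi
  done
  wait            # wait for the final, possibly partial, batch
}

# Example usage: six parameter values, three at a time.
#   run_batches 3 1 2 3 4 5 6
```

Batching is deliberately simple: a new batch starts only when the slowest job of the previous one has finished, so some cores may sit idle if run times differ a lot.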

Running parallel code within R

You will often find that one particular step in your R script is slowing down the computation. In that case, may I suggest running parallel code within your R script rather than running several scripts separately? I'd recommend the [snow package][1] for running loops in parallel in R. The general pattern is:

library(snow)
cl <- makeCluster(n)
# n = number of cores (I'd recommend one less than machine capacity)
clusterExport(cl, list=ls()) # export input data to all cores
output_list <- parLapply(cl, input_list, function(x) ... )
stopCluster(cl) # close cluster when complete (particularly on shared machines)

Use this anywhere you would normally use lapply in R to run it in parallel.

[1]: https://www.r-bloggers.com/quick-guide-to-parallel-r-with-snow/

