Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
473 views
in Technique[技术] by (71.8m points)

linux - Without root access, run R with tuned BLAS when it is linked with reference BLAS

Can any one tell me why I can not successfully test OpenBLAS's dgemm performance (in GFLOPs) in R via the following way?

  1. link R with the "reference BLAS" libblas.so
  2. compile my C program mmperf.c with OpenBLAS library libopenblas.so
  3. load the resulting shared library mmperf.so into R, call the R wrapper function mmperf and report dgemm performance in GFLOPs.

Point 1 looks strange, but I have no choice because I have no root access on machines I want to test, so actual linking to OpenBLAS is impossible. By "not successfully" I mean my program ends up reporting dgemm performance for reference BLAS instead of OpenBLAS. I hope someone can explain to me:

  1. why my way does not work;
  2. is it possible at all to make it work (this is important, because if it is impossible, I must write a C main function and do my job in a C program.)

I've investigated into this issue for two days, here I will include various system output to assist you to make a diagnose. To make things reproducible, I will also include the code, makefile as well as shell command.

Part 1: system environment before testing

There are 2 ways to invoke R, either using R or Rscript. There are some differences in what is loaded when they are invoked:

~/Desktop/dgemm$ readelf -d $(R RHOME)/bin/exec/R | grep "NEEDED"
0x00000001 (NEEDED)         Shared library: [libR.so]
0x00000001 (NEEDED)         Shared library: [libpthread.so.0]
0x00000001 (NEEDED)         Shared library: [libc.so.6]

~/Desktop/dgemm$ readelf -d $(R RHOME)/bin/Rscript | grep "NEEDED"
0x00000001 (NEEDED)         Shared library: [libc.so.6]

Here we need to choose Rscript, because R loads libR.so, which will automatically load the reference BLAS libblas.so.3:

~/Desktop/dgemm$ readelf -d $(R RHOME)/lib/libR.so | grep blas
0x00000001 (NEEDED)         Shared library: [libblas.so.3]

~/Desktop/dgemm$ ls -l /etc/alternatives/libblas.so.3
... 31 May /etc/alternatives/libblas.so.3 -> /usr/lib/libblas/libblas.so.3.0

~/Desktop/dgemm$ readelf -d /usr/lib/libblas/libblas.so.3 | grep SONAME
0x0000000e (SONAME)         Library soname: [libblas.so.3]

Comparatively, Rscript gives a cleaner environment.

Part 2: OpenBLAS

After downloading source file from OpenBLAS and a simple make command, a shared library of the form libopenblas-<arch>-<release>.so-<version> can be generated. Note that we will not have root access to install it; instead, we copy this library into our working directory ~/Desktop/dgemm and rename it simply to libopenblas.so. At the same time we have to make another copy with name libopenblas.so.0, as this is the SONAME which run time loader will seek for:

~/Desktop/dgemm$ readelf -d libopenblas.so | grep "RPATH|SONAME"
0x0000000e (SONAME)         Library soname: [libopenblas.so.0]

Note that the RPATH attribute is not given, which means this library is intended to be put in /usr/lib and we should call ldconfig to add it to ld.so.cache. But again we don't have root access to do this. In fact, if this can be done, then all the difficulties are gone. We could then use update-alternatives --config libblas.so.3 to effectively link R to OpenBLAS.

Part 3: C code, Makefile, and R code

Here is a C script mmperf.c computing GFLOPs of multiplying 2 square matrices of size N:

#include <R.h>
#include <Rmath.h>
#include <Rinternals.h>
#include <R_ext/BLAS.h>
#include <sys/time.h>

/* standard C subroutine */
double mmperf (int n) {
  /* local vars */
  int n2 = n * n, tmp; double *A, *C, one = 1.0;
  struct timeval t1, t2; double elapsedTime, GFLOPs;
  /* simulate N-by-N matrix A */
  A = (double *)calloc(n2, sizeof(double));
  GetRNGstate();
  tmp = 0; while (tmp < n2) {A[tmp] = runif(0.0, 1.0); tmp++;}
  PutRNGstate();
  /* generate N-by-N zero matrix C */
  C = (double *)calloc(n2, sizeof(double));
  /* time 'dgemm.f' for C <- A * A + C */
  gettimeofday(&t1, NULL);
  F77_CALL(dgemm) ("N", "N", &n, &n, &n, &one, A, &n, A, &n, &one, C, &n);
  gettimeofday(&t2, NULL);
  /* free memory */
  free(A); free(C);
  /* compute and return elapsedTime in microseconds (usec or 1e-6 sec) */
  elapsedTime = (double)(t2.tv_sec - t1.tv_sec) * 1e+6;
  elapsedTime += (double)(t2.tv_usec - t1.tv_usec);
  /* convert microseconds to nanoseconds (1e-9 sec) */
  elapsedTime *= 1e+3;
  /* compute and return GFLOPs */
  GFLOPs = 2.0 * (double)n2 * (double)n / elapsedTime;
  return GFLOPs;
  }

/* R wrapper */
SEXP R_mmperf (SEXP n) {
  double GFLOPs = mmperf(asInteger(n));
  return ScalarReal(GFLOPs);
  }

Here is a simple R script mmperf.R to report GFLOPs for case N = 2000

mmperf <- function (n) {
  dyn.load("mmperf.so")
  GFLOPs <- .Call("R_mmperf", n)
  dyn.unload("mmperf.so")
  return(GFLOPs)
  }

GFLOPs <- round(mmperf(2000), 2)
cat(paste("GFLOPs =",GFLOPs, "
"))

Finally there is a simple makefile to generate the shared library mmperf.so:

mmperf.so: mmperf.o
    gcc -shared -L$(shell pwd) -Wl,-rpath=$(shell pwd) -o mmperf.so mmperf.o -lopenblas

mmperf.o: mmperf.c
    gcc -fpic -O2 -I$(shell Rscript --default-packages=base --vanilla -e 'cat(R.home("include"))') -c mmperf.c

Put all these files under working directory ~/Desktop/dgemm, and compile it:

~/Desktop/dgemm$ make
~/Desktop/dgemm$ readelf -d mmperf.so | grep "NEEDED|RPATH|SONAME"
0x00000001 (NEEDED)            Shared library: [libopenblas.so.0]
0x00000001 (NEEDED)            Shared library: [libc.so.6]
0x0000000f (RPATH)             Library rpath: [/home/zheyuan/Desktop/dgemm]

The output reassures us that OpenBLAS is correctly linked, and the run time load path is correctly set.

Part 4: testing OpenBLAS in R

Let's do

~/Desktop/dgemm$ Rscript --default-packages=base --vanilla mmperf.R

Note our script needs only the base package in R, and --vanilla is used to ignore all user settings on R start-up. On my laptop, my program returns:

GFLOPs = 1.11

Oops! This is truely reference BLAS performance not OpenBLAS (which is about 8-9 GFLOPs).

Part 5: Why?

To be honest, I don't know why this happens. Each step seems to work correctly. Does something subtle occurs when R is invoked? For example, any possibility that OpenBLAS library is overridden by reference BLAS at some point for some reason? Any explanations and solutions? Thanks!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

why my way does not work

First, shared libraries on UNIX are designed to mimic the way archive libraries work (archive libraries were there first). In particular that means that if you have libfoo.so and libbar.so, both defining symbol foo, then whichever library is loaded first is the one that wins: all references to foo from anywhere within the program (including from libbar.so) will bind to libfoo.sos definition of foo.

This mimics what would happen if you linked your program against libfoo.a and libbar.a, where both archive libraries defined the same symbol foo. More info on archive linking here.

It should be clear from above, that if libblas.so.3 and libopenblas.so.0 define the same set of symbols (which they do), and if libblas.so.3 is loaded into the process first, then routines from libopenblas.so.0 will never be called.

Second, you've correctly decided that since R directly links against libR.so, and since libR.so directly links against libblas.so.3, it is guaranteed that libopenblas.so.0 will lose the battle.

However, you erroneously decided that Rscript is better, but it's not: Rscript is a tiny binary (11K on my system; compare to 2.4MB for libR.so), and approximately all it does is exec of R. This is trivial to see in strace output:

strace -e trace=execve /usr/bin/Rscript --default-packages=base --vanilla /dev/null
execve("/usr/bin/Rscript", ["/usr/bin/Rscript", "--default-packages=base", "--vanilla", "/dev/null"], [/* 42 vars */]) = 0
execve("/usr/lib/R/bin/R", ["/usr/lib/R/bin/R", "--slave", "--no-restore", "--vanilla", "--file=/dev/null", "--args"], [/* 43 vars */]) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=89625, si_status=0, si_utime=0, si_stime=0} ---
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=89626, si_status=0, si_utime=0, si_stime=0} ---
execve("/usr/lib/R/bin/exec/R", ["/usr/lib/R/bin/exec/R", "--slave", "--no-restore", "--vanilla", "--file=/dev/null", "--args"], [/* 51 vars */]) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=89630, si_status=0, si_utime=0, si_stime=0} ---
+++ exited with 0 +++

Which means that by the time your script starts executing, libblas.so.3 has been loaded, and libopenblas.so.0 that will be loaded as a dependency of mmperf.so will not actually be used for anything.

is it possible at all to make it work

Probably. I can think of two possible solutions:

  1. Pretend that libopenblas.so.0 is actually libblas.so.3
  2. Rebuild entire R package against libopenblas.so.

For #1, you need to ln -s libopenblas.so.0 libblas.so.3, then make sure that your copy of libblas.so.3 is found before the system one, by setting LD_LIBRARY_PATH appropriately.

This appears to work for me:

mkdir /tmp/libblas
# pretend that libc.so.6 is really libblas.so.3
cp /lib/x86_64-linux-gnu/libc.so.6 /tmp/libblas/libblas.so.3
LD_LIBRARY_PATH=/tmp/libblas /usr/bin/Rscript /dev/null
Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared object '/usr/lib/R/library/stats/libs/stats.so':
  /usr/lib/liblapack.so.3: undefined symbol: cgemv_
During startup - Warning message:
package ‘stats’ in options("defaultPackages") was not found

Note how I got an error (my "pretend" libblas.so.3 doesn't define symbols expected of it, since it's really a copy of libc.so.6).

You can also confirm which version of libblas.so.3 is getting loaded this way:

LD_DEBUG=libs LD_LIBRARY_PATH=/tmp/libblas /usr/bin/Rscript /dev/null |& grep 'libblas.so.3'
     91533: find library=libblas.so.3 [0]; searching
     91533:   trying file=/usr/lib/R/lib/libblas.so.3
     91533:   trying file=/usr/lib/x86_64-linux-gnu/libblas.so.3
     91533:   trying file=/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server/libblas.so.3
     91533:   trying file=/tmp/libblas/libblas.so.3
     91533: calling init: /tmp/libblas/libblas.so.3

For #2, you said:

I have no root access on machines I want to test, so actual linking to OpenBLAS is impossible.

but that seems to be a bogus argument: if you can build libopenblas, surely you can also build your own version of R.

Update:

You mentioned in the beginning that libblas.so.3 and libopenblas.so.0 define the same symbol, what does this mean? They have different SONAME, is that insufficient to distinguish them by the system?

The symbols and the SONAME have nothing to do with each other.

You can see symbols in the output from readelf -Ws libblas.so.3 and readelf -Ws libopenblas.so.0. Symbols related to BLAS, such as cgemv_, will appear in both libraries.

Your confusion about SONAME possibly comes from Windows. The DLLs on Windows are designed completely differently. In particular, when FOO.DLL imports symbol bar from BAR.DLL, both the name of the symbol (bar) and the DLL from which that symbol was imported (BAR.DLL) are recorded in the FOO.DLLs import table.

That makes it easy to have R import cgemv_ from BLAS.DLL, while MMPERF.DLL imports the same symbol from OPENBLAS.DLL.

However, that makes library interpositioning hard, and works completely differently from the way archive libraries work (even on Windows).

Opinions differ on which design is better overall, but neither system is likely to ever change its model.

There are ways for UNIX to emulate Windows-style symbol binding: see RTLD_DEEPBIND in dlopen man page. Beware: these are fraught with peril, likely to confuse UNIX experts, are not widely used, and likely to have implementation bugs.

Update 2:

you mean I compile R and install it under my home directory?

Yes.

Then when I want to invoke it, I should explicitly give the path to my version of executable program, otherwise the one on the system might be invoked instead? Or, can I put this path at the first position of environment variable $PATH to cheat the system?

Either way works.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...