Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - What are the differences between a cpdef and a cdef wrapped in a def?

In the Cython docs there is an example where they give two ways of writing a C/Python hybrid method. An explicit one with a cdef for fast C access and a wrapper def for access from Python:

cdef class Rectangle:
    cdef int x0, y0
    cdef int x1, y1
    def __init__(self, int x0, int y0, int x1, int y1):
        self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
    cdef int _area(self):
        cdef int area
        area = (self.x1 - self.x0) * (self.y1 - self.y0)
        if area < 0:
            area = -area
        return area
    def area(self):
        return self._area()

And one using cpdef:

cdef class Rectangle:
    cdef int x0, y0
    cdef int x1, y1
    def __init__(self, int x0, int y0, int x1, int y1):
        self.x0 = x0; self.y0 = y0; self.x1 = x1; self.y1 = y1
    cpdef int area(self):
        cdef int area
        area = (self.x1 - self.x0) * (self.y1 - self.y0)
        if area < 0:
            area = -area
        return area

I was wondering what the differences are in practical terms.

For example, is either method faster/slower when called from C/Python?

Also, when subclassing/overriding does cpdef offer anything that the other method lacks?

1 Reply

chrisb's answer gives you all you need to know, but if you are game for gory details...

But first, the takeaways from the lengthy analysis below, in a nutshell:

  • For free functions, there is not much difference performance-wise between cpdef and rolling it out by hand with cdef+def. The resulting C code is almost identical.

  • For bound methods, the cpdef approach can be slightly faster in the presence of inheritance hierarchies, but nothing to get too excited about.

  • The cpdef syntax has its advantages, as the resulting code is clearer (at least to me) and shorter.


Free functions:

When we define something silly like:

cpdef do_nothing_cp():
    pass

the following happens:

  1. a fast C function is created (in this case it has the cryptic name __pyx_f_3foo_do_nothing_cp because my extension is called foo, but you really only have to look for the f prefix).
  2. a Python function is also created (called __pyx_pf_3foo_2do_nothing_cp, prefix pf); it does not duplicate the code but calls the fast function somewhere along the way.
  3. a Python wrapper is created, called __pyx_pw_3foo_3do_nothing_cp (prefix pw).
  4. a method definition for do_nothing_cp is emitted; this is what the Python wrapper is needed for, and it is where the function to be called when foo.do_nothing_cp is invoked gets recorded.

You can see it in the produced c-code here:

static PyMethodDef __pyx_methods[] = {
  {"do_nothing_cp", (PyCFunction)__pyx_pw_3foo_3do_nothing_cp, METH_NOARGS, 0},
  {0, 0, 0, 0}
};

For a cdef function, only the first step happens; for a def function, only steps 2-4.

Now when we load module foo and invoke foo.do_nothing_cp() the following happens:

  1. The function pointer bound to the name do_nothing_cp is found, in our case the Python wrapper, i.e. the pw-function.
  2. The pw-function is called via this function pointer and calls the pf-function (as a plain C call).
  3. The pf-function calls the fast f-function.

What happens if we call do_nothing_cp inside the cython-module?

def call_do_nothing_cp():
    do_nothing_cp()

Clearly, cython doesn't need the python machinery to locate the function in this case - it can directly use the fast f-function via a c-function call, bypassing pw and pf functions.
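
The two call paths described above can be sketched in plain Python. This is only a model under stated assumptions: the names f_do_nothing_cp, pf_do_nothing_cp, pw_do_nothing_cp, method_table and the two call_from_* helpers are made up for illustration and are not what Cython generates. It merely records which layers run for a call coming from the interpreter versus a call from inside the module.

```python
# Pure-Python model of the layers emitted for a cpdef function.
# All names are illustrative stand-ins, not real generated symbols.

calls = []  # records the order in which the layers run


def f_do_nothing_cp():
    """Stand-in for the fast C function (prefix f)."""
    calls.append("f")


def pf_do_nothing_cp():
    """Stand-in for the Python-level function (prefix pf); delegates to f."""
    calls.append("pf")
    return f_do_nothing_cp()


def pw_do_nothing_cp():
    """Stand-in for the Python wrapper (prefix pw), the entry registered by name."""
    calls.append("pw")
    return pf_do_nothing_cp()


# Stand-in for the method-definition table: public name -> wrapper.
method_table = {"do_nothing_cp": pw_do_nothing_cp}


def call_from_python(name):
    """A call from the interpreter looks the name up and enters through pw."""
    return method_table[name]()


def call_from_inside_module():
    """A call from inside the module goes straight to the f function."""
    return f_do_nothing_cp()


call_from_python("do_nothing_cp")
python_path = list(calls)

calls.clear()
call_from_inside_module()
module_path = list(calls)

print(python_path)  # ['pw', 'pf', 'f']
print(module_path)  # ['f']
```

In the real extension the pw, pf and f layers are C functions that the C compiler may inline into one another, so the extra layers on the Python path need not cost anything measurable.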

What happens if we wrap a cdef function in a def function?

cdef _do_nothing():
    pass

def do_nothing():
    _do_nothing()

Cython does the following:

  1. a fast _do_nothing function is created, corresponding to the f-function above.
  2. a pf-function for do_nothing is created, which calls _do_nothing somewhere along the way.
  3. a Python wrapper, i.e. a pw-function, is created, which wraps the pf-function.
  4. the functionality is bound to foo.do_nothing via a function pointer to the Python wrapper (the pw-function).

As you can see, there is not much difference from the cpdef approach.

cdef functions are just plain C functions, but def and cpdef functions are first-class Python functions, so you could do something like this:

foo.do_nothing=foo.do_nothing_cp

As to performance, we cannot expect much difference here:

>>> import foo
>>> %timeit foo.do_nothing_cp
51.6 ns ± 0.437 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

>>> %timeit foo.do_nothing
51.8 ns ± 0.369 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
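
Outside IPython, a comparable per-call measurement can be made with the standard timeit module. The sketch below is an assumption-laden stand-in: ns_per_call, fast_stand_in and wrapped_stand_in are hypothetical names, and the two stand-ins are ordinary Python functions, so the numbers they produce say nothing about the Cython functions themselves. To measure the extension, one would pass foo.do_nothing_cp and foo.do_nothing instead. Note that the helper times actual calls of the function it is given.

```python
import timeit


def ns_per_call(func, number=1_000_000, repeat=5):
    """Return the best observed time per call of func(), in nanoseconds."""
    timer = timeit.Timer(func)  # a callable is invoked once per loop iteration
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


def fast_stand_in():
    """Hypothetical stand-in for a function that does nothing."""
    pass


def wrapped_stand_in():
    """Hypothetical stand-in for a wrapper that forwards to another function."""
    fast_stand_in()


# Smaller loop counts than the defaults, to keep the demonstration quick.
results = {
    fn.__name__: ns_per_call(fn, number=100_000, repeat=3)
    for fn in (fast_stand_in, wrapped_stand_in)
}

for name, ns in results.items():
    print(f"{name}: {ns:.1f} ns per call")
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes. The figure still includes the overhead of timeit's own loop.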

If we look at the resulting machine code (objdump -d foo.so), we can see that the C-compiler has inlined all calls for the cpdef-version do_nothing_cp:

 0000000000001340 <__pyx_pw_3foo_3do_nothing_cp>:
    1340:   48 8b 05 91 1c 20 00    mov    0x201c91(%rip),%rax      
    1347:   48 83 00 01             addq   $0x1,(%rax)
    134b:   c3                      retq   
    134c:   0f 1f 40 00             nopl   0x0(%rax)

but not for the rolled out do_nothing (I must confess, I'm a little bit surprised and don't understand the reasons yet):

0000000000001380 <__pyx_pw_3foo_1do_nothing>:
    1380:   53                      push   %rbx
    1381:   48 8b 1d 50 1c 20 00    mov    0x201c50(%rip),%rbx        # 202fd8 <_DYNAMIC+0x208>
    1388:   48 8b 13                mov    (%rbx),%rdx
    138b:   48 85 d2                test   %rdx,%rdx
    138e:   75 0d                   jne    139d <__pyx_pw_3foo_1do_nothing+0x1d>
    1390:   48 8b 43 08             mov    0x8(%rbx),%rax
    1394:   48 89 df                mov    %rbx,%rdi
    1397:   ff 50 30                callq  *0x30(%rax)
    139a:   48 8b 13                mov    (%rbx),%rdx
    139d:   48 83 c2 01             add    $0x1,%rdx
    13a1:   48 89 d8                mov    %rbx,%rax
    13a4:   48 89 13                mov    %rdx,(%rbx)
    13a7:   5b                      pop    %rbx
    13a8:   c3                      retq   
    13a9:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)

This could explain why the cpdef version is slightly faster, but in any case the difference is nothing compared to the overhead of a Python function call.


Methods of cdef classes:

The situation is a little more complicated for methods of a class because of possible polymorphism. Let's start out with:

cdef class A:
   cpdef do_nothing_cp(self):
       pass

At first sight, there is not that much difference to the case above:

  1. A fast, c-only, f-prefix-version of the function is emitted
  2. A python (prefix pf) version is emitted, which calls the f-function
  3. A python wrapper (prefix pw) wraps the pf-version and is used for registration.
  4. do_nothing_cp is registered as a method of class A via tp_methods-pointer of the PyTypeObject.

As can be seen in the produced c-file:

static PyMethodDef __pyx_methods_3foo_A[] = {
  {"do_nothing_cp", (PyCFunction)__pyx_pw_3foo_1A_1do_nothing_cp, METH_NOARGS, 0},
  ...
  {0, 0, 0, 0}
};
...
static PyTypeObject __pyx_type_3foo_A = {
  ...
  __pyx_methods_3foo_A, /*tp_methods*/
  ...
};

Clearly, the bound version has to take the implicit parameter self as an additional argument, but there is more to it: the f-function performs a function dispatch unless it is called from the corresponding pf-function. This dispatch looks as follows (I keep only the important parts):

static PyObject *__pyx_f_3foo_1A_do_nothing_cp(CYTHON_UNUSED struct __pyx_obj_3foo_A *__pyx_v_self, int __pyx_skip_dispatch) {
  /* __pyx_skip_dispatch is 1 if called from the pf-version */
  if (unlikely(__pyx_skip_dispatch)) ;
  /* Check if overridden in Python */
  else if (/* pseudocode: the function is overridden in the __dict__ of the object */) {
    /* pseudocode: use the overriding function */
  }
  /* pseudocode: otherwise do the work */
}

Why is it needed? Consider the following extension foo:

cdef class A:
    cpdef do_nothing_cp(self):
        pass

cdef class B(A):
    cpdef call_do_nothing(self):
        self.do_nothing_cp()

What happens when we call B().call_do_nothing()?

  1. B-pw-call_do_nothing is located and called,
  2. it calls B-pf-call_do_nothing,
  3. which calls B-f-call_do_nothing,
  4. which calls A-f-do_nothing_cp, bypassing the pw and pf versions.

What happens when we add the following class C, which overrides the do_nothing_cp-function?

import foo
class C(foo.B):
    def do_nothing_cp(self):
        print("I do something!")

Now calling C().call_do_nothing() leads to:

  1. call_do_nothing of the C class is located and called, which means that pw-call_do_nothing of the B class is located and called,
  2. which calls B-pf-call_do_nothing,
  3. which calls B-f-call_do_nothing,
  4. which calls A-f-do_nothing_cp (as we already know!), bypassing the pw and pf versions.

And now, in step 4, we need to dispatch the call inside A-f-do_nothing_cp() in order to reach the right C.do_nothing_cp()! Luckily, we have this dispatch at hand in the function.
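
The chain above, including the dispatch in step 4, can be imitated in pure Python. This is a rough model, not generated code: the method names _f_do_nothing_cp and _f_call_do_nothing and the skip_dispatch keyword are illustrative assumptions, and the override check is simplified to a lookup on the type of the object.

```python
class A:
    """Stand-in for `cdef class A` with a cpdef method do_nothing_cp."""

    def _f_do_nothing_cp(self, skip_dispatch=False):
        # Models the f-function: unless told to skip the check, look for a
        # Python-level override and hand the call over to it.
        if not skip_dispatch:
            override = getattr(type(self), "do_nothing_cp")
            if override is not A.do_nothing_cp:
                return override(self)
        return "A does nothing"

    def do_nothing_cp(self):
        # Models the pw/pf pair: enters the f-function with dispatch disabled.
        return self._f_do_nothing_cp(skip_dispatch=True)


class B(A):
    """Stand-in for `cdef class B(A)`."""

    def _f_call_do_nothing(self):
        # Models B-f-call_do_nothing: calls the f-function of A directly,
        # bypassing the pw and pf layers.
        return self._f_do_nothing_cp()

    def call_do_nothing(self):
        # Models the pw/pf pair of call_do_nothing.
        return self._f_call_do_nothing()


class C(B):
    """Plain Python subclass that overrides the cpdef method."""

    def do_nothing_cp(self):
        return "I do something!"


result_b = B().call_do_nothing()
result_c = C().call_do_nothing()
print(result_b)  # A does nothing
print(result_c)  # I do something!
```

In this model the flag also keeps a call that enters through the Python-level method, for example super().do_nothing_cp() inside an override, from bouncing back to the override again.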

To make it more complicated: what if the class C were also a cdef class? The dispatch via __dict__ would not work, because cdef classes don't have a __dict__.

For cdef classes, polymorphism is implemented similarly to C++'s "virtual tables", so in B.call_do_nothing() the f-do_nothing_cp function is not called directly but via a pointer that depends on the class of the object (one can see those "virtual tables" being set up in __pyx_pymod_exec_XXX, e.g. __pyx_vtable_3foo_B.__pyx_base). Thus the __dict__ dispatch in the A-f-do_nothing_cp() function is not needed in the case of a pure cdef hierarchy.
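
The idea of a per-class table of function pointers can be illustrated with a small Python model. Everything here is an assumption made for illustration: the vtable attribute and the free functions a_do_nothing_cp and c_do_nothing_cp are hypothetical, and a dict lookup stands in for what in C is an indexed pointer load.

```python
def a_do_nothing_cp(self):
    """Stand-in for the f-function of A."""
    return "A.do_nothing_cp"


def c_do_nothing_cp(self):
    """Stand-in for the f-function of a cdef subclass that overrides it."""
    return "C.do_nothing_cp"


class A:
    # Table of "function pointers" for this class.
    vtable = {"do_nothing_cp": a_do_nothing_cp}


class B(A):
    # B overrides nothing, so it starts from a copy of its base's table.
    vtable = dict(A.vtable)

    def call_do_nothing(self):
        # The call goes through the table of the object's actual class,
        # so no __dict__ lookup is involved.
        return type(self).vtable["do_nothing_cp"](self)


class C(B):
    # C replaces one slot in the inherited table.
    vtable = {**B.vtable, "do_nothing_cp": c_do_nothing_cp}


via_b = B().call_do_nothing()
via_c = C().call_do_nothing()
print(via_b)  # A.do_nothing_cp
print(via_c)  # C.do_nothing_cp
```

The point of the model is only that the choice of function depends on the class of the object and is resolved by a table lookup, without consulting any Python-level dictionary of attributes.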


As to performance, comparing cpdef with cdef+def I get:

                        cpdef     cdef+def
  A.do_nothing          107 ns    108 ns
  B.call_do_nothing     109 ns    116 ns

so the difference isn't that large, with cpdef being slightly faster, if anything.

