Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Cython: understanding a typed memoryview with an indirect_contiguous memory layout

I want to understand more about Cython's awesome typed-memoryviews and the memory layout indirect_contiguous.

According to the documentation, indirect_contiguous is used when "the list of pointers is contiguous".

There's also an example usage:

# contiguous list of pointers to contiguous lists of ints
cdef int[::view.indirect_contiguous, ::1] b

Please correct me if I'm wrong, but I assume a "contiguous list of pointers to contiguous lists of ints" means something like the array created by the following C++ dummy code:

// we want to create a 'contiguous list of pointers to contiguous lists of ints'

int** array;
// allocate row-pointers
// This is the 'contiguous list of pointers' related to the first dimension:
array = new int*[ROW_COUNT];

// allocate some rows, each row is a 'contiguous list of ints'
array[0] = new int[COL_COUNT]{1,2,3};

So if I understand correctly, it should be possible in my Cython code to get a memoryview from an int** like this:

cdef int** list_of_pointers = get_pointers()
cdef int[::view.indirect_contiguous, ::1] view = <int[:ROW_COUNT:view.indirect_contiguous,:COL_COUNT:1]> list_of_pointers

But I get a compile error:

cdef int[::view.indirect_contiguous, ::1] view = <int[:ROW_COUNT:view.indirect_contiguous,:COL_COUNT:1]> list_of_pointers
                                                                                                        ^                                                                                                                              
------------------------------------------------------------

memview_test.pyx:76:116: Pointer base type does not match cython.array base type

What did I do wrong? Am I missing a cast, or did I misunderstand the concept of indirect_contiguous?

1 Reply

Let's set the record straight: a typed memoryview can only be used with objects that implement the buffer protocol.

Raw C pointers obviously don't implement the buffer protocol. But you might ask why something like the following quick-and-dirty code works:

%%cython    
from libc.stdlib cimport calloc
def f():
    cdef int* v=<int *>calloc(4, sizeof(int))
    cdef int[:] b = <int[:4]>v
    return b[0] # leaks memory, so what?

Here, a pointer (v) is used to construct a typed memoryview (b). However, more is going on under the hood (as can be seen in the cythonized C file):

  • a Cython array (i.e. cython.view.array) is constructed, which wraps the raw pointer and can expose it via the buffer protocol
  • this array is then used to create the typed memoryview.

Your understanding of what view.indirect_contiguous is for is right: it is exactly what you want. The problem, however, is view.array, which simply cannot handle this type of data layout.

view.indirect and view.indirect_contiguous correspond to PyBUF_INDIRECT in buffer-protocol parlance, and for this the suboffsets field must contain meaningful values (i.e. >=0 for some dimensions). However, as can be seen in the source code, view.array doesn't have this member at all, so there is no way it can represent this more complex memory layout.
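
To make the difference concrete, here is a minimal C sketch. All function names are made up for illustration and are not part of Cython or CPython. A direct layout can be addressed with a base pointer and strides alone, whereas an int** needs one pointer dereference between the two dimensions, and that extra step is what suboffsets describes.

```c
#include <assert.h>
#include <stddef.h>

/* Direct layout: the element lives at base + i*stride0 + j*stride1. */
int read_direct(const void *base, ptrdiff_t stride0, ptrdiff_t stride1,
                ptrdiff_t i, ptrdiff_t j) {
    assert(base != NULL);
    const char *p = (const char *)base + i * stride0 + j * stride1;
    return *(const int *)p;
}

/* Indirect first dimension: base + i*stride0 is the address of a row
   pointer, which has to be dereferenced before stride1 is applied. */
int read_indirect(const void *base, ptrdiff_t stride0, ptrdiff_t stride1,
                  ptrdiff_t i, ptrdiff_t j) {
    assert(base != NULL);
    const char *slot = (const char *)base + i * stride0;
    const int *row = *(int *const *)slot;           /* the extra dereference */
    return *(const int *)((const char *)row + j * stride1);
}

/* Hypothetical demo data: one flat 2x3 block ... */
int demo_direct(void) {
    int flat[2][3] = {{1, 2, 3}, {4, 5, 6}};
    return read_direct(flat, sizeof flat[0], sizeof(int), 1, 2);
}

/* ... and the same values stored as two separate rows behind a pointer table. */
int demo_indirect(void) {
    int row0[3] = {1, 2, 3};
    int row1[3] = {4, 5, 6};
    int *rows[2] = {row0, row1};
    return read_indirect(rows, sizeof(int *), sizeof(int), 1, 2);
}
```

Both demo functions read element [1][2]. Only the second one needs the dereference, and a container that stores nothing but shape and strides has no way to ask for it.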

Where does that leave us? As pointed out by @chrisb and @DavidW in your other question, you will have to implement a wrapper that exposes your data structure via the buffer protocol.

There are data structures in Python that use the indirect memory layout, most prominently PIL arrays. A good starting point for understanding how suboffsets are supposed to work is this piece of documentation:
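
As a rough idea of what such a wrapper has to report, here is a C sketch under stated assumptions. The struct indirect_desc and the function describe_int_rows are hypothetical; the struct only mirrors the names of some Py_buffer fields (the real struct is declared in Python.h, has more members, and stores shape, strides and suboffsets as pointers rather than inline arrays).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the Py_buffer fields that matter here. */
typedef struct {
    void *buf;                /* start of the pointer table */
    ptrdiff_t len;            /* product(shape) * itemsize */
    ptrdiff_t itemsize;
    const char *format;       /* struct-module style, "i" means C int */
    int ndim;
    ptrdiff_t shape[2];
    ptrdiff_t strides[2];
    ptrdiff_t suboffsets[2];
} indirect_desc;

/* Describes `rows`, a table of nrows pointers to rows of ncols ints each. */
void describe_int_rows(int **rows, ptrdiff_t nrows, ptrdiff_t ncols,
                       indirect_desc *out) {
    assert(rows != NULL && out != NULL);
    out->buf = rows;
    out->itemsize = (ptrdiff_t)sizeof(int);
    out->len = nrows * ncols * out->itemsize;
    out->format = "i";
    out->ndim = 2;
    out->shape[0] = nrows;
    out->shape[1] = ncols;
    out->strides[0] = (ptrdiff_t)sizeof(int *);  /* step through the pointer table */
    out->strides[1] = (ptrdiff_t)sizeof(int);    /* step inside one row */
    out->suboffsets[0] = 0;    /* dereference after the first dimension */
    out->suboffsets[1] = -1;   /* second dimension is direct */
}
```

In an actual wrapper, the same values would be written into the Py_buffer handed to __getbuffer__ (or bf_getbuffer in the C API), and only if the consumer requested PyBUF_INDIRECT.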

void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides,
                       Py_ssize_t *suboffsets, Py_ssize_t *indices) {
    char *pointer = (char*)buf;    // A
    int i;
    for (i = 0; i < ndim; i++) {
        pointer += strides[i] * indices[i]; // B
        if (suboffsets[i] >=0 ) {
            pointer = *((char**)pointer) + suboffsets[i];  // C
        }
    }
    return (void*)pointer;  // D
}

In your case strides and suboffsets would be

  • strides=[sizeof(int*), sizeof(int)] (i.e. [8,4] on usual x86_64 machines)
  • suboffsets=[0,-1], i.e. only the first dimension is indirect.

Getting the address of element [x,y] would then happen as follows:

  • in line A, pointer is set to buf; let's call this address BUF.
  • first dimension:
    • in line B, pointer becomes BUF+x*8 and points to the location of the pointer to the x-th row.
    • because suboffsets[0]>=0, we dereference the pointer in line C, so it now points to the address ROW_X, the start of the x-th row.
  • second dimension:
    • in line B we get the address of the y-th element using strides, i.e. pointer=ROW_X+4*y
    • the second dimension is direct (signaled by suboffsets[1]<0), so no dereferencing is needed.
  • we are done: pointer points to the desired address and is returned in line D.
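
The steps above can be replayed in plain C. This is a minimal sketch, assuming a 2x3 table of ints: get_item_pointer follows the documentation snippet quoted earlier, ptrdiff_t stands in for Py_ssize_t so that no Python headers are needed, and lookup_indirect is a hypothetical helper written only for this illustration.

```c
#include <assert.h>
#include <stddef.h>   /* ptrdiff_t stands in for Py_ssize_t */
#include <stdlib.h>

/* Same logic as the documentation snippet quoted above. */
void *get_item_pointer(int ndim, void *buf, const ptrdiff_t *strides,
                       const ptrdiff_t *suboffsets, const ptrdiff_t *indices) {
    char *pointer = (char *)buf;
    for (int i = 0; i < ndim; i++) {
        pointer += strides[i] * indices[i];
        if (suboffsets[i] >= 0) {
            pointer = *((char **)pointer) + suboffsets[i];
        }
    }
    return (void *)pointer;
}

/* Hypothetical helper: builds a 2x3 int** table with rows[r][c] = 10*r + c,
   reads element [x][y] through the indirect layout, frees the table and
   returns the value that was read. */
int lookup_indirect(ptrdiff_t x, ptrdiff_t y) {
    enum { ROW_COUNT = 2, COL_COUNT = 3 };
    assert(x >= 0 && x < ROW_COUNT && y >= 0 && y < COL_COUNT);

    int **rows = malloc(ROW_COUNT * sizeof *rows);   /* contiguous list of pointers */
    if (rows == NULL) abort();
    for (int r = 0; r < ROW_COUNT; r++) {
        rows[r] = malloc(COL_COUNT * sizeof **rows); /* one contiguous row of ints */
        if (rows[r] == NULL) abort();
        for (int c = 0; c < COL_COUNT; c++) {
            rows[r][c] = 10 * r + c;
        }
    }

    ptrdiff_t strides[2]    = { sizeof(int *), sizeof(int) };
    ptrdiff_t suboffsets[2] = { 0, -1 };             /* only dimension 0 is indirect */
    ptrdiff_t indices[2]    = { x, y };
    int value = *(int *)get_item_pointer(2, rows, strides, suboffsets, indices);

    for (int r = 0; r < ROW_COUNT; r++) {
        free(rows[r]);
    }
    free(rows);
    return value;
}
```

With this fill pattern, lookup_indirect(x, y) gives the same value as reading rows[x][y] directly, which is what a consumer of the buffer would rely on.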

FWIW, I have implemented a library which is able to export int** and similar memory layouts via the buffer protocol: https://github.com/realead/indirect_buffer.

