Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
350 views
in Technique[技术] by (71.8m points)

python - expanding (adding a row or column) a scipy.sparse matrix

Suppose I have a NxN matrix M (lil_matrix or csr_matrix) from scipy.sparse, and I want to make it (N+1)xN where M_modified[i,j] = M[i,j] for 0 <= i < N (and all j) and M[N,j] = 0 for all j. Basically, I want to add a row of zeros to the bottom of M and preserve the remainder of the matrix. Is there a way to do this without copying the data?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Scipy doesn't have a way to do this without copying the data but you can do it yourself by changing the attributes that define the sparse matrix.

There are 4 attributes that make up the csr_matrix:

data: An array containing the actual values in the matrix

indices: An array containing the column index corresponding to each value in data

indptr: An array that specifies the index before the first value in data for each row. If the row is empty then the index is the same as the previous column.

shape: A tuple containing the shape of the matrix

If you are simply adding a row of zeros to the bottom all you have to do is change the shape and indptr for your matrix.

x = np.ones((3,5))
x = csr_matrix(x)
x.toarray()
>> array([[ 1.,  1.,  1.,  1.,  1.],
          [ 1.,  1.,  1.,  1.,  1.],
          [ 1.,  1.,  1.,  1.,  1.]])
# reshape is not implemented for csr_matrix but you can cheat and do it  yourself.
x._shape = (4,5)
# Update indptr to let it know we added a row with nothing in it. So just append the last
# value in indptr to the end.
# note that you are still copying the indptr array
x.indptr = np.hstack((x.indptr,x.indptr[-1]))
x.toarray()
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.]])

Here is a function to handle the more general case of vstacking any 2 csr_matrices. You still end up copying the underlying numpy arrays but it is still significantly faster than the scipy vstack method.

def csr_vappend(a,b):
    """ Takes in 2 csr_matrices and appends the second one to the bottom of the first one. 
    Much faster than scipy.sparse.vstack but assumes the type to be csr and overwrites
    the first matrix instead of copying it. The data, indices, and indptr still get copied."""

    a.data = np.hstack((a.data,b.data))
    a.indices = np.hstack((a.indices,b.indices))
    a.indptr = np.hstack((a.indptr,(b.indptr + a.nnz)[1:]))
    a._shape = (a.shape[0]+b.shape[0],b.shape[1])
    return a

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...