Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


0 votes
316 views
in Technique by (71.8m points)

Python: Concatenate many dicts of numpy arrays with same keys and size

I have a function called within a loop that returns a dict (dsst_mean) with roughly 50 variables. All variables are numpy arrays of length 10.

The loop iterates roughly 3000 times. I'm currently concatenating towards the end of each loop so that I have a 'dsst_mean_all' dict that grows larger on each iteration.

source = [dsst_mean_all, dsst_mean]                
for key in source[0]:                    
    dsst_mean_all[key] = np.concatenate([d[key] for d in source])

It works, but I know this isn't efficient. I also have problems with initialization of the 'dsst_mean_all' dict. (I'm currently using dict.fromkeys() to do this.)

My question is: what are some options to do this more efficiently? I'm thinking I could store the dsst_mean dicts in a list and then do one concatenate at the end. But I'm not sure whether holding 3000+ dicts of numpy arrays in memory is a good idea. I know this depends on the size, but unfortunately right now I don't have an estimate of the size of each 'dsst_mean' dict.
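(Side note on the size question: you can estimate the memory footprint yourself from the stated shapes. A minimal sketch, using hypothetical key names and assuming float64 arrays of length 10 as described:)

```python
import numpy as np

# Hypothetical stand-in for one dsst_mean dict: ~50 keys, arrays of length 10
dsst_mean = {f"var{i}": np.zeros(10) for i in range(50)}

# Sum the array buffers to estimate one dict's payload
bytes_per_dict = sum(a.nbytes for a in dsst_mean.values())
total_mb = bytes_per_dict * 3000 / 1e6  # rough total across 3000 iterations

print(bytes_per_dict)  # 50 arrays * 10 float64 * 8 bytes = 4000
print(total_mb)        # ~12 MB, modest on most machines
```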

Thanks.



1 Reply

0 votes
by (71.8m points)

Normally we recommend collecting values in a list and making an array once, at the end. The new wrinkle here is that we need to iterate over the keys of a dictionary to do this collection.

For example:

A function to make the individual dictionaries:

In [804]: def foo(i):
     ...:     return {k:np.arange(5) for k in ['A','B','C']}
     ...: 
In [805]: foo(0)
Out[805]: 
{'A': array([0, 1, 2, 3, 4]),
 'B': array([0, 1, 2, 3, 4]),
 'C': array([0, 1, 2, 3, 4])}

A collector dictionary:

In [806]: dd = {k:[] for k in ['A','B','C']}

Iteration, collecting arrays in the lists:

In [807]: for _ in range(3):
     ...:     x = foo(None)
     ...:     for k,v in dd.items():
     ...:         v.append(x[k])
     ...:         
In [808]: dd
Out[808]: 
{'A': [array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4])],
 'B': [array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4])],
 'C': [array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4])]}

Another iteration on the dictionary to turn the lists into some sort of array (stack, concatenate, your choice):

In [810]: for k,v in dd.items():
     ...:     dd[k]=np.stack(v,axis=0)
     ...:     
In [811]: dd
Out[811]: 
{'A': array([[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]), 'B': array([[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]), 'C': array([[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]])}
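If you'd rather end up with one long 1-D array per key (length 30,000 in your case) instead of a 2-D stack, swap np.stack for np.concatenate. A minimal sketch, assuming dd holds lists of 1-D arrays as built above:

```python
import numpy as np

# dd as built above: each key maps to a list of 1-D arrays
dd = {k: [np.arange(5) for _ in range(3)] for k in ['A', 'B', 'C']}

# concatenate joins along the existing axis, giving one flat array per key
flat = {k: np.concatenate(v) for k, v in dd.items()}

print(flat['A'].shape)  # (15,)
```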

A list of 3000 arrays of length 10 will take up somewhat more memory than one array of 30,000 numbers, but not drastically more.

You could collect the whole dictionaries in one list the first time around, but you still need to combine those into one dictionary in the same way.
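That whole-dictionary variant might look like this. A minimal sketch, reusing the foo stand-in from above for the per-iteration function:

```python
import numpy as np

def foo(i):
    # stand-in for the function that returns one dict per iteration
    return {k: np.arange(5) for k in ['A', 'B', 'C']}

# Collect the whole dicts first...
all_dicts = [foo(i) for i in range(3)]

# ...then one pass per key to combine them into a single dict of arrays
keys = all_dicts[0].keys()
combined = {k: np.stack([d[k] for d in all_dicts], axis=0) for k in keys}

print(combined['A'].shape)  # (3, 5)
```

Either way the concatenation cost is paid once per key at the end, rather than once per iteration.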

