Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Relationship between pickle and deepcopy

What exactly is the relationship between pickle and copy.deepcopy? What mechanisms do they share, and how?

It is clear that the two are closely related operations and share some mechanisms/protocols, but I can't wrap my head around the details.

Some (confusing) things I found out:

  1. If a class defines __[gs]etstate__, these methods get called when its instances are deepcopied. This surprised me at first, because I thought they were specific to pickle, but then I found the statement that "Classes can use the same interfaces to control copying that they use to control pickling." However, there is no documentation of how __[gs]etstate__ is used when deepcopying (how is the value returned from __getstate__ used, and what is passed to __setstate__?).
  2. A naive alternative implementation of deepcopy would be pickle.loads(pickle.dumps(obj)). However, this cannot be equivalent to deepcopying, because if a class defines a __deepcopy__ method, the pickle-based implementation would never invoke it. (I also stumbled upon a statement that deepcopy is more general than pickle, and that there are many types which are deepcopyable but not pickleable.)

(1) indicates a commonality, while (2) indicates a difference between pickle and deepcopy.

On top of that, I found these two contradictory statements:

copy_reg: The pickle, cPickle, and copy modules use those functions when pickling/copying those objects

and

The copy module does not use the copy_reg registration module

On the one hand, this is another indication of a relationship/commonality between pickle and deepcopy; on the other hand, it adds to my confusion...

[My experience is with Python 2.7, but I'd also appreciate any pointers regarding the differences in pickle/deepcopy between Python 2 and Python 3.]

1 Reply

0 votes
by (71.8m points)

You should not be confused by (1) and (2). In general, Python tries to include sensible fall-backs for missing methods. (For instance, it is enough to define __getitem__ in order to have an iterable class, but it may be more efficient to also implement __iter__. The same goes for operations like __add__, with the optional __iadd__, etc.)
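As a small illustration of these fall-backs (a sketch only; the class names Squares and Vec are made up for this example), the following defines neither __iter__ nor __iadd__, yet iteration and += still work:

```python
class Squares:
    """Hypothetical class: defines only __getitem__, no __iter__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, i):
        # iter() falls back to calling __getitem__ with 0, 1, 2, ...
        # and stops at the first IndexError.
        if not 0 <= i < self.n:
            raise IndexError(i)
        return i * i


class Vec:
    """Hypothetical class: defines only __add__, no __iadd__."""

    def __init__(self, x):
        self.x = x

    def __add__(self, other):
        return Vec(self.x + other.x)


squares = list(Squares(4))  # iteration works through __getitem__

v = Vec(1)
v += Vec(2)                 # falls back to v = v.__add__(Vec(2))
```

The deepcopy() behaviour discussed here follows the same pattern: use the most specific hook if present, otherwise fall back to a more general one.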

__deepcopy__ is the most specialized method that deepcopy() will look for, but if it does not exist, falling back to the pickle protocol is a sensible thing to do. deepcopy() does not actually call dumps()/loads(), because it does not need the intermediate representation to be a string, but it will indirectly make use of __getstate__ and __setstate__ (via __reduce__), as you observed.
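Here is a minimal sketch of that lookup order, written for Python 3 (the classes Tracked and Custom are invented for the example). It suggests an answer to the "what is passed to __setstate__" question: as far as I can tell, the value returned by __getstate__ is itself deep-copied and then handed to __setstate__ on a fresh instance created without calling __init__.

```python
import copy


class Tracked:
    """Hypothetical class using only the pickle-style state hooks."""

    def __init__(self, data):
        self.data = data
        self.log = []

    def __getstate__(self):
        # Only 'data' is part of the state; 'log' is deliberately left out.
        return {"data": self.data}

    def __setstate__(self, state):
        # Receives a deep copy of what __getstate__ returned.
        self.data = state["data"]
        self.log = ["restored"]


class Custom(Tracked):
    """Hypothetical subclass that adds the more specific __deepcopy__ hook."""

    def __deepcopy__(self, memo):
        new = Custom(copy.deepcopy(self.data, memo))
        new.log = ["deepcopied"]
        return new


orig = Tracked([1, 2])
dup = copy.deepcopy(orig)          # goes through __reduce_ex__ / __getstate__ / __setstate__

special = copy.deepcopy(Custom([3]))  # __deepcopy__ wins; the state hooks are not used
```

After running this, dup.log shows that __setstate__ ran, while special.log shows that __deepcopy__ took precedence. A pickle.loads(pickle.dumps(obj)) round trip would only ever take the first path, which is why it is not equivalent to deepcopy().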

Currently, the documentation still states

… The copy module does not use the copy_reg registration module.

but that seems to be a documentation bug that has been fixed in the meantime (possibly the 2.7 branch has not received enough attention here).
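A quick way to check this for yourself, sketched for Python 3 where copy_reg is named copyreg (Point and reduce_point are hypothetical names for this example): register a reduction function and see whether deepcopy() calls it.

```python
import copy
import copyreg


class Point:
    """Hypothetical class with no pickling or copying hooks of its own."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


calls = []


def reduce_point(p):
    # Reduction function in the pickle style: (callable, args).
    calls.append("reduce_point")
    return Point, (p.x, p.y)


# Register the reducer in copyreg's dispatch table.
copyreg.pickle(Point, reduce_point)

p = Point(1, 2)
q = copy.deepcopy(p)  # the copy module consults the registered reducer
```

If the registration were ignored, calls would stay empty; in my understanding of the copy module it ends up containing one entry, and q is rebuilt by calling Point(1, 2).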

Also note that this is pretty deeply integrated into Python (at least nowadays); the object class itself implements __reduce__ (and its versioned variant, __reduce_ex__), which refers to copy_reg.__newobj__ (the module is named copyreg in Python 3) for creating fresh instances of the given object-derived class.
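This can be inspected directly. The sketch below assumes Python 3 (hence copyreg rather than copy_reg) and uses a made-up class Plain; it looks at what the inherited __reduce_ex__ returns for protocol 2 and rebuilds an object from it by hand.

```python
import copyreg


class Plain:
    """Hypothetical class that defines no copy or pickle hooks at all."""

    def __init__(self):
        self.value = 42


# __reduce_ex__ is inherited from object; protocol 2 is used here.
rv = Plain().__reduce_ex__(2)
factory, args, state = rv[0], rv[1], rv[2]

# Rebuild an instance manually, roughly the way copy and pickle would:
# create it without calling __init__, then restore the state.
rebuilt = factory(*args)
rebuilt.__dict__.update(state)
```

Here factory is copyreg.__newobj__, args is (Plain,), and state is the instance dictionary, so both pickle and deepcopy() have everything they need without the class cooperating explicitly.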

