Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - BeautifulSoup Object Will Not Pickle, Causes Interpreter to Silently Crash

I have a soup object from BeautifulSoup that I cannot pickle. When I try to pickle it, the Python interpreter crashes silently, so the failure cannot be caught as an exception. I need to pickle the object in order to return it through the multiprocessing package, which pickles objects to pass them between processes. How can I troubleshoot or work around the problem? Unfortunately, I cannot post the HTML for the page (it is not publicly available), and I have been unable to construct a reproducible example. I tried to isolate the problem by looping over the soup and pickling individual components; the smallest thing that produces the error is an instance of <class 'BeautifulSoup.NavigableString'>. When I print the object, it prints u' '.


1 Reply


The NavigableString class cannot be serialized with pickle or cPickle, which is what multiprocessing uses. You should, however, be able to serialize it with dill. dill provides a superset of the pickle interface and can serialize most Python objects. multiprocessing itself will still fail unless you use pathos.multiprocessing, a fork of multiprocessing that uses dill.

Get the code here: https://github.com/uqfoundation.
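
Because dill exposes the same dumps/loads interface as pickle, switching serializers can be sketched as a drop-in swap. This is only an illustration: the helper names get_serializer and round_trip are hypothetical, and the fallback to pickle exists solely so the sketch still runs where dill is not installed. Whether dill copes with a particular soup has to be checked on your own data, and a hard interpreter crash cannot be caught with try/except.

```python
import importlib
import pickle


def get_serializer():
    """Return the dill module if it is installed, otherwise stdlib pickle.

    Both modules expose dumps() and loads(), so callers can treat them
    interchangeably. (Hypothetical helper, not part of dill or pathos.)
    """
    try:
        return importlib.import_module("dill")
    except ImportError:
        return pickle


def round_trip(obj, serializer=None):
    """Serialize obj to bytes and load it back with the chosen serializer."""
    serializer = serializer or get_serializer()
    return serializer.loads(serializer.dumps(obj))


# A plain dict survives the round trip with either module.
copy = round_trip({"text": " "})
```

For the multiprocessing side, pathos.multiprocessing provides a ProcessingPool class with a map method, which takes the place of multiprocessing.Pool once pathos is installed.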


For more information see: What can multiprocessing and dill do together?

http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/

http://nbviewer.ipython.org/gist/minrk/5241793
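
If adding a dependency is not an option, another workaround (not part of the answer above) is to avoid sending soup objects between processes at all and return plain strings instead. NavigableString subclasses the text type, and the Beautiful Soup documentation recommends converting it to an ordinary string before using it outside the tree. A minimal sketch, assuming Python 3 (on Python 2 the conversion would be unicode(node)); FakeNavigableString is a hypothetical stand-in so the example runs without Beautiful Soup, and to_plain_text and worker_result are illustrative names:

```python
import pickle


class FakeNavigableString(str):
    """Hypothetical stand-in for NavigableString: text plus a tree link."""

    def __new__(cls, value, parent=None):
        obj = super().__new__(cls, value)
        obj.parent = parent  # the real class links back into the parse tree
        return obj


def to_plain_text(node):
    """Return a built-in str copy of the node, without any tree links."""
    return str(node)


def worker_result(nodes):
    """What a worker could return: plain strings instead of soup objects."""
    return [to_plain_text(node) for node in nodes]


nodes = [FakeNavigableString(" "), FakeNavigableString("hello")]

# multiprocessing pickles return values, so this mimics the hand-off.
payload = pickle.dumps(worker_result(nodes))
restored = pickle.loads(payload)
```

If the parent process needs a tree rather than text, the same idea applies to the whole document: return the markup as a string and parse it again on the receiving side.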



...