0 votes
3.3k views
in Technique by (71.8m points)

machine learning - How to save large sklearn RandomForestRegressor model for inference

I trained a sklearn RandomForestRegressor model on 19 GB of training data. I would like to save it to disk in order to use it later for inference. As has been recommended in other Stack Overflow questions, I tried the following:

  • Pickle
import pickle
pickle.dump(model, open(filename, 'wb'))

The model was saved successfully; its size on disk was 1.9 GB.

loaded_model = pickle.load(open(filename, 'rb'))

Loading the model resulted in a MemoryError (despite 16 GB of RAM).

  • cPickle - the same result as Pickle
  • Joblib

import joblib
joblib.dump(est, 'random_forest.joblib', compress=3)

It also ended with a MemoryError while loading the file.

  • Klepto
d = klepto.archives.dir_archive('sklearn_models', cached=True, serialized=True)
d['sklearn_random_forest'] = est
d.dump()

The archive is created, but when I try to load it using the following code, I get KeyError: 'sklearn_random_forest'

d = klepto.archives.dir_archive('sklearn_models', cached=True, serialized=True)
d.load(model_params)
est = d[model_params]

I tried saving a dictionary object using the same code and it worked, so the code is correct. Apparently klepto cannot persist sklearn models. I played with the cached and serialized parameters, but it didn't help.
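For reference, a minimal klepto round-trip, assuming est is the fitted RandomForestRegressor and that model_params was meant to hold the same key string used when dumping, would look roughly like this:

import klepto

# dump: store the fitted estimator under an explicit key
d = klepto.archives.dir_archive('sklearn_models', cached=True, serialized=True)
d['sklearn_random_forest'] = est
d.dump()

# load: the key passed to load() must match the key used when dumping,
# otherwise the lookup raises KeyError
d = klepto.archives.dir_archive('sklearn_models', cached=True, serialized=True)
d.load('sklearn_random_forest')
est = d['sklearn_random_forest']

If the KeyError still appears with matching keys, the archive most likely failed to serialize the estimator itself.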

Any hints on how to handle this would be greatly appreciated. Is it possible to save the model in JSON, XML, maybe HDFS, or some other format?



1 Reply

0 votes
by (71.8m points)

Try using joblib.dump()

With this method you can use the compress parameter. It takes integer values from 0 to 9; the higher the value, the more the file is compressed. In most cases a compress value of 3 is a good trade-off.

The only downside is that higher compress values make writing and reading the file slower.
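A minimal sketch of this approach, assuming model is the fitted RandomForestRegressor and that there is enough free RAM to hold the uncompressed estimator when loading:

import joblib

# compress takes values 0-9; higher values give a smaller file but slower I/O
joblib.dump(model, 'random_forest.joblib', compress=3)

# later, for inference
model = joblib.load('random_forest.joblib')
predictions = model.predict(X_new)   # X_new: placeholder for your inference features

Note that compression only shrinks the file on disk; once loaded, the estimator occupies the same amount of memory either way.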


