The official TensorFlow Serving document, SavedModel Warmup, says:
The TensorFlow runtime has components that are lazily initialized, which can cause high latency for the first request/s sent to a model after it is loaded. This latency can be several orders of magnitude higher than that of a single inference request.
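The quoted pattern can be shown with a toy, pure-Python sketch. LazyModel, warm_up and timed are hypothetical names, and this is not how the TensorFlow runtime is implemented. It only models the general idea. The weights are available right after loading. Some per-input-signature setup is deferred until the first call and then cached. A warmup request with the same signature therefore moves that one-time cost out of the serving path.

```python
import time


class LazyModel:
    """Hypothetical stand-in for a loaded model (not TensorFlow code).

    Loading only "restores" the weights. An expensive per-signature setup
    step is deferred to the first call with that signature, then cached.
    """

    def __init__(self, weights, setup_cost_s=0.05):
        self.weights = weights            # available immediately after load
        self.setup_cost_s = setup_cost_s  # simulated one-time setup cost
        self._executors = {}              # lazily built, keyed by input signature

    def _get_executor(self, signature):
        if signature not in self._executors:
            time.sleep(self.setup_cost_s)  # simulate the deferred setup work
            self._executors[signature] = (
                lambda xs: [w * x for w, x in zip(self.weights, xs)]
            )
        return self._executors[signature]

    def predict(self, xs):
        # The signature here is just (length, element type), an assumption for the toy.
        signature = (len(xs), type(xs[0]).__name__)
        return self._get_executor(signature)(xs)


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def warm_up(model, sample_requests):
    """Send representative requests so the lazy setup runs before real traffic."""
    for xs in sample_requests:
        model.predict(xs)


if __name__ == "__main__":
    cold = LazyModel([1.0, 2.0, 3.0])
    print(timed(cold.predict, [1.0, 1.0, 1.0]))  # first call pays the setup cost

    warm = LazyModel([1.0, 2.0, 3.0])
    warm_up(warm, [[0.0, 0.0, 0.0]])
    print(timed(warm.predict, [1.0, 1.0, 1.0]))  # setup already done
```

In TF Serving itself, the documented mechanism is different: you ship PredictionLog records in a TFRecord file at assets.extra/tf_serving_warmup_requests inside the SavedModel directory, and the server replays them at load time.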
As I understand it, since running a prediction can warm up a model, the components that are lazily initialized can't be the init_op of the graph. The init_op depends only on parameters saved in the SavedModel, and TFS (TensorFlow Serving) calls the restore_op to perform those initializations when the model is loaded.
If I'm right about this, then what are the components that are lazily initialized?