
node.js - Way to lower memory usage by mongoose when doing query

I am working on a Node backend, trying to optimize a very heavy query to MongoDB via Mongoose. The expected return size is considerable, but for some reason, when I make the request, Node begins consuming huge amounts of memory: 200 MB+ for a single big request.

Considering the size of the return is less than 10 MB in most cases, this doesn't seem right. It also refuses to let go of the memory after it has finished. I know this is probably just the V8 garbage collector doing its default behavior, but what concerns me is the huge amount of memory being consumed for a single find() request.

Through testing, I've isolated it down to the find() call. Once the call is done, it performs some post-processing and then sends the data to a callback, all in an anonymous function. I have tried using the QueryStream instead of model.find(), but it shows no real improvement.

Looking around has not yielded any answers, so I will ask: is there a known way to reduce, control, or optimize Mongoose's memory usage? Does anyone know why so much excess memory is being used for a single call?

EDIT

As per Johnny's and Blake's suggestions, using a mixture of lean() with streaming, along with pause() and resume(), has improved the runtime and memory usage immensely. Thank you!
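For reference, a minimal sketch of the combination that worked, assuming Mongoose's legacy .stream() API, a hypothetical Item model, and a hypothetical handleDoc() processing function:

var stream = Item.find().lean().stream();  // .lean() returns plain JS objects,
                                           // skipping Mongoose document overhead

stream.on("data", function(doc) {
    stream.pause();               // stop emission while this doc is processed
    handleDoc(doc, function() {   // hypothetical async processing step
        stream.resume();          // continue once done
    });
});

stream.on("close", function() {
    // all documents have been consumed
});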


1 Reply


The default Mongoose .find() of course returns all results as an array, so that will always use memory in proportion to the size of the result set. This leaves the "stream" interface.

The basic problem here is that with a stream interface (which inherits from the basic Node stream), each "data" event fires and the associated event handler is executed continuously.

This means that even with a "stream", the subsequent actions in your event handler "stack up": at the very least consuming lots of memory, and possibly eating up the call stack if further asynchronous processes are being fired in there.

So the best thing you can do is start to "limit" the actions in your stream processing. This is as simple as calling the .pause() method:

var stream = model.find().stream();   // however you construct the query

stream.on("data", function(doc) {
    // pause on entry so no further "data" events fire
    stream.pause();

    // do processing on "doc" here

    stream.resume();            // then resume when done
});

So .pause() stops further events in the stream from being emitted, which allows the actions in your event handler to complete before continuing, so they are not all arriving at once.

When your handling code is complete, you call .resume(), either directly within the block as shown here or within the callback of any asynchronous action performed within the block. Note that the same rules apply for async actions: "all" of them must signal completion before you should call .resume().
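As a hedged illustration of that point, with a hypothetical asynchronous saveElsewhere() step, the .resume() call moves into that step's callback:

stream.on("data", function(doc) {
    stream.pause();

    // hypothetical async action; .resume() only in its callback,
    // once the work for this document has fully completed
    saveElsewhere(doc, function(err) {
        if (err) console.error(err);   // handle the error as appropriate
        stream.resume();
    });
});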

There are other optimizations that can be applied as well, and you might do well to look at the available "queue processing" or "async flow control" modules to help you get more performance through some parallel execution.
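As one hedged example, here is a sketch using the "async" module's queue (assuming its v2-style API), where workDoc() is a hypothetical per-document processing function and 5 is an arbitrary concurrency limit:

var async = require("async");

// worker pool: up to 5 documents processed in parallel
var queue = async.queue(function(doc, callback) {
    workDoc(doc, callback);   // hypothetical async processing
}, 5);

stream.on("data", function(doc) {
    queue.push(doc);
    if (queue.length() > 100) {   // arbitrary backpressure threshold
        stream.pause();
    }
});

// when the queue empties, let the stream flow again
queue.drain = function() {
    stream.resume();
};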

But basically, think .pause(), then process, then .resume() to continue, in order to avoid eating up lots of memory in your processing.

Also, be aware of your "outputs", and similarly try to use a "stream" when building up something for a response. All of this will be for nothing if the work you are doing just builds up another variable in memory, so it helps to be aware of that.
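To make that last point concrete, a sketch of streaming the result straight into an HTTP response rather than accumulating it, assuming an Express-style route and the same legacy .stream() API (Item is again a hypothetical model):

app.get("/items", function(req, res) {
    var stream = Item.find().lean().stream();
    var first = true;

    res.setHeader("Content-Type", "application/json");
    res.write("[");

    stream.on("data", function(doc) {
        if (!first) res.write(",");
        first = false;
        res.write(JSON.stringify(doc));   // write each doc as it arrives
    });

    stream.on("close", function() {
        res.end("]");                     // close the JSON array
    });

    stream.on("error", function(err) {
        res.statusCode = 500;
        res.end();
    });
});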

