Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
828 views
in Technique[技术] by (71.8m points)

performance - reuse JVM in Hadoop mapreduce jobs

I know we can set the property "mapred.job.reuse.jvm.num.tasks" to re-use JVM. My questions are:

(1) how to decide the number of tasks to be set here, -1 or some other positive integers?

(2) is it a good idea to already reuse JVMs and set this property to the value of -1 in mapreduce jobs?

Thank you very much!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

If you have very small tasks that are definitely running after each other, it is useful to set this property to -1 (meaning that a spawned JVM will be reused unlimited times). So you just spawn (number of task in your cluster available to your job)-JVMs instead of (number of tasks)-JVMs.

This is a huge performance improvement. In long running jobs the percentage of the runtime in comparision to setup a new JVM is very low, so it doesn't give you a huge performance boost.

Also in long running tasks it is good to recreate the task process, because of issues like heap fragmentation degrading your performance.

In addition, if you have some mid-time-running jobs, you could reuse just 2-3 of the tasks, having a good trade-off.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...