java - If profiler is not the answer, what other choices do we have?

Question

Welcome To Ask or Share your Answers For Others

java - If profiler is not the answer, what other choices do we have?

posted Oct 6, 2021 in Technique[技术] by 深蓝 (71.8m points)

java - If profiler is not the answer, what other choices do we have?

After watching the presentation "Performance Anxiety" of Joshua Bloch, I read the paper he suggested in the presentation "Evaluating the Accuracy of Java Pro?lers". Quoting the conclusion:

Our results are disturbing because they indicate that pro?ler incorrectness is pervasive—occurring in most of our seven benchmarks and in two production JVM—-and signi?cant—all four of the state-of-the-art pro?lers produce incorrect pro?les. Incorrect pro?les can easily cause a performance analyst to spend time optimizing cold methods that will have minimal effect on performance. We show that a proof-of-concept pro?ler that does not use yield points for sampling does not suffer from the above problems

The conclusion of the paper is that we cannot really believe the result of profilers. But then, what is the alternative of using profilers. Should we go back and just use our feeling to do optimization?

UPDATE: A point that seems to be missed in the discussion is observer effect. Can we build a profiler that really 'observer effect'-free?

question from:https://stackoverflow.com/questions/4387895/if-profiler-is-not-the-answer-what-other-choices-do-we-have

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-06T03:58:59+0000

Oh, man, where to begin?

First, I'm amazed that this is news. Second, the problem is not that profilers are bad, it is that some profilers are bad. The authors built one that, they feel, is good, just by avoiding some of the mistakes they found in the ones they evaluated. Mistakes are common because of some persistent myths about performance profiling.

But let's be positive. If one wants to find opportunities for speedup, it is really very simple:

Sampling should be uncorrelated with the state of the program.
That means happening at a truly random time, regardless of whether the program is in I/O (except for user input), or in GC, or in a tight CPU loop, or whatever.
Sampling should read the function call stack,
so as to determine which statements were "active" at the time of the sample. The reason is that every call site (point at which a function is called) has a percentage cost equal to the fraction of time it is on the stack. (Note: the paper is concerned entirely with self-time, ignoring the massive impact of avoidable function calls in large software. In fact, the reason behind the original gprof was to help find those calls.)
Reporting should show percent by line (not by function).
If a "hot" function is identified, one still has to hunt inside it for the "hot" lines of code accounting for the time. That information is in the samples! Why hide it?

An almost universal mistake (that the paper shares) is to be concerned too much with accuracy of measurement, and not enough with accuracy of location. For example, here is an example of performance tuning in which a series of performance problems were identified and fixed, resulting in a compounded speedup of 43 times. It was not essential to know precisely the size of each problem before fixing it, but to know its location. A phenomenon of performance tuning is that fixing one problem, by reducing the time, magnifies the percentages of remaining problems, so they are easier to find. As long as any problem is found and fixed, progress is made toward the goal of finding and fixing all the problems. It is not essential to fix them in decreasing size order, but it is essential to pinpoint them.

On the subject of statistical accuracy of measurement, if a call point is on the stack some percent of time F (like 20%), and N (like 100) random-time samples are taken, then the number of samples that show the call point is a binomial distribution, with mean = NF = 20, standard deviation = sqrt(NF(1-F)) = sqrt(16) = 4. So the percent of samples that show it will be 20% +/- 4%. So is that accurate? Not really, but has the problem been found? Precisely.

In fact, the larger a problem is, in terms of percent, the fewer samples are needed to locate it. For example, if 3 samples are taken, and a call point shows up on 2 of them, it is highly likely to be very costly. (Specifically, it follows a beta distribution. If you generate 4 uniform 0,1 random numbers, and sort them, the distribution of the 3rd one is the distribution of cost for that call point. It's mean is (2+1)/(3+2) = 0.6, so that is the expected savings, given those samples.) INSERTED: And the speedup factor you get is governed by another distribution, BetaPrime, and its average is 4. So if you take 3 samples, see a problem on 2 of them, and eliminate that problem, on average you will make the program four times faster.

It's high time we programmers blew the cobwebs out of our heads on the subject of profiling.

Disclaimer - the paper failed to reference my article: Dunlavey, “Performance tuning with instruction-level cost derived from call-stack sampling”, ACM SIGPLAN Notices 42, 8 (August, 2007), pp. 4-8.

Categories

java - If profiler is not the answer, what other choices do we have?

java - If profiler is not the answer, what other choices do we have?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags