Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
206 views
in Technique[技术] by (71.8m points)

glibc - IEEE-754 floating-point precision: How much error is allowed?

I'm working on porting the sqrt function (for 64-bit doubles) from fdlibm to a model-checker tool I'm using at the moment (cbmc).
As part of my doings, I read a lot about the ieee-754 standard, but I think I didn't understand the guarantees of precision for the basic operations (incl. sqrt).

Testing my port of fdlibm's sqrt, I got the following calculation with sqrt on a 64-bit double:

sqrt(1977061516825203605555216616167125005658976571589721139027150498657494589171970335387417823661417383745964289845929120708819092392090053015474001800648403714048.0) = 44464159913633855548904943164666890000299422761159637702558734139742800916250624.0

(this case broke a simple post-condition in my test regarding precision; I'm not sure anymore if this post-condition is possible with IEEE-754)

For a comparison, several multi-precision tools calculated something like:

sqrt(1977061516825203605555216616167125005658976571589721139027150498657494589171970335387417823661417383745964289845929120708819092392090053015474001800648403714048.0) =44464159913633852501611468455197640079591886932526256694498106717014555047373210.truncated

One can see, that the 17-th number from the left is different, meaning an error like:

3047293474709469249920707535828633381008060627422728245868877413.0

Question 1: Is this huge amount of error allowed?

The standard is saying that every basic operation (+,-,*,/,sqrt) should be within 0.5 ulps, meaning that it should be equal to a mathematically exact result rounded to the nearest fp-representation (wiki is saying that some libraries only guarantees 1 ulp, but that isn't that important at the moment).

Question 2: Does that mean, that every basic operation should have an error < 2.220446e-16 with 64-bit doubles (machine-epsilon)?

I did calculate the same with a x86-32 linux system (glibc / eglibc) and got the same result like that obtained with fdlibm, which let me think that:

  • a: I did something wrong (but how: printf would be a candidate, but I don't know if that could be the reason)
  • b: the error/precision is common in these libraries
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The IEEE-754 standard requires that so called "basic operations" (which include addition, multiplication, division and square root) are correctly rounded. This means that there is a unique allowed answer, and it is the closest representable floating-point number to the so-called "infinitely precise" result of the operation.

In double-precision, numbers have 53 binary digits of precision, so the correct answer is the exact answer rounded to 53 significant digits. As Rick Regan showed in his answer, this is exactly the result that you got.

The answers to your questions are:

Question 1: Is this huge amount of error allowed?

Yes, but it is quite misleading to call this error "huge". The fact is that there is no double-precision value that could be returned that would have a smaller error.

Question 2: Does that mean, that every basic operation should have an error < 2.220446e-16 with 64-bit doubles (machine-epsilon)?

No. It means that every basic operation should be rounded to the (unique) closest representable floating-point number according to the current rounding mode. This is not quite the same as saying that the relative error is bounded by machine epsilon.

Question 3: Which result do you obtain with your x86 hardware and gcc + libc?

The same answer you did, because sqrt is correctly rounded on any reasonable platform.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...