c++ - float128 and double-double arithmetic

Question

Welcome To Ask or Share your Answers For Others

c++ - float128 and double-double arithmetic

1 Reply

深蓝 · Answer 1 · 2021-10-23T21:33:46+0000

“In this case, we use two double to store the value. So we need to make two operations at each time.”

This is not how double-double arithmetic works. You should expect one double-double operation to be implemented in anywhere from 6 to 20 double operations depending on the actual operation being implemented, the availability of a fused-multiply-add operation, the assumption that one operand is larger than the other, …

For instance, here is one implementation of a double-double multiplication for when an FMA instruction is not available, taken from CRlibm:

#define Mul22(zh,zl,xh,xl,yh,yl)                      
{                                                     
double mh, ml;                                        
                              
  const double c = 134217729.;                
  double up, u1, u2, vp, v1, v2;              
                              
  up = (xh)*c;        vp = (yh)*c;            
  u1 = ((xh)-up)+up;  v1 = ((yh)-vp)+vp;          
  u2 = (xh)-u1;       v2 = (yh)-v1;                   
                              
  mh = (xh)*(yh);                     
  ml = (((u1*v1-mh)+(u1*v2))+(u2*v1))+(u2*v2);        
                              
  ml += (xh)*(yl) + (xl)*(yh);                
  *zh = mh+ml;                        
  *zl = mh - (*zh) + ml;                              
}

The first 8 operations alone are for dividing exactly each double from the operands into two halves so that one half from each side can be multiplied with one half from the other side and the result obtained exactly as a double. The computations u1*v1, u1*v2, … do exactly that.

The values obtained in mh and ml can overlap, so the last 3 operations are there to renormalize the result into the sum of two floating-point numbers.

In this case we can have round-off errors on each double or their is a mechanism that avoid this?

As the comment says:

/*
 * computes double-double multiplication: zh+zl = (xh+xl) *  (yh+yl)
 * relative error is smaller than 2^-102
 */

You can find about all the mechanisms used to achieve these results in the Handbook of Floating-Point Arithmetic.

Categories

c++ - float128 and double-double arithmetic

c++ - float128 and double-double arithmetic

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags