Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
500 views
in Technique[技术] by (71.8m points)

floating point - MySQL "greater than" condition sometimes returns row with equal value

I'm running into a baffling issue with a basic MySQL query.

This is my table:

id | rating
1  | 1317.17
2  | 1280.59
3  | 995.12
4  | 973.88

Now, I'm attempting to find all rows where the rating column is larger than a certain value. If I try the following query:

SELECT * FROM (`users`) WHERE `rating` > '995.12'

It correctly returns 2.

But, if I try

SELECT * FROM (`users`) WHERE `rating` > '973.88'

it returns 4! So it's as if it thinks the 973.88 in the table is greater than 973.88, but it doesn't make the same mistake with 995.12. This happens regardless of whether I run the query from a PHP script or in phpMyAdmin.

Any ideas?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Decisions and consequences

This is the consequences that you've got because you decided to use floating-point data type. Floats are not precise. And that means: yes, you can result in a>a = true

For instance, your fourth row:

mysql> SELECT *  FROM t WHERE id=4;
+------+--------+
| id   | rating |
+------+--------+
|    4 | 973.88 |
+------+--------+
1 row in set (0.00 sec)

I've left data type as you've posted, it's FLOAT. Here we are:

mysql> SELECT rating>973.88 FROM t WHERE id=4;
+---------------+
| rating>973.88 |
+---------------+
|             1 |
+---------------+
1 row in set (0.00 sec)

Oops!


Behind the screen

Why? To understand why it is so, you should realize how floating-point data type is represented. Long story is here. But - I'll take a brief overview.

Here how it is represented: enter image description here where:

  • s is the sign
  • b is the base. It's meaning is same as radix
  • e is the exponent.

That means we can represent one number in different ways - and that depends of which base we'll chose. Most common is b=2. But not all real numbers can be represented exactly with this base, even if in decimal base they look "good". Famous example is 0.1 - which can not be represented in b=2 precisely - so it is stored approximately. Again, long story you can see here - but I'll just note, that it's impossible to represent it precisely with base 2.

The result is: even if number is precise in decimal radix, it still may be impossible to represent it precisely - and, therefore, it will be stored approximately. That's how it works and, in fact, this is intended - because of structure of floats itself.


What to do

Fixed precision

Well, first, you should ask yourself: do you really need float? Attention: I said: float. Because - there are also fixed point numbers. They will represent number with fixed precision. To say it easy: with fixed-point data type you may be sure that you'll store exactly what you see on the screen. So if it's 973.88 - then it's 973.88 and not 973.8800000439234. Moving to the deal:

mysql> ALTER TABLE t CHANGE rating rating DECIMAL(8,2);
Query OK, 4 rows affected, 4 warnings (0.47 sec)
Records: 4  Duplicates: 0  Warnings: 4

and ..

mysql> SELECT rating>973.88 FROM t WHERE id=4;
+---------------+
| rating>973.88 |
+---------------+
|             0 |
+---------------+
1 row in set (0.00 sec)

TADA! Magic happens. Your number is now stored with fixed precision, thus, such comparison failed.

Using float

Then, may be there are use-cases when you're stuck with floats (however, in case of DBMS it's hard for me to remember even one such use-case - if only not the case with large amount of calculations, which may cause performance impact, see description below). Then there's still a way to make it work. You should decide what precision is applicable for you. That is: from which point will you treat numbers as equals.

You're storing only two significant digits, so I assume that precision of 1E-5 would be more than enough. Then, your query will look like:

mysql> set @eps=1E-5;
Query OK, 0 rows affected (0.00 sec)

and use it with:

SELECT * FROM t WHERE rating>973.88+@eps

which will result in

+------+---------+
| id   | rating  |
+------+---------+
|    1 | 1317.17 |
|    2 | 1280.59 |
|    3 |  995.12 |
+------+---------+

Which is better?

To realize this, you'll need to look under cover once again. I've given a brief overview of what float data type is and why it isn't precise. However, fixed data type also has it's weakness. May be it's not the thing of which we should worry in context of DBMS, but I'll mention it: fixed data type, in general, will cause performance impact. And it will depend of how much calculations will you do in your DBMS.

In MySQL, fixed-point data types (such as DECIMAL) are implemented as BCD strings (so to make long story short - again, here's wiki link). That means in comparison to float it will cause performance issues. But if you're not going to do calculations in DBMS too often, then that impact won't be even noticeable - I've mentioned it because both types, float- and fixed-point have their own issues.


Conclusion

DBMS, like all the other computer stuff, isn't perfect. It's just using some internal things to do the work. That means: in some cases you'll have to realize how that internal things work to understand why did you got some odd result.

In particular, floats are not precise. Yes, there are tons of answers like this in the Internet, but I'll repeat. They are not precise. You should not rely on precision when it's about floats. And - in almost all DBMS there are fixed-point data types. And - in cases like yours you should use them. They will do just same work, but with them you'll be sure about selected precision.

However, you may want to use floats - if you're going to make too much calculations in your DBMS. But, on the other hand, that is about - why are you going to do that? Why do not use application to produce those calculations (and, therefore, avoid both performance impact of using fixed-point data-types and presicion problems with floats - because using fixed-point with average amount of calculations is ok)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...