Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
149 views
in Technique[技术] by (71.8m points)

mysql5 - MySQL database with unique fields ignored ending spaces

My projects requires to start inputs from the user with the spacing on the left and spacing on the right of a word, for example 'apple'. If the user types in ' apple' or 'apple ', whether it is one space or multiple space on the left or right of the word, I need to store it that way.

This field has the Unique attribute, but I attempt to insert the word with spacing on the left, and it works fine. But when I attempt to insert the word with spacing on the right it trims off all the spacing from the right of the word.

So I am thinking of adding a special character to the right of the word after the spacing. But I am hoping there is a better solution for this issue.

CREATE TABLE strings
( id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
string varchar(255) COLLATE utf8_bin NOT NULL,
created_ts timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (id), UNIQUE KEY string (string) )
ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The problem is that MySQL ignores trailing whitespace when doing string comparison. See http://dev.mysql.com/doc/refman/5.7/en/char.html

All MySQL collations are of type PADSPACE. This means that all CHAR, VARCHAR, and TEXT values in MySQL are compared without regard to any trailing spaces.

...

For those cases where trailing pad characters are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad characters will result in a duplicate-key error. For example, if a table contains 'a', an attempt to store 'a ' causes a duplicate-key error.

(This information is for 5.7; for 8.0 this changed, see below)

The section for the like operator gives an example for this behavior (and shows that like does respect trailing whitespace):

mysql> SELECT 'a' = 'a ', 'a' LIKE 'a ';
+------------+---------------+
| 'a' = 'a ' | 'a' LIKE 'a ' |
+------------+---------------+
|          1 |             0 |
+------------+---------------+
1 row in set (0.00 sec)

Unfortunately the UNIQUE index seems to use the standard string comparison to check if there is already such a value, and thus ignores trailing whitespace. This is independent from using VARCHAR or CHAR, in both cases the insert is rejected, because the unique check fails. If there is a way to use like semantics for the UNIQUE check then I do not know it.

What you could do is store the value as VARBINARY:

mysql> create table test_ws ( `value` varbinary(255) UNIQUE );
Query OK, 0 rows affected (0.13 sec)

mysql> insert into test_ws (`value`) VALUES ('a');
Query OK, 1 row affected (0.08 sec)

mysql> insert into test_ws (`value`) VALUES ('a ');
Query OK, 1 row affected (0.06 sec)

mysql> SELECT CONCAT( '(', value, ')' ) FROM test_ws;
+---------------------------+
| CONCAT( '(', value, ')' ) |
+---------------------------+
| (a)                       |
| (a )                      |
+---------------------------+
2 rows in set (0.00 sec)

You better do not want to do anything like sorting alphabetically on this column, because sorting will happen on the byte values instead, and that will not be what the users expect (most users, anyway).

The alternative is to patch MySQL and write your own collation which is of type NO PAD. Not sure if someone wants to do that, but if you do, let me know ;)

Edit: meanwhile MySQL has collations which are of type NO PAD, according to https://dev.mysql.com/doc/refman/8.0/en/char.html :

Most MySQL collations have a pad attribute of PAD SPACE. The exceptions are Unicode collations based on UCA 9.0.0 and higher, which have a pad attribute of NO PAD.

and https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html

Unicode collations based on UCA versions later than 4.0.0 include the version in the collation name. Thus, utf8mb4_unicode_520_ci is based on UCA 5.2.0 weight keys, whereas utf8mb4_0900_ai_ci is based on UCA 9.0.0 weight keys.

So if you try:

  create table test_ws ( `value` varbinary(255) UNIQUE )
    character set utf8mb4 collate utf8mb4_0900_ai_ci;

you can insert values with and without trailing whitespace

You can find all available NO PAD collations with:

 show collation where Pad_attribute='NO PAD';

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...