unicode - MySQL collation to store multilingual data of unknown language

深蓝 · Answer 1 · 2021-10-23T18:38:46+0000

You should use a Unicode collation. You can set it as the default for your system, or on individual fields of your tables. The two Unicode collations compared here, and their differences, are as follows:

utf8_general_ci is a very simple collation. It removes all accents, converts the result to upper case, and then compares using the code of the resulting "base letter".

utf8_unicode_ci uses the default Unicode collation element table.
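The "remove accents, then convert to upper case" comparison described for utf8_general_ci can be approximated with a short Python sketch. This is only an illustration of the idea: the function name simple_key is hypothetical, and the code relies on Unicode decomposition rather than on MySQL's actual weight tables, so results may differ from the server for some characters.

```python
import unicodedata


def simple_key(text: str) -> str:
    """Rough stand-in for an accent-insensitive, case-insensitive key.

    Assumption: stripping combining marks after NFD decomposition is
    close enough to "removing accents" for illustration purposes.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (accents), keeping only the base letters.
    base_letters = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return base_letters.upper()


# Two strings compare as equal when their keys are equal.
same = simple_key("résumé") == simple_key("RESUME")
```

Under this sketch, "résumé" and "RESUME" produce the same key, which is the kind of equality the simple collation gives you.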

The main differences are:

utf8_unicode_ci supports so-called expansions and ligatures. For example, the German letter ß (U+00DF LATIN SMALL LETTER SHARP S) is sorted near "ss", and the letter Œ (U+0152 LATIN CAPITAL LIGATURE OE) is sorted near "OE".

utf8_general_ci does not support expansions or ligatures. It sorts all these letters as single characters, and sometimes in the wrong order.

utf8_unicode_ci is generally more accurate for all scripts. For example, in the Cyrillic block, utf8_unicode_ci works well for Russian, Bulgarian, Belarusian, Macedonian, Serbian, and Ukrainian, while utf8_general_ci works well only for the Russian and Bulgarian subset of Cyrillic. The extra letters used in Belarusian, Macedonian, Serbian, and Ukrainian are not sorted well.

The disadvantage of utf8_unicode_ci is that it is a little slower than utf8_general_ci.
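The effect of expansions on sort order can be seen in a small Python sketch. The two mapping tables below are toy assumptions made for this illustration, not MySQL's real collation data: one expands ß to "ss", the other treats it as a single "s".

```python
# Toy tables, assumed for illustration only (not MySQL's weight tables).
EXPANSIONS = {"ß": "ss", "Œ": "OE", "œ": "oe"}
SINGLE_CHAR = {"ß": "s"}


def expansion_key(word: str) -> str:
    # One character may expand into several before comparison.
    return "".join(EXPANSIONS.get(ch, ch) for ch in word).upper()


def single_char_key(word: str) -> str:
    # Every character maps to exactly one character.
    return "".join(SINGLE_CHAR.get(ch, ch) for ch in word).upper()


words = ["Masse", "Mask", "Maße"]
with_expansions = sorted(words, key=expansion_key)
without_expansions = sorted(words, key=single_char_key)
```

With expansions, "Maße" ends up next to "Masse". With the single-character mapping, "Mask" is sorted between them.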

So unless you know exactly which languages and characters you are going to store, I recommend utf8_unicode_ci, which has broader coverage.

Extracted from MySQL forums.
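As a rough sketch of the two places the answer says a collation can be set (as a default, or on an individual field), the Python helpers below build the corresponding MySQL statements as strings. The helper names and the example database, table, and column names are hypothetical. The character set is derived from the collation name, since MySQL collation names begin with the name of their character set.

```python
def charset_of(collation: str) -> str:
    # "utf8_unicode_ci" -> "utf8": the prefix names the character set.
    return collation.split("_", 1)[0]


def database_default_sql(database: str, collation: str) -> str:
    # Statement that changes the default collation of a database.
    return (
        f"ALTER DATABASE `{database}` "
        f"CHARACTER SET {charset_of(collation)} COLLATE {collation};"
    )


def column_sql(table: str, column: str, collation: str, length: int = 255) -> str:
    # Statement that sets the collation of one VARCHAR column.
    # Illustration only: identifiers are not escaped, so do not pass
    # untrusted input to these helpers.
    return (
        f"ALTER TABLE `{table}` MODIFY `{column}` VARCHAR({length}) "
        f"CHARACTER SET {charset_of(collation)} COLLATE {collation};"
    )


print(database_default_sql("app", "utf8_unicode_ci"))
print(column_sql("people", "name", "utf8_unicode_ci"))
```

A column-level setting overrides the database default for that column, so the second form is useful when only some fields hold multilingual text.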
