
MySQL utf8 vs. utf8mb4 – What's the difference? - pavel_lishin
http://www.eversql.com/mysql-utf8-vs-utf8mb4-whats-the-difference-between-utf8-and-utf8mb4/
======
jakobegger
MySQL has caused so much pain with their decision to make latin1 the default
encoding.

I've seen so many PHP apps that store UTF-8 encoded text in Latin1 fields, and
it kind of works, until you correctly configure your client library, or try
sorting the data, or check for equality...

The problem is so common, that the popular MySQL client app Sequel Pro even
has an option to set the encoding to "UTF8 via Latin1", which tells the server
to output Latin1 but then treats the result as UTF-8.

So once you learned this the hard way, you'll change the db encoding to utf8,
and you'll be glad to have finally fixed the encoding mess, only to stumble
across the fact from the original article that utf8 is not really utf8 in
MySQL --- oh the horror!

------
VintageCool
Use utf8mb4. Use utf8mb4_unicode_520_ci. Alter tables using

ALTER TABLE database_name.table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE
utf8mb4_unicode_520_ci.

Verify that all of your columns are utf8mb4 with:

SELECT table_schema, table_name, character_set_name, collation_name FROM
information_schema.columns WHERE table_schema NOT IN
('mysql','information_schema','sys','performance_schema') AND
character_set_name is not NULL AND character_set_name != 'utf8mb4';

You may have to update your client libraries (like Perl-DBD) if you're still
using something like CentOS 6.

------
mixologic
If only they had named them utf8mb3 and utf8. Would have been a _lot_ clearer
that one was optimized, but not fully utf8.

------
blaisio
The next problem you'll likely encounter after switching to utf8mb4 is the
relatively tiny maximum index key length in InnoDB - even though the strings
are variable length, for the utf8 4 encoding MySQL will multiply the length of
the string by 4 when checking against this limit.

If your data model layer supports it, it's often better to just use raw binary
instead of any particular encoding. The main features utf8mb4 provides are
validation and sorting. Most people are just using an ORM and use Unicode
internally in an application, so validation of the encoding is nice, but not
essential. And for most applications it doesn't make sense to sort arbitrary
utf8 strings in the database layer. (Sorting Unicode strings in general is
very complicated, and 99% of the time it's not something you really need.)

------
thephyber
It seems insane that the top Google SERP result for "mysql utf8"[1] mentions
this, but not in the top answer, rather in the top comment to the top answer
-- which describes why the top answer is wrong.

Because of the way MySQL rolled out `utf8` (a strict subset of UTF8) then
`utf8mb4` (which is a full UTF8 implementation), the other top result[2] is
similarly poisoned where the directions describe using `utf8` and have an
addendum describing `utf8mb4` (which isn't hard to miss).

[1] [https://stackoverflow.com/questions/202205/how-to-make-
mysql...](https://stackoverflow.com/questions/202205/how-to-make-mysql-handle-
utf-8-properly)

[2] [https://www.toptal.com/php/a-utf-8-primer-for-php-and-
mysql](https://www.toptal.com/php/a-utf-8-primer-for-php-and-mysql)

------
petraeus
SELECT CONCAT ( 'ALTER TABLE ', t1.TABLE_SCHEMA, '.', t1.table_name, ' MODIFY
', t1.column_name, ' ', t1.data_type, '(' , CHARACTER_MAXIMUM_LENGTH, ')', '
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;' ) FROM
information_schema.columns t1

WHERE t1.TABLE_SCHEMA like 'omtest' AND t1.COLLATION_NAME IS NOT NULL AND
t1.COLLATION_NAME NOT IN ('utf8mb4_unicode_ci')

AND table_name = 'emails';

------
cntlzw
Never heard of this. Thanks for the article. So many pitfalls working with
MySQL.

------
mdani
utf8 to utf8mb4 also breaks replication so be extra careful if you have slaves
replicating. You'd need to switch off row or mixed replication and switch to
statement based till changes are done on all nodes.

