Polish diacritics, as usual

Posted on Sat 22 September 2012 in Pamietniczek • 1 min read

All these years later, I'm still fighting with those stupid Polish diacritics. It turns out that MySQL stores UTF-8 encoded text in its own way:

"That is the way MySQL stores utf8 encoded data internally. It's a terribly inefficient variation of Unicode storage, apparently using a full three bytes for most characters, and not supporting four byte UTF-8 sequences. As for how to convert it to real UTF-8 using INTO OUTFILE... I don't know. Using other mysqldump methods will do it though."

I found a fix thanks to this discussion:

"Ah… mmm… So, out of curiosity, how does MySQL encode unicode data internally?"

"I wish I knew. I poked around the documentation when writing this answer, but couldn't come up with anything specific. It's not UCS-2, it's not UTF-8, it's not UTF-16. I just have lingering passive knowledge that MySQL's "UTF-8" storage is not UTF-8 and not very optimized. Might be worth opening a new question for."

"So, it looks like the documentation is lying (or, at least, misleading). @taavi seems to have found the answer: MySQL's “latin1” is actually cp1252, so MySQL is decoding the text as cp1252, then encoding it as utf-8. Awesome!"

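The last quote describes the damage as "decode as cp1252, then encode as utf-8", so the repair should be the reverse: encode the garbled text as cp1252 and decode the resulting bytes as UTF-8. Below is a minimal Python sketch of that idea. The function names `garble` and `repair` are my own, chosen for illustration, and the sketch assumes the text was double-encoded exactly once.

```python
def garble(text: str) -> str:
    """Simulate the damage: UTF-8 bytes misread as cp1252 text."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Undo the damage: recover the original bytes, then read them as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    original = "zażółć gęślą jaźń"
    broken = garble(original)
    print(broken)
    print(repair(broken))
```

One caveat: Python's cp1252 codec leaves five byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) and raises an error when it meets them, so characters whose UTF-8 form contains one of those bytes (for example "Ł") will not survive this exact round trip and need separate handling.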