

Streaming UTF-8 (with node.js) - felixge
http://debuggable.com/posts/streaming-utf-8-with-node-js:4bf28e8b-a290-432f-a222-11c1cbdd56cb

======
oscarduignan
What an awesome RSS feed icon they have. If you haven't already, try hovering
over it a few times (and then a few more).

~~~
riobard
What's the YouTube video about? It says the video is not available.

~~~
oscarduignan
I didn't stick around after reading the title when the YouTube page loaded,
but I'm pretty sure it was Rick Astley singing Never Gonna Give You Up.

------
axod
Slightly related: a quick way to get the number of bytes in a string in
JavaScript when the string is encoded as UTF-8:

var utf8_length = encodeURIComponent(data).replace(/%../g, 'x').length;

~~~
felixge
Don't use this, it's broken. Any % sign in your text will screw things up. 3-4
byte characters are not handled by it at all.

~~~
axod
Care to elaborate?

% signs are handled fine - they get encoded into %25 which then gets shortened
to a single 'x' character - thus counted as a single byte.

Similarly, for example € gets encoded to "%E2%82%AC" which then gets correctly
counted as 3 bytes.

I'm not quite sure what value of 'broken' you're thinking of? Did you try this
and find it didn't work in some particular circumstance?
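
The % and € cases above can be checked with a small sketch. The helper name
utf8Length is hypothetical (it just wraps the one-liner), and Buffer.byteLength
is assumed to be available (Node.js) purely as a cross-check. One caveat worth
knowing: encodeURIComponent throws a URIError on a lone surrogate, so the trick
assumes well-formed strings.

```javascript
// Hypothetical helper name; wraps the one-liner from the comment above.
function utf8Length(data) {
  // After encoding, every %XX escape stands for one UTF-8 byte and every
  // unescaped character is a single ASCII byte, so collapsing each escape
  // to one character leaves a string whose length is the byte count.
  return encodeURIComponent(data).replace(/%../g, 'x').length;
}

// ASCII, a literal percent sign, a 3-byte character, a 4-byte character.
const samples = ['abc', '100%', '€', '𝄞'];

for (const s of samples) {
  // Buffer.byteLength is Node's built-in byte count, used only to compare.
  console.log(JSON.stringify(s), utf8Length(s), Buffer.byteLength(s, 'utf8'));
}
```

Both columns should agree for every sample, including the literal % and the
4-byte character.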

~~~
felixge
I didn't know % gets encoded. So you're right, your code does work : )

~~~
axod
FWIW, checking code or reading API docs is pretty useful :P

Firebug works pretty well for testing out code snippets.

The internet is full of enough incorrect information regarding JavaScript as
it is, and if you think about it, % _has_ to be encoded, otherwise anything
containing a % would be decoded wrongly ;)

------
Sephr
This script is unnecessary, as JavaScript already has functions which can
facilitate encoding and decoding of UTF-8. Encoding UTF-8 to a bytestring is
unescape(encodeURIComponent(string)) and decoding UTF-8 is
decodeURIComponent(escape(string)).
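
A minimal round-trip sketch of those two expressions. The helper names are
hypothetical, and it assumes the legacy globals escape and unescape are
available, as they are in browsers and Node.js (they are deprecated but still
present).

```javascript
// Hypothetical helper names wrapping the two expressions from the comment.

// Returns a "byte string": one character (code 0-255) per UTF-8 byte.
function toUtf8ByteString(str) {
  return unescape(encodeURIComponent(str));
}

// Reverses the above: reads each character as one byte and decodes as UTF-8.
function fromUtf8ByteString(bytes) {
  return decodeURIComponent(escape(bytes));
}

const encoded = toUtf8ByteString('€');

// One character per byte, so the length is the UTF-8 byte count.
console.log(encoded.length);

// Decoding the byte string gives back the original text.
console.log(fromUtf8ByteString(encoded) === '€');
```

This only works when the whole encoded string is in hand; decodeURIComponent
throws a URIError if the byte string ends partway through a character.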

~~~
div
But this script handles streaming utf-8 encoded strings correctly.

------
codehero
Code does not work: it does not check for invalid UTF-8 characters.

~~~
felixge
That is done in Buffer.toString(), this stream filter just makes sure that no
string conversion is attempted on a multibyte character that is still
incomplete (valid or not).
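
A rough sketch of the idea, not the code from the article: it swaps in
Node.js's built-in string_decoder module (assumed available, along with
Buffer.from) to show a multibyte character being held back until its last
byte arrives.

```javascript
const { StringDecoder } = require('string_decoder');

const decoder = new StringDecoder('utf8');
const euro = Buffer.from('€', 'utf8'); // three bytes: E2 82 AC

// Simulate the character being split across two network chunks.
const chunk1 = euro.subarray(0, 2);
const chunk2 = euro.subarray(2);

// Naive approach: convert each chunk on its own. Neither chunk is a
// complete character, so the result is garbage instead of '€'.
const naive = chunk1.toString('utf8') + chunk2.toString('utf8');

// Buffered approach: the decoder keeps the incomplete bytes and emits
// nothing until the character is complete.
const first = decoder.write(chunk1);
const second = decoder.write(chunk2);

console.log(naive === '€');          // naive conversion corrupts the text
console.log(JSON.stringify(first));  // nothing emitted yet
console.log(JSON.stringify(second)); // the completed character
```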

