

Rot8000 – Rot13 for the Unicode generation - rottytooth
http://rot8000.com

======
lelf
It's broken.

Λ̊1 → ⊻∪ά → Λ̊⋌

𝄞 → 뤔뷾 → 駴點

Edit: anyway, even with a correct (a+b)%n it's a plain bad idea.

Unicode is not the English alphabet. Everything outside the Basic Multilingual
Plane is broken automatically. And even within the BMP there's going to be a bag
of glitches, starting with hanging combining characters and ending with ‘oops,
someone normalised our string and it's now different’ (for the site, not for
the user / Unicode).
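
A minimal sketch of two of the failure modes named above, using only Python's standard library. The helper name `utf16_units` is made up for illustration, and nothing here reflects how the site itself is implemented.

```python
import unicodedata

def utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s as integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# A character outside the BMP is stored as a surrogate pair in UTF-16,
# so code that shifts each 16-bit unit independently will mangle it.
clef = "\U0001D11E"  # 𝄞 MUSICAL SYMBOL G CLEF
print([hex(u) for u in utf16_units(clef)])  # ['0xd834', '0xdd1e']

# Normalization can change the code points behind the same visible text.
decomposed = "A\u030A"  # 'A' followed by COMBINING RING ABOVE
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1
print(hex(ord(composed)))              # 0xc5
```

If a rotated string is normalized somewhere between the two rotations, the second rotation sees different code points from the ones the first produced, so the round trip cannot be relied on.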

~~~
bouk
Pretty sure it's meant as a joke

~~~
rottytooth
It is meant as a joke, but I'm also planning to fix these issues ... apparently
this was the right place to bring it to find all the situations where it
doesn't work correctly :)

~~~
derefr
Rather than rotating through the entire BMP, I would suggest instead using
Unicode's localized collations, and just rotating every character that's part
of a fully-orderable "alphabet" set through that set according to those
orders. (This means, for example, rotating Japanese hiragana, but not kanji.)
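
A rough sketch of that idea for hiragana only. It assumes code point order within the Hiragana block (U+3041 to U+3096) as a stand-in for a real collation order, which is a simplification; the name `rot_hiragana` is hypothetical.

```python
# Assumption: code point order inside the Hiragana block approximates the
# alphabet order. U+3041..U+3096 holds 86 letters, so half the set is 43.
HIRAGANA_FIRST = 0x3041
HIRAGANA_LAST = 0x3096
HIRAGANA_SIZE = HIRAGANA_LAST - HIRAGANA_FIRST + 1  # 86

def rot_hiragana(text: str) -> str:
    """Rotate hiragana by half the set size; leave everything else alone."""
    out = []
    for ch in text:
        cp = ord(ch)
        if HIRAGANA_FIRST <= cp <= HIRAGANA_LAST:
            offset = (cp - HIRAGANA_FIRST + HIRAGANA_SIZE // 2) % HIRAGANA_SIZE
            out.append(chr(HIRAGANA_FIRST + offset))
        else:
            out.append(ch)  # kanji, punctuation, etc. pass through unchanged
    return "".join(out)

sample = "こんにちは。元気ですか？"
once = rot_hiragana(sample)
print(once)
print(rot_hiragana(once) == sample)  # True
```

Because the set has an even size and the shift is exactly half of it, applying the function twice returns the input, just as with rot13.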

------
aculver
Inputting "こんにちは。元気ですか？" caused an application error:

    [ArgumentException: Error serializing value 'ᄳᅳᅋᅁᅏტ㈣䳷ᅇᄹᄫ�' of type 'System.String.']


After realizing it was "？" that was breaking everything, I ended up with this
round trip:

"こんにちは。元気ですか。" → "ᄳᅳᅋᅁᅏტ㈣䳷ᅇᄹᄫტ" → "こんにちは。ጃ⷗ですか。"

It's broken. I suspect Unicode requires more careful manipulation than OP
anticipated. :-)

------
mischanix
Not reciprocal for CJK input, e.g. "한글" takes 5 iterations to reach stability.
I believe this has to do with the UTF-16 encoding of codepoints above 0xFFFF.

~~~
lelf
한글 is in the basic plane. It's U+D55C U+AE00.
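
This is easy to check with a few lines of Python (a quick sketch, standard library only):

```python
word = "\uD55C\uAE00"  # 한글

for ch in word:
    cp = ord(ch)
    # Anything at or below U+FFFF is in the BMP and takes one UTF-16 unit.
    print(f"U+{cp:04X}", "BMP" if cp <= 0xFFFF else "supplementary")

# Two characters, two 16-bit code units: no surrogate pairs involved.
print(len(word.encode("utf-16-be")) // 2)  # 2
```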

~~~
mischanix
I was considering the fact that when it adds 0x8000 or whatever it's doing
it's hitting 0x1.... codepoints and doing weird things with those because of
the encoding. Here's a trace of 한글 through this 'rot8000', though:

    한글: 0xd55c 0xae00
    똼軠: 0xb63c 0x8ee0
    霜激: 0x971c 0x6fc0
    矼傠: 0x77fc 0x50a0
    壜ㆀ: 0x58dc 0x3180
    㦼በ: 0x39bc 0x1260
    ᪜ㆀ: 0x1a9c 0x3180
    㦼በ: (repeating)

So... yeah. Weirdness all around. Might have better luck doing this with some
carefully crafted xor pad for each codepoint, so that it's likely to hit a
printable character but impossible to hit a character in the 0xD800..0xDFFF
range (and similar ranges)... trying to "wrap" in Unicode would require
reinterpreting the codepoints as some continuous numeric representation.
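
One way to read the "continuous numeric representation" idea, as a sketch only: number the BMP code points with the surrogate range removed, rotate by half of that count, and map back. All names below (`to_index`, `from_index`, `rot_bmp`) are hypothetical, and only surrogates are skipped; control characters, unassigned code points and noncharacters are still reachable.

```python
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
SURROGATE_COUNT = SURROGATE_LAST - SURROGATE_FIRST + 1  # 0x800
USABLE = 0x10000 - SURROGATE_COUNT                      # 0xF800 slots
HALF = USABLE // 2                                      # 0x7C00

def to_index(cp: int) -> int:
    """Map a non-surrogate BMP code point to a gap-free index."""
    return cp if cp < SURROGATE_FIRST else cp - SURROGATE_COUNT

def from_index(i: int) -> int:
    """Inverse of to_index."""
    return i if i < SURROGATE_FIRST else i + SURROGATE_COUNT

def rot_bmp(text: str) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF or SURROGATE_FIRST <= cp <= SURROGATE_LAST:
            out.append(ch)  # outside the remapped set: leave untouched
        else:
            out.append(chr(from_index((to_index(cp) + HALF) % USABLE)))
    return "".join(out)

sample = "한글 rot8000"
print(rot_bmp(rot_bmp(sample)) == sample)  # True
```

Since the index space has no holes and the shift is exactly half its size, the rotation is its own inverse and can never land in 0xD800..0xDFFF.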

------
peterwaller
Copy-pasting the contents of rot8000.com/info in and hitting cypher twice ends
up scrambling the contents quite a bit:

    It also bypasses 32 control characters, technically making it rot7968, sometimes with an additional offset.

->

    It also bypasses ⋍2 control characters, technically making it rot⋏⋬68, sometimes with an additional offset.

~~~
rottytooth
hmm, I'm not seeing this result

------
rottytooth
I put in a fix for CJK and the result is: nearly everything that's not CJK now
rotates into it and back out; CJK is a _huge_ section of the Basic
Multilingual Plane. The fix invalidates rotations done with rot8000 before the
fix, unfortunately.

------
njharman
I just realized that 13 was probably chosen for rot13 because that's half the
number of letters in the English alphabet.

I miss "obvious" stuff like that all the time.

------
jloughry
Why not call it Rot8192 or Rot0x7777?

~~~
throwaway0094
rot13 is (X + (26/2)) mod 26 ; this is (X + (2^16)/2) mod 2^16. (The BMP is
the first 2^16 code points of unicode.)

Edit: silly formatting dropped my math punctuation.
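
The two formulas, written out as a small Python sketch. The function names are illustrative, and `rot_half_bmp` works on bare integers, so it ignores the surrogate and control-character issues raised elsewhere in the thread.

```python
def rot13(text: str) -> str:
    """(X + 26/2) mod 26, applied to ASCII letters only."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

def rot_half_bmp(cp: int) -> int:
    """(X + 2^16/2) mod 2^16 on a single code point value."""
    return (cp + 2**16 // 2) % 2**16

print(rot13("Hello"))             # Uryyb
print(hex(rot_half_bmp(0x0041)))  # 0x8041
print(hex(rot_half_bmp(0x8041)))  # 0x41
```

In both cases the shift is exactly half the modulus, which is what makes applying it twice a no-op.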

~~~
CUViper
I suspect they also really wanted the rot8 -> "rotate" joke.

AFAICS, it's actually using decimal 8000, not 2^16/2 = 0x8000, so I don't
really understand how this is reversible at all unless they're just
subtracting it back.

What we really need is rot88000h for the full U+0..U+10FFFF range. :)
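
For what it's worth, the 한글 trace posted earlier in this thread can be checked numerically. The sketch below only looks at those posted values and says nothing about the site's actual code; the helper `shift` is hypothetical.

```python
# Code point pairs copied from the 한글 trace posted earlier in the thread.
trace = [
    (0xD55C, 0xAE00),
    (0xB63C, 0x8EE0),
    (0x971C, 0x6FC0),
    (0x77FC, 0x50A0),
    (0x58DC, 0x3180),
    (0x39BC, 0x1260),
]

# Step size between consecutive rows, for each of the two characters.
steps = {a[i] - b[i] for a, b in zip(trace, trace[1:]) for i in range(2)}
print(steps)               # {7968}
print(8000 - 32 in steps)  # True

# A plain additive shift is its own inverse only when it is half the modulus.
def shift(cp: int, k: int, n: int = 0x10000) -> int:
    return (cp + k) % n

print(shift(shift(0x41, 0x8000), 0x8000) == 0x41)  # True
print(shift(shift(0x41, 8000), 8000) == 0x41)      # False

# Half of the full U+0000..U+10FFFF range, i.e. the "rot88000h" figure.
print(hex(0x110000 // 2))  # 0x88000
```

Every step in that trace is 7968, which is decimal 8000 minus the 32 bypassed control characters, matching the "rot7968" wording quoted from the info page.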

~~~
rottytooth
It's using 0x8000, which is half of 0x10000 (the size of the basic
multilingual plane). It doesn't extend out of the BMP.

