
Rot8000 - max_sendfeld
http://rot8000.com/Index
======
danbruc
In case the author sees this, some comments about Rotator.cs.

1\. This algorithm will break if the number of valid characters in the BMP
becomes odd.

EDIT: As user platforms pointed out, there is a unit test for this.

2\. There is an overflow in line 39 because of the check i <= BMP_SIZE in line
37.

3\. The web server at rot8000.com exposes at least some errors with stack
traces, try rotating the string <script>.

4\. In line 42 you are performing a linear search for every character you
transform, which is very inefficient, especially for characters at the end of
the BMP. At least use a hash map or, even better, just use an array mapping the
input code point directly to the output code point.

5\. rot8000.com does at the very least allow rather long inputs which paired
with the inefficiency of the linear search makes a DoS attack pretty easy. I
tried a 10,000 word lorem ipsum, it was not rejected and the request took a
minute to complete.

~~~
rottytooth
Thanks -- added an issue for the linear search
[https://github.com/rottytooth/rot8000/issues/2](https://github.com/rottytooth/rot8000/issues/2)
\-- will place a limit on chars in that textbox as well

~~~
danbruc
For reference, I created an optimized implementation and tested it with a
string containing all characters from U+0000 to U+FFFF in order and got the
following times. The original implementation took 5202.766 ms, the optimized
implementation took 0.079 ms, a speed-up by a factor of about 65858. That this
is pretty close to 65536 is probably a reflection of the cost of the linear
search through almost that number of characters and the test pattern I chose,
but I am not entirely sure; intuitively I would have expected a factor of 0.5
in there to account for the average case. But I am too lazy right now to do
the math.
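
A rough back-of-the-envelope version of that math, assuming the original lookup scans a list of about N = 65536 characters from the start and the test string hits every position exactly once. Both are simplifications: the real list is somewhat shorter because some characters are excluded.

```latex
\[
\text{comparisons}_{\text{linear}} \approx \sum_{k=1}^{N} k = \frac{N(N+1)}{2},
\qquad
\text{lookups}_{\text{array}} = N,
\qquad
\frac{\text{comparisons}_{\text{linear}}}{\text{lookups}_{\text{array}}}
  \approx \frac{N+1}{2} \approx 32768 .
\]
```

So counting operations alone does produce the expected factor of 0.5. A measured ratio of about 65858 would then suggest that one step of the scan costs roughly twice as much as one array lookup in this test, which is plausible but not something these two timings can confirm.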

~~~
rottytooth
I've updated it to use a hashtable and the tests run quite a lot faster

~~~
danbruc
I took the array approach, which should be faster still because it avoids the
hash calculations. Just build an array Char[65536] containing at every index i
the character that character i should be mapped to. Rotator.Rotate() then
simply becomes the following, where Rotator.map is the precomputed array.
Probably very similar to an implementation using a hash table. I also got rid
of the string builder but did not profile the difference. If one uses a string
builder it would most likely help to specify the capacity in the constructor
call so that the internal array does not have to be resized repeatedly as the
result is constructed and grows in length.

    
    
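      // Rotator.map is the precomputed Char[65536] lookup table described above.
      public static String Rotate(String input)
      {
        // One output character per input character, so the size is known up front.
        var result = new Char[input.Length];
    
        for (var index = 0; index < input.Length; index++)
        {
          // A Char converts implicitly to an int index, so no search is needed.
          result[index] = Rotator.map[input[index]];
        }
    
        return new String(result);
      }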
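
For illustration, here is a minimal Python sketch of how such a precomputed table could be built. It is not the project's actual code: the filter in `is_valid` (no control characters, no surrogates, no whitespace) and the names `build_map`, `ROT_MAP` and `rotate` are assumptions made for this example, and the exact set of excluded characters depends on the Unicode tables of the runtime.

```python
import unicodedata

BMP_SIZE = 0x10000  # code points U+0000..U+FFFF


def is_valid(cp):
    """Assumed filter: skip control characters, surrogates and whitespace."""
    ch = chr(cp)
    return unicodedata.category(ch) not in ("Cc", "Cs") and not ch.isspace()


def build_map():
    """Return a list where table[cp] is the code point that cp rotates to."""
    valid = [cp for cp in range(BMP_SIZE) if is_valid(cp)]
    if len(valid) % 2:
        # Assumption: with an odd count, leave the last valid character
        # unrotated so that rotating twice is still the identity.
        valid.pop()
    half = len(valid) // 2
    table = list(range(BMP_SIZE))  # start from the identity mapping
    for i, cp in enumerate(valid):
        table[cp] = valid[(i + half) % len(valid)]
    return table


ROT_MAP = build_map()  # built once, reused for every call


def rotate(text):
    """Rotate BMP characters via the table; leave everything else untouched."""
    return "".join(
        chr(ROT_MAP[ord(ch)]) if ord(ch) < BMP_SIZE else ch for ch in text
    )
```

Each character then costs one index operation, regardless of where it sits in the BMP, and `rotate(rotate(s)) == s` holds because the rotated set always has an even size.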

------
jstanley
Interesting! I made a very similar tool earlier this year.

It comes with presets for various areas of Unicode and some example
text, although the intended use case was very different: I looked at it from a
steganography perspective rather than an honours-system obfuscation
perspective.

[https://incoherency.co.uk/mojibake/](https://incoherency.co.uk/mojibake/)

I initially thought it would be able to decode the rot8000 output without any
modification but I think the utf-8 escaping that my tool expects (from its own
output) gets confused by the output from rot8000.

~~~
brlewis
It may also be that you're rotating by 0x8000 and this code is not. It's
creating a mapping that's restricted to non-control, non-surrogate,
non-whitespace characters and rotating by half the size of that mapping.

[https://github.com/rottytooth/rot8000/blob/master/Rottytooth...](https://github.com/rottytooth/rot8000/blob/master/Rottytooth.Rot8000/Rotator.cs)

~~~
danbruc
This will break, i.e. two consecutive rotations will no longer be the
identity, if the number of valid characters in the BMP ever becomes odd. And
there are still a few unallocated code points in the BMP. There is also an
overflow in line 39 because of the check i <= BMP_SIZE in line 37 which, I
guess, previously used Char.MaxValue instead of BMP_SIZE. But it does no harm
here, U+0000 just gets filtered out twice.
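
A toy illustration of the odd-count problem, using a made-up six-letter and five-letter alphabet instead of the BMP. The function name `rotate_half` is hypothetical and only serves this sketch.

```python
def rotate_half(text, alphabet):
    """Shift every character by half the alphabet size (integer division)."""
    half = len(alphabet) // 2
    index = {ch: i for i, ch in enumerate(alphabet)}
    return "".join(
        alphabet[(index[ch] + half) % len(alphabet)] if ch in index else ch
        for ch in text
    )


even = "abcdef"  # 6 characters: shifting by 3 twice is a full turn
odd = "abcde"    # 5 characters: shifting by 2 twice is only 4 steps

print(rotate_half(rotate_half("bad", even), even))  # prints "bad"
print(rotate_half(rotate_half("bad", odd), odd))    # prints "aec"
```

With an odd number of characters the two half-shifts add up to one step less than a full turn, so the second rotation does not undo the first.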

~~~
platforms
There's a test for BMP characters being even:
[https://github.com/rottytooth/rot8000/blob/master/Rottytooth...](https://github.com/rottytooth/rot8000/blob/master/Rottytooth.Rot8000.Tests/MappingsTests.cs)

More critically, if the # of valid chars changes, previously rot-8000'd text
will no longer be reversible through the tool

------
ninjin
This certainly is what I would call a “neat hack”. Out of curiosity I had to
check what it rotates Japanese into. Turns out, mostly Korean: “日本語はどうかな？”
becomes “ື걅갿개갡걀等”.

------
egypturnash
It meticulously refrains from rotating emoji. Somehow this feels like failure.

~~~
SomeCallMeTim
ROT-8000 is only touching the first 65536 Unicode characters (UCS-2). Unicode
has >1M code points. [0]

Most emojis seem to be above the first 16 bits. [1] But there are a number of
emojis in the first 16 bits, like the "frowning face" emoji at U+2639 -- it
rotates just fine -- plus others in the first 16 bits.
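
A small Python sketch of the distinction, assuming nothing beyond the standard library. The helper names `in_bmp` and `utf16_units` are made up for this example.

```python
def in_bmp(ch):
    """True if the character fits in the first 16 bits (U+0000..U+FFFF)."""
    return ord(ch) <= 0xFFFF


def utf16_units(ch):
    """Return the UTF-16 code units needed to encode one character."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


frowning = "\u2639"      # the "frowning face" mentioned above
grinning = "\U0001F600"  # a typical emoji outside the first 16 bits

print(in_bmp(frowning), [hex(u) for u in utf16_units(frowning)])
# True ['0x2639']
print(in_bmp(grinning), [hex(u) for u in utf16_units(grinning)])
# False ['0xd83d', '0xde00']
```

A character above U+FFFF is stored in UTF-16 as a pair of surrogate code units, so a rotation that leaves surrogates alone will also leave those emoji alone.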

(TIL you can't paste emojis into HN comment threads. Probably all for the
best.)

[0]
[https://en.wikipedia.org/wiki/Unicode](https://en.wikipedia.org/wiki/Unicode)

[1] [https://unicode.org/emoji/charts/emoji-list.html](https://unicode.org/emoji/charts/emoji-list.html)

~~~
have_faith
> TIL you can't paste emojis into HN comment threads. Probably all for the
> best.

I'm gonna make a HN where you can only speak in Emoji!

Sorta unrelated, does anyone remember the social network where you could only
write in Emoji? [http://emoj.li/](http://emoj.li/)

~~~
tyingq
Are the rules for what it does allow written down somewhere? I know country
flags work: 🇩🇪

~~~
codetrotter
If your question isn’t answered, you could post all of them in a comment and
see which ones remain unfiltered.

[https://unicode.org/emoji/charts/full-emoji-list.html](https://unicode.org/emoji/charts/full-emoji-list.html)

[https://unicode.org/emoji/charts/full-emoji-modifiers.html](https://unicode.org/emoji/charts/full-emoji-modifiers.html)

You might need to split the text over multiple comments. Don’t remember
whether or not there is a limit to the length a comment can have. Probably
there is.

~~~
tyingq
I may try that. It seems a little arbitrary.

I can post, for example: ↙️ ↩️ ⌚ ⌛ ⌨ ⏏ ⏩ ⏰ ️⏱ ⏲ ⏳ ️◾ 󠁧󠁢󠁷󠁬󠁳󠁿

------
theophrastus
I was curious as to how one might implement this with a familiar language, and
fetched up on this interesting python github script, specifically
"rot32768"[0]

[0]
[https://gist.github.com/terrorbyte/7967039](https://gist.github.com/terrorbyte/7967039)

------
ConcernedCoder
FYI: Here's a static JavaScript version I whipped up (as a lunch-time
challenge) that will reversibly rotate everything except whitespace...

[https://github.com/jeffallen6767/rot0x8000](https://github.com/jeffallen6767/rot0x8000)

~~~
rottytooth
Don't see how this will work without checking for control characters,
surrogates and chars above 0x10000 (try 𝄞 for instance)

------
platforms
籝籱籮 籺籾籲籬籴 籫类籸粀籷 籯籸粁 米籾籶籹籼 籸籿籮类 簹粁籁簹簹簹 籭籸籰籼簷
[http://rot8000.com/Index?%E7%B1%9D%E7%B1%B1%E7%B1%AE%20%E7%B...](http://rot8000.com/Index?%E7%B1%9D%E7%B1%B1%E7%B1%AE%20%E7%B1%BA%E7%B1%BE%E7%B1%B2%E7%B1%AC%E7%B1%B4%20%E7%B1%AB%E7%B1%BB%E7%B1%B8%E7%B2%80%E7%B1%B7%20%E7%B1%AF%E7%B1%B8%E7%B2%81%20%E7%B1%B3%E7%B1%BE%E7%B1%B6%E7%B1%B9%E7%B1%BC%20%E7%B1%B8%E7%B1%BF%E7%B1%AE%E7%B1%BB%20%E7%B0%B9%E7%B2%81%E7%B1%81%E7%B0%B9%E7%B0%B9%E7%B0%B9%20%E7%B1%AD%E7%B1%B8%E7%B1%B0%E7%B1%BC%E7%B0%B7)

------
tsaoyu
Reminds me of 锟斤拷, which comes from misinterpreting the Unicode replacement
character. When consecutive U+FFFD placeholders, encoded as UTF-8, are decoded
as GBK, they are displayed as these characters. Some of these glitches can
still be found online, e.g.,
[https://docs.oracle.com/cd/E19199-01/817-4244-10/preface.htm...](https://docs.oracle.com/cd/E19199-01/817-4244-10/preface.html)
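
The effect can be reproduced in a few lines of Python, as a sketch using only the built-in codecs: two replacement characters are encoded as UTF-8 and the resulting six bytes are then read back as GBK.

```python
replacement = "\ufffd" * 2          # two U+FFFD replacement characters
raw = replacement.encode("utf-8")   # b'\xef\xbf\xbd\xef\xbf\xbd'
garbled = raw.decode("gbk")         # GBK reads the six bytes as three pairs

print(garbled)  # 锟斤拷
```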

------
omarforgotpwd
If you are just starting to get interested in cryptography, try to write a
program that can break this cipher or similar ones. Hint: use frequency
analysis on sample ciphertext and compare the result to known English letter
frequencies to match ciphertext letters to plaintext. Then you can determine
the offset and decrypt.
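
As a starting point, here is a minimal sketch of that idea for a plain Caesar shift over a-z with an unknown offset. The frequency table holds approximate, commonly quoted percentages for English, and the names `shift_text`, `chi_squared` and `crack` are made up for this example.

```python
import string
from collections import Counter

# Approximate English letter frequencies in percent (rounded, for illustration).
ENGLISH_FREQ = {
    "a": 8.17, "b": 1.49, "c": 2.78, "d": 4.25, "e": 12.70, "f": 2.23,
    "g": 2.02, "h": 6.09, "i": 6.97, "j": 0.15, "k": 0.77, "l": 4.03,
    "m": 2.41, "n": 6.75, "o": 7.51, "p": 1.93, "q": 0.10, "r": 5.99,
    "s": 6.33, "t": 9.06, "u": 2.76, "v": 0.98, "w": 2.36, "x": 0.15,
    "y": 1.97, "z": 0.07,
}


def shift_text(text, shift):
    """Shift ASCII letters by `shift` places, leaving other characters alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def chi_squared(text):
    """Lower score means the letter distribution looks more like English."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    if not letters:
        return float("inf")
    counts = Counter(letters)
    total = len(letters)
    score = 0.0
    for letter, percent in ENGLISH_FREQ.items():
        expected = total * percent / 100
        score += (counts[letter] - expected) ** 2 / expected
    return score


def crack(ciphertext):
    """Try all 26 offsets and return the one whose decryption scores best."""
    return min(range(26), key=lambda s: chi_squared(shift_text(ciphertext, -s)))


plain = ("It was the best of times, it was the worst of times, it was the age "
         "of wisdom, it was the age of foolishness, it was the epoch of belief")
cipher = shift_text(plain, 7)
offset = crack(cipher)
print(offset, shift_text(cipher, -offset))
```

This needs a reasonably long sample to work reliably. Very short ciphertexts may not have enough letters for the statistics to point at the right offset.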

------
collyw
Can someone explain what this is doing please?

~~~
Crespyl
See: [http://rot8000.com/info](http://rot8000.com/info)

It's essentially a Unicode version of the old "Rot 13" cypher.

In Rot 13, you translate each letter 13 places down (as if on a code wheel),
such that 'A' becomes 'N', 'B' becomes 'O', wrapping such that 'Z' becomes
'M', and so on.

This version, instead of using the simple 'A=1...Z=26' number space, uses the
Unicode range and rotates by 32,768 (0x8000).

~~~
scrooched_moose
One key aspect you skipped over is it's self-reversible. 'A' becomes 'N', and
applying it again 'N' becomes 'A'.

"rot13 is reversible" -> "ebg13 vf erirefvoyr" -> "rot13 is reversible".

"rot8000 is also reversible" -> "类籸籽籁簹簹簹 籲籼 籪籵籼籸 类籮籿籮类籼籲籫籵籮" -> "rot8000 is
also reversible"

Rot13 is English-alphabet only so it skips numbers, while rot8000 doesn't have
this limitation because it uses the larger unicode set.
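
For comparison, classic rot13 fits in a few lines of Python. This is a plain sketch of the textbook scheme, not code taken from the site.

```python
def rot13(text):
    """Shift a-z and A-Z by 13 places; digits and punctuation are untouched."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


once = rot13("rot13 is reversible")
print(once)         # ebg13 vf erirefvoyr
print(rot13(once))  # rot13 is reversible
```

Because 13 is exactly half of 26, applying the same function twice lands every letter back where it started.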

------
TazeTSchnitzel
Reminds me of the infamous 畂桳栠摩琠敨映捡獴.

~~~
Arkanosis
For anyone not getting it:
[https://en.wikipedia.org/wiki/Bush_hid_the_facts](https://en.wikipedia.org/wiki/Bush_hid_the_facts)
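
The characters above can be reproduced with a short Python sketch that reads the 18 ASCII bytes of the sentence as if they were little-endian UTF-16, which is the misdetection the linked article describes.

```python
text = "Bush hid the facts"
raw = text.encode("ascii")         # 18 bytes, one per character
misread = raw.decode("utf-16-le")  # every byte pair becomes one code unit

print(misread)  # 畂桳栠摩琠敨映捡獴
```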

------
supakeen
Fun, but outputs unprintable or non-used characters and only functions on the
BMP?

------
loa_in_
Reminds me of
[http://base91.sourceforge.net/](http://base91.sourceforge.net/).

We could go further, straight to Base8000!

~~~
ar-nelson
Already exists:
[https://github.com/qntm/base65536](https://github.com/qntm/base65536)

It's actually pretty useful for compressing data in Unicode-aware
environments, like Twitter. Which makes me wonder if Unicode support is
universal enough now that an encoding like this could replace MIME/base64 in
email.

------
dullroar
Should also change spaces to zero-width spaces, which would then make it less
obvious where the word breaks are.

------
dana321
籖粂 籶籸籽籱籮类 籪籽籮 粂籸籾类 籬籱籲粀籸粀籸粀籸粀粀

------
tuttle7
Is no one concerned by the fact that this is sending your text using POST
requests? The author could have used DOM/JS instead.

~~~
Crespyl
No, no one is concerned by this. Not every toy website needs to have JS.

~~~
Sohcahtoa82
I think the point tuttle7 was trying to make was that this site could be
implemented client-side quite easily. There's no real reason to make the
translation server-side and require more server CPU resources and bandwidth.

I feel the same way about
[https://www.base64decode.org/](https://www.base64decode.org/) . By default,
everything gets translated server-side. I wonder how many people use this site
on a regular basis for translating secrets. I'd bet my life that the number is
greater than zero.

