Rot8000 (rot8000.com)
158 points by max_sendfeld on Nov 20, 2018 | 54 comments

In case the author sees this, some comments about Rotator.cs.

1. This algorithm will break if the number of valid characters in the BMP becomes odd.

EDIT: As user platforms pointed out, there is a unit test for this.

2. There is an overflow in line 39 because of the check i <= BMP_SIZE in line 37.

3. The web server at rot8000.com exposes at least some errors with stack traces; try rotating the string <script>.

4. In line 42 you are performing a linear search for every character you transform, which is very inefficient, especially for characters at the end of the BMP. At least use a hash map or, even better, just use an array that maps the input code point directly to the output code point.

5. rot8000.com accepts rather long inputs, which, paired with the inefficiency of the linear search, makes a DoS attack pretty easy. I tried a 10,000 word lorem ipsum; it was not rejected and the request took a minute to complete.

Thanks -- added an issue for the linear search https://github.com/rottytooth/rot8000/issues/2 -- will place a limit on chars in that textbox as well

For reference, I created an optimized implementation and tested it with a string containing all characters from U+0000 to U+FFFF in order. The original implementation took 5202.766 ms and the optimized implementation took 0.079 ms, a speed-up factor of about 65858. That this is pretty close to 65536 is probably a reflection of the cost of the linear search through almost that number of characters and of the test pattern I chose, but I am not entirely sure. Intuitively I would have expected a factor of 0.5 in there to account for the average case, but I am too lazy right now to do the math.
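A rough sanity check of the factor-of-0.5 intuition, as a Python sketch. The table size `N` and the helper name are made up for illustration, and this does not model the actual Rotator.cs code or explain the measured timings. If every entry of an N-entry table is looked up once with a front-to-back search, the scan visits N(N+1)/2 entries in total, which is about N/2 per lookup:

```python
def comparisons_for_linear_search(table, items):
    """Count the entries a naive front-to-back search has to visit."""
    total = 0
    for item in items:
        # list.index() scans from the front, so finding the entry at
        # position p costs p + 1 visits.
        total += table.index(item) + 1
    return total


N = 1000  # small stand-in for the 65536 code units of the BMP
table = list(range(N))
total = comparisons_for_linear_search(table, table)
print(total, total / N)  # 500500 500.5
```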

I've updated it to use a hashtable and the tests run quite a lot faster

I took the array approach, which should be faster still because it avoids the hash calculations. Just build an array Char[65536] containing at every index i the character that character i should be mapped to. Rotator.Rotate() then simply becomes the following, where Rotator.map is the precomputed array. Probably very similar to an implementation using a hash table. I also got rid of the string builder but did not profile the difference. If one uses a string builder, it would most likely help to specify the capacity in the constructor call so that the internal array does not have to be resized repeatedly as the result grows.

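  public static String Rotate(String input)
  {
    // Rotator.map is the precomputed Char[65536] lookup table.
    var result = new Char[input.Length];

    // One array lookup per input character, no searching.
    for (var index = 0; index < input.Length; index++)
      result[index] = Rotator.map[input[index]];

    return new String(result);
  }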
Limiting the text box will protect you against the most naive DoS attacks, but you need some kind of limit at the API level (request size, etc.). Never trust the client.

not a js guy - any reason this needs any lookup table at all?

quick googling seems to suggest simple bitshifting could be possible..

Because the basic multilingual plane that it operates on isn't actually full.
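To make that concrete, a small Python sketch (the helper names are made up for illustration): flipping the top bit of a 16-bit code unit is its own inverse, but it sends some perfectly ordinary characters straight into the surrogate range, which is why a table of valid characters is used instead.

```python
def naive_rot(ch):
    # Flip the top bit of a 16-bit code unit; this is the same as adding
    # 0x8000 modulo 0x10000, so applying it twice gives the input back.
    return chr(ord(ch) ^ 0x8000)


def is_surrogate(ch):
    return 0xD800 <= ord(ch) <= 0xDFFF


print(hex(ord(naive_rot("A"))))           # 0x8041, an ordinary CJK ideograph
print(is_surrogate(naive_rot("\u5800")))  # True: U+5800 lands on U+D800
```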

Interesting! I made a very similar tool earlier this year.

It comes with presets for various different areas of Unicode, and some example text. The intended use case was very different, though: I looked at it from a steganography perspective rather than an honours-system obfuscation perspective.


I initially thought it would be able to decode the rot8000 output without any modification but I think the utf-8 escaping that my tool expects (from its own output) gets confused by the output from rot8000.

It may also be that you're rotating by 0x8000 and this code is not. It's creating a mapping that's restricted to non-control, non-surrogate, non-whitespace characters and rotating by half the size of that mapping.


This will break, i.e. two consecutive rotations will no longer be the identity, if the number of valid characters in the BMP ever becomes odd. And there are still a few unallocated code points in the BMP. There is also an overflow in line 39 because of the check i <= BMP_SIZE in line 37 which, I guess, previously used Char.MaxValue instead of BMP_SIZE. But it does no harm here, U+0000 just gets filtered out twice.
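A toy illustration of the odd-count problem in Python, using a made-up five- and six-letter alphabet rather than the real BMP table: rotating by half the alphabet size is only self-inverse when the size is even.

```python
def rotate_half(alphabet, ch):
    # Shift by half the alphabet size (rounded down), wrapping around.
    half = len(alphabet) // 2
    return alphabet[(alphabet.index(ch) + half) % len(alphabet)]


even = "abcdef"
odd = "abcde"
print(rotate_half(even, rotate_half(even, "a")))  # a
print(rotate_half(odd, rotate_half(odd, "a")))    # e
```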

There's a test for BMP characters being even: https://github.com/rottytooth/rot8000/blob/master/Rottytooth...

More critically, if the # of valid chars changes, previously rot-8000'd text will no longer be reversible through the tool

I also had a similar idea a few years ago for a CTF challenge, coming at it from a "modern Caesar cipher" perspective: https://laurencetennant.com/unicode-shift-cipher

Also a crude "modern Bacon cipher" using Punycode characters as the B's and ASCII-range characters as the A's.


This certainly is what I would call a “neat hack”. Out of curiosity I had to check what it rotates Japanese into. Turns out, mostly Korean: “日本語はどうかな?” becomes “ື걅갿개갡걀等”.

It meticulously refrains from rotating emoji. Somehow this feels like failure.

ROT-8000 is only touching the first 65536 Unicode characters (UCS-2). Unicode has >1M code points. [0]

Most emojis seem to be above the first 16 bits. [1] But there are a number of emojis in the first 16 bits, like the "frowning face" emoji at U+2639 -- it rotates just fine -- plus others in the first 16 bits.
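A quick way to check which side of the 16-bit boundary a character falls on, sketched in Python with escapes instead of literal emoji (the helper name `in_bmp` is made up):

```python
def in_bmp(ch):
    # The Basic Multilingual Plane covers U+0000 through U+FFFF.
    return ord(ch) <= 0xFFFF


print(in_bmp("\u2639"))      # True: WHITE FROWNING FACE, U+2639
print(in_bmp("\U0001F600"))  # False: GRINNING FACE, U+1F600
```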

(TIL you can't paste emojis into HN comment threads. Probably all for the best.)

[0] https://en.wikipedia.org/wiki/Unicode

[1] https://unicode.org/emoji/charts/emoji-list.html

> TIL you can't paste emojis into HN comment threads. Probably all for the best.

It got in the way of me explaining the Rebus principle used in Egyptian hieroglyphs a little while ago.

Then again, the topic was phalluses in Unicode (which do display!), so maybe you're right.

> TIL you can't paste emojis into HN comment threads. Probably all for the best.

I'm gonna make a HN where you can only speak in Emoji!

Sorta unrelated, does anyone remember the social network where you could only write in Emoji? http://emoj.li/

Are the rules for what it does allow written down somewhere? I know country flags work: 🇩🇪

If your question isn’t answered, you could post all of them in a comment and see which ones remain unfiltered.



You might need to split the text over multiple comments. Don’t remember whether or not there is a limit to the length a comment can have. Probably there is.

I may try that. It seems a little arbitrary.

I can post, for example: ↙️ ↩️ ⌚ ⌛ ⌨ ⏏ ⏩ ⏰ ️⏱ ⏲ ⏳ ️◾ 󠁧󠁢󠁷󠁬󠁳󠁿

There is a wonderful talk by the founders of emoj.li about how it was all a joke which got out of hand. https://www.youtube.com/watch?v=GsyhGHUEt-k

It seems you can type them in natively, though

edit: scratch that, they get stripped out.

I was curious as to how one might implement this in a familiar language, and came across this interesting Python script on GitHub, specifically "rot32768"[0]

[0] https://gist.github.com/terrorbyte/7967039

FYI: Here's a static JavaScript version I whipped up (as a lunch-time challenge) that will reversibly rotate everything except whitespace...


Don't see how this will work without checking for control characters, surrogates and chars above 0x10000 (try 𝄞 for instance)
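For context on why that character is a problem: in UTF-16, which is what JavaScript strings use, anything above U+FFFF is stored as two surrogate code units, so per-code-unit rotation would act on each half separately. A Python sketch showing the pair for U+1D11E (MUSICAL SYMBOL G CLEF):

```python
clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF

# Encode as big-endian UTF-16 and split into 16-bit code units.
units = clef.encode("utf-16-be")
pairs = [units[i:i + 2].hex() for i in range(0, len(units), 2)]
print(pairs)  # ['d834', 'dd1e']
```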

籝籱籮 籺籾籲籬籴 籫类籸粀籷 籯籸粁 米籾籶籹籼 籸籿籮类 簹粁籁簹簹簹 籭籸籰籼簷 http://rot8000.com/Index?%E7%B1%9D%E7%B1%B1%E7%B1%AE%20%E7%B...

Reminds me of 锟斤拷, which comes from misinterpreting the Unicode replacement character. When the placeholder U+FFFD is encoded as UTF-8 and then decoded as GBK, it is displayed as these characters. Some of these glitches can still be found online, e.g., https://docs.oracle.com/cd/E19199-01/817-4244-10/preface.htm...
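The effect is easy to reproduce, assuming the usual route of two replacement characters encoded as UTF-8 and then read back as GBK. A short Python sketch:

```python
# U+FFFD is EF BF BD in UTF-8; two of them give six bytes, which GBK
# reads as three two-byte characters.
garbled = ("\ufffd" * 2).encode("utf-8").decode("gbk")
print(garbled)  # 锟斤拷
```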

If you are just starting to get interested in cryptography, try to write a program that can break ciphers like this one or similar ones. Hint: use frequency analysis on sample ciphertext and compare the result to known English letter frequencies to match ciphertext letters to plaintext. Then you can determine the offset and decrypt.
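A minimal Python sketch of that hint for a plain Caesar shift over a-z, not for rot8000 itself. The frequency table holds approximate percentages, and the function names are made up for illustration. It tries all 26 offsets and keeps the one whose decryption looks most like English:

```python
import string

# Approximate relative frequencies of letters in English text (percent).
ENGLISH_FREQ = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2, "g": 2.0,
    "h": 6.1, "i": 7.0, "j": 0.15, "k": 0.77, "l": 4.0, "m": 2.4, "n": 6.7,
    "o": 7.5, "p": 1.9, "q": 0.095, "r": 6.0, "s": 6.3, "t": 9.1, "u": 2.8,
    "v": 0.98, "w": 2.4, "x": 0.15, "y": 2.0, "z": 0.074,
}


def shift(text, k):
    # Caesar-shift lowercase letters by k places; leave everything else alone.
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + k) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def score(text):
    # Higher when the text is dominated by letters common in English.
    return sum(ENGLISH_FREQ.get(ch, 0) for ch in text)


def crack(ciphertext):
    # Try every offset and keep the one with the most English-like result.
    best = max(range(26), key=lambda k: score(shift(ciphertext, -k)))
    return best, shift(ciphertext, -best)


plain = ("it was the best of times it was the worst of times "
         "it was the age of wisdom it was the age of foolishness")
offset, recovered = crack(shift(plain, 7))
print(offset)  # 7
```

Short or unusual texts can fool a score this simple, so treat the result as a best guess.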

Can someone explain what this is doing please?

See: http://rot8000.com/info

It's essentially a Unicode version of the old "Rot 13" cypher.

In Rot 13, you translate each letter 13 places down (as if on a code wheel), such that 'A' becomes 'N', 'B' becomes 'O', wrapping such that 'Z' becomes 'M', and so on.

This version, instead of using the simple 'A=1...Z=26' number space, uses the Unicode range and rotates by 32,768 (0x8000).
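The Rot 13 half of that description fits in a few lines of Python. This is a sketch of plain Rot 13 only, with digits and punctuation left alone:

```python
def rot13(text):
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            out.append(ch)  # digits, spaces and punctuation are untouched
    return "".join(out)


print(rot13("rot13 is reversible"))         # ebg13 vf erirefvoyr
print(rot13(rot13("rot13 is reversible")))  # rot13 is reversible
```

Because 13 is exactly half of 26, applying the function twice returns the original text.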

One key aspect you skipped over is it's self-reversible. 'A' becomes 'N', and applying it again 'N' becomes 'A'.

"rot13 is reversible" -> "ebg13 vf erirefvoyr" -> "rot13 is reversible".

"rot8000 is also reversible" -> "类籸籽籁簹簹簹 籲籼 籪籵籼籸 类籮籿籮类籼籲籫籵籮" -> "rot8000 is also reversible"

Rot13 is English-alphabet only, so it skips numbers, while rot8000 doesn't have this limitation because it uses the larger Unicode set.

The only link on the page links to the explanation.

I missed that - meow_info doesn't really convey that its an explanation have the same noticeaion.

Reminds me of the infamous 畂桳栠摩琠敨映捡獴.
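For anyone who hasn't seen it, this is presumably the old "Bush hid the facts" glitch, where ASCII bytes get read as little-endian UTF-16. It can be reproduced in Python:

```python
text = "Bush hid the facts"

# 18 ASCII bytes reinterpreted as nine 16-bit little-endian code units.
misread = text.encode("ascii").decode("utf-16-le")
print(misread)  # 畂桳栠摩琠敨映捡獴
```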

Fun, but outputs unprintable or non-used characters and only functions on the BMP?

Reminds me of http://base91.sourceforge.net/.

We could go further, straight to Base8000!

Already exists: https://github.com/qntm/base65536

It's actually pretty useful for compressing data in Unicode-aware environments, like Twitter. Which makes me wonder if Unicode support is universal enough now that an encoding like this could replace MIME/base64 in email.

Okay, I have seen this 10 times or so when I tried to compare various binary-to-text encodings, and basE91 is the only one without a format description. Probably it's time to look directly at the source code. Amazingly, this one turns out to be the only binary-to-text encoding I have ever seen that groups the input into a varying number of bits. More specifically:

* The input bits are packed in the reverse order (e.g. 1A 2B 3C is packed as 0x3C2B1A) unlike most other binary-to-text encodings. The last bits are padded with preceding zeroes.

* A pair of basE91 characters encodes a number 0 through 8280. The first character is least significant: `AB` encodes 91 and not 1.

* 91^2 = 8281 > 2^13 = 8192, so groups of 13 bits are read and encoded as two basE91 characters from the least significant to the most significant. But it's not always the case. Occasionally a group of lowermost 14 bits will be read if the bits are less than 91^2. As a result, the first 8281 - 8192 = 89 values (0..88) and the last 89 values (8192..8280) actually encode 14 bits, and that includes all-zero bits. Its average overhead is therefore 22.93% (16 / lg 8281 - 1) and can drop to 14.29% (16 / 14 - 1) when all bits are zero.
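The arithmetic in those bullets can be checked with a few lines of Python. The variable names are made up, and the 16 is the number of output bits in one pair of 8-bit characters:

```python
import math

pair_values = 91 ** 2                # values one character pair can encode
thirteen_bits = 2 ** 13              # values a 13-bit group can take
spare = pair_values - thirteen_bits  # headroom used for the 14-bit case

typical = 16 / math.log2(pair_values) - 1  # 16 output bits per pair
best = 16 / 14 - 1                         # every pair carries 14 bits

print(pair_values, thirteen_bits, spare)  # 8281 8192 89
print(f"{typical:.2%} {best:.2%}")        # 22.93% 14.29%
```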

It reminds me of Ascii85 [1] which had a shorthand for all-zero groups and all-space groups, but this one is more general. Speaking of generality, probably a binary-to-text encoding with arithmetic coding is now viable?

[1] https://en.wikipedia.org/wiki/Ascii85#btoa_version

Should also change spaces to zero-width spaces, which would then make it less obvious where the word breaks are.

籖粂 籶籸籽籱籮类 籪籽籮 粂籸籾类 籬籱籲粀籸粀籸粀籸粀粀

No one is concerned by the fact that this is sending your text using POST requests. The guy could not use DOM/JS.

No, no one is concerned by this. Not every toy website needs to have JS.

I think the point tuttle7 was trying to make was that this site could be implemented client-side quite easily. There's no real reason to do the translation server-side and require more server CPU resources and bandwidth.

I feel the same way about https://www.base64decode.org/ . By default, everything gets translated server-side. I wonder how many people use this site on a regular basis for translating secrets. I'd bet my life that the number is greater than zero.

Nah bro, it needs Webpack and a mishmash of Angular and Vue with a "sprinkling" of React along with an Elixir backend so it's fault-tolerant. Else, how is this toy site supposed to scale at all?

I'd be more concerned if you used this for actual secrets.

That's why you write rude messages to give him/her a laugh when checking server logs.

Yep, passed the test. So there are checks made on the content sent. A little warning about how the sent data is handled would have been appreciated. Thank you.

Why on earth would you need a warning that text entered into an HTML form would be posted to the server when you pressed the button? What else would you expect it to do?

I'd expect such a trivial operation to be done client-side in JavaScript and not need to ask a server to do it for them.

I was delighted that a fun toy didn't need JS for once. If the concern is one of privacy, the author could just be sending the text to the server in the background with JS too.

If I can carry a bucket on my shoulder, why get a car to move it?
