ROT8000

jstanley · on Sept 22, 2021

Interesting, I have made a similar project, except instead of rotN, it encodes the input as UTF-8, and then shifts up the codepoints to display each byte as a different character to what it would normally be. The invariant is that `byte & 0xff` is the real byte value.

I call it "Mojibake Steganography": https://incoherency.co.uk/mojibake/

I think in principle (judging by the description of rot8000), my tool should be able to decode rot8000 messages natively, but it doesn't seem to work on the example given here. From looking directly at the codepoints given, I think the example is wrong. It starts:

u+7c5d u+7c71 u+7c6e - which works out to "]qn" instead of "The", unless I am misunderstanding something. And in fact that looks definitely wrong if we're expecting ASCII output because they're all more than 127 away from 0x8000, no matter how it works.

The rot8000 page says:

> It also bypasses 32 control characters, technically making it rotFFE0, sometimes with an additional offset.

I definitely don't understand how this is meant to work. Why does skipping 32 control characters turn it from rot8000 into rotFFE0? Should that say 7FE0? I still don't see how ASCII is coming out as 7Cxx.

Taking `char - 0x7c09` gets the expected ASCII output.
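
The arithmetic above can be checked directly. A minimal Python sketch, using only the three code points quoted in this comment (the variable names are illustrative):

```python
# The first three code points of the example, as quoted above.
codepoints = [0x7C5D, 0x7C71, 0x7C6E]

# Keep only the low byte, which is what the `byte & 0xff` invariant would read.
masked = "".join(chr(cp & 0xFF) for cp in codepoints)

# Subtract the constant offset observed above instead.
shifted = "".join(chr(cp - 0x7C09) for cp in codepoints)

print(masked)   # ]qn
print(shifted)  # The
```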

robinhouston · on Sept 22, 2021

If you look at the code [1], it skips:

  * control characters
  * whitespace
  * surrogate code units (U+D800 – U+DFFF)

1. https://github.com/rottytooth/rot8000/blob/main/Rottytooth.R...
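
A rough Python sketch of that idea, not the project's actual code: it assumes "control characters", "whitespace" and "surrogates" can be approximated by the Unicode general categories Cc, Zs/Zl/Zp and Cs, and the names `ALPHABET`, `TABLE` and `rot_bmp` are made up for illustration.

```python
import unicodedata

# Assumption: the skipped classes are approximated by these Unicode general
# categories; the real project may define them differently.
SKIPPED_CATEGORIES = {"Cc", "Cs", "Zs", "Zl", "Zp"}

# Every BMP code point that is allowed to take part in the rotation.
ALPHABET = [
    chr(cp)
    for cp in range(0x10000)
    if unicodedata.category(chr(cp)) not in SKIPPED_CATEGORIES
]
HALF = len(ALPHABET) // 2

# Pair the i-th character of the first half with the i-th of the second half.
# Swapping within pairs makes the mapping its own inverse by construction.
TABLE = {}
for low, high in zip(ALPHABET[:HALF], ALPHABET[HALF:]):
    TABLE[ord(low)] = high
    TABLE[ord(high)] = low


def rot_bmp(text: str) -> str:
    """Swap each eligible BMP character with its partner; leave the rest alone."""
    return text.translate(TABLE)


scrambled = rot_bmp("Hello, world!")
print(rot_bmp(scrambled))  # Hello, world!
```

Because skipped characters never enter the table, spaces and line breaks pass through unchanged, and characters outside the BMP are left as they are.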

NelsonMinar · on Sept 22, 2021

I like your system!

One nice property of rot13 is it reverses itself; rot13(rot13(X)) = X. At least, for basic ASCII alphabet. Your UTF-8 encoding step makes that impossible. I wonder if there's a sensible Unicode-friendly algorithm that has that rot13 property.
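
For reference, the self-inverse property is easy to see with Python's built-in `rot_13` text codec (a small illustration, not part of either tool):

```python
import codecs

text = "Hello, World!"
once = codecs.encode(text, "rot_13")
twice = codecs.encode(once, "rot_13")

print(once)           # Uryyb, Jbeyq!
print(twice == text)  # True
```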

tiziano88 · on Sept 22, 2021

Like the one in the original post?

drdrey · on Sept 24, 2021

Instead of & 0xff, do ^ 0xff?

Tepix · on Sept 22, 2021

Cool! Works even with Emoji.

However the feature of rot13 (and rot8000) that you can use the same operation to "decrypt" it again is unfortunately missing in your variant.

amptorn · on Sept 22, 2021

This is a very bad idea because it's going to rotate ordinary characters to code points where Unicode normalization has an effect, including combining characters, whitespace, control characters... After normalization, rotating back will produce garbage.
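
A small Python illustration of the kind of change normalization makes, using two well-known cases from the standard `unicodedata` module. Whether a given rotated message actually contains such characters depends on the input; this only shows that a normalizing system would not hand back the same code points.

```python
import unicodedata

# OHM SIGN is canonically equivalent to GREEK CAPITAL LETTER OMEGA.
ohm = "\u2126"
normalized_ohm = unicodedata.normalize("NFC", ohm)

# A base letter plus a combining accent collapses into one code point.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

print(f"U+{ord(normalized_ohm):04X}")  # U+03A9
print(len(decomposed), len(composed))  # 2 1
```

Rotating the normalized text back would start from different code points, and in the second case from a different number of them.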

shakna · on Sept 22, 2021

> including combining characters, whitespace, control characters...

It actually skips whitespace, control characters and surrogate pairs [0].

[0] https://github.com/rottytooth/rot8000/blob/main/Rottytooth.R...

_abox · on Sept 22, 2021

Oops I was just writing the same, I didn't realise someone had already mentioned this.

But anyway ROT in itself is a pretty stupid idea anyway, usually just done for show.

woodruffw · on Sept 22, 2021

The website explains the primary actual use case for ROT-style transforms:

> It is used to enclose the text in a sealed wrapper that the reader must choose to open - e.g. for posting things that might offend some readers, or spoilers.

AFAIK, this has been a common use of ROT13 since the 1980s. It also preserves substring search and message length (unlike BaseN encodings), which are occasionally useful properties.
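
A quick Python check of those two properties, comparing rot13 with Base64 on a made-up sentence (the message and search term are illustrative only):

```python
import base64
import codecs

message = "the butler did it"
needle = "butler"

rot_msg = codecs.encode(message, "rot_13")
rot_needle = codecs.encode(needle, "rot_13")

b64_msg = base64.b64encode(message.encode("ascii")).decode("ascii")
b64_needle = base64.b64encode(needle.encode("ascii")).decode("ascii")

# rot13: the encoded needle is still a substring, and the length is unchanged.
print(rot_needle in rot_msg, len(rot_msg) == len(message))  # True True

# Base64: the needle is not aligned to a 3-byte group here, so the search fails.
print(b64_needle in b64_msg, len(b64_msg))  # False 24
```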

_abox · on Sept 23, 2021

Ah ok, I was aware of that usage back in those days (fidonet), but I didn't realise it was still used as such. I haven't seen that function in any modern app, and most have their own spoiler tag implementation (like black on black so you only see the content after highlighting)

In cryptography circles it seems to be kind of a running joke ("just use ROT13 encryption and you'll be set!" is something I've seen several times) ;) I know it was never intended to be secure.

But it makes sense then.

Of course, if you work with ROT13 a lot, you will probably gain the ability to read it just by viewing the ROT'd text, defeating its purpose :) The structure of words also gives away a lot, since it doesn't affect spaces, capitalisation or punctuation. I still don't think it's very good at this use case either.

hermitdev · on Sept 22, 2021

Yeah, in other words: it's not intended to hold up to scrutiny, just hold up to a glance.

Phrodo_00 · on Sept 22, 2021

I don't get this line of thinking. Nowhere does it say it's supposed to have any security uses.

DonHopkins · on Sept 23, 2021

It's quintessential security theater.

dhosek · on Sept 23, 2021

Rot13 hasn't been a serious encoding scheme for security since the Roman Empire. It's all about making an easy to encode/decode wrapper for hiding spoilers and the like. Back in the olden days, newsreaders had a shortcut key to run rot13 on the text of a post so you could unveil the spoiler or puzzle solution.

wutbrodo · on Sept 22, 2021

> But anyway ROT in itself is a pretty stupid idea anyway, usually just done for show.

How do you figure? It feels like the simplest way to handle eg spoilers in a universally portable and widely-recognizable way.

Spooky23 · on Sept 22, 2021

Depends on what you’re trying to do. It might be a viable strategy for avoiding filters that are aware of things like base64.

sandwell · on Sept 22, 2021

I wonder if it is possible to generate a ROT8000 quine, that is a phrase like "hello world" which yields a semantically matching phrase in some other language?

eurasiantiger · on Sept 23, 2021

Just prompt GPT-3 for one.

kderbyma · on Sept 22, 2021

the BabbelROT

Arcorann · on Sept 23, 2021

I remember seeing this the last time it was posted on HN [0] in 2018. The About page seems to be a bit outdated since it actually skips a lot more characters than the 32 mentioned.

Running this on CJK text is an interesting exercise.

[0] https://news.ycombinator.com/item?id=18495518

BoppreH · on Sept 22, 2021

Nice idea. I often use base64 for this, since it's somewhat recognizable and there are tons of decoding tools available.

Base64 does lengthen the text by a third, which may or may not be a problem. On the other hand, it doesn't need special handling of control characters, and manages to hide word lengths well.
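
The "by a third" figure follows from Base64 emitting 4 output characters for every 3 input bytes. A short Python check with an arbitrary 300-byte input:

```python
import base64

text = "x" * 300
encoded = base64.b64encode(text.encode("ascii"))

# 300 input bytes become 400 output characters: 4 for every 3.
print(len(text), len(encoded))  # 300 400
```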

arethuza · on Sept 22, 2021

Many years ago I was involved in finding and fixing a messaging bug that only appeared when the base64 encoded payload had a length that was a multiple of 87 bytes (it might have been some other value - it was 15+ years ago).

Bug was in a C++ base64 encoder component.

zzzaim · on Sept 23, 2021

I'm in the midst of learning Go, so I "ported" this to Go as an exercise [1], including some tests and benchmarks :)

[1] https://github.com/241m/rot8000

cjfd · on Sept 22, 2021

I am just getting boxes with hex codes in them if I type ASCII letters, so that is not so very nice. Even if you have all of the required fonts, I am not sure it is that great to get characters from a completely foreign language. Also, I suppose, one could end up with surrogate code points, which do not have a character representation. To summarize: I think this sounded like more fun in theory than it turns out to be in practice.

SommaRaikkonen · on Sept 22, 2021

For those who want to test it out: http://rot8000.com/Index

_abox · on Sept 22, 2021

I wonder how this will play with Unicode's highly complex combination rules.. (e.g. frowning face + brown texture = brown frowning face).

I bet using ROT on this will lead to unintended consequences because the original characters won't combine but the replaced ones will.

But anyway ROT is a dumb thing to do anyway so it doesn't have any real-world use.

kasitmp · on Sept 22, 2021

Real world use case: geocaching.com uses it to hide hints, so you don't read and spoil yourself by accident. It's pretty much accepted and adopted by the users. I also would ban words like "dumb" or, for another example "easy" in IT and CS contexts.

kyle-rb · on Sept 22, 2021

In this case, it would be very unlikely to actually happen, for a few reasons.

Almost all combining rules (including skin tone modifiers) require a zero-width joiner character between the person emoji and the modifier emoji. So really it's frowning face + ZWJ + brown texture = brown frowning face. (Although technically I don't think frowning face can be modified.) Also, there are relatively few ZWJ combinations.

Technically, there are some older combination emojis that predate ZWJ, mainly the flags, which are composed of two single-letter emojis, e.g. regional-indicator-U + regional-indicator-S = United States flag. So I guess it might be possible to get a couple of those.

And in any case, I think this page assumes that you're staying within the bounds of the basic multilingual plane (it mentions a self-inverting transform would be ROT32768), which doesn't include emojis or skin tone modifiers.

[1] https://emojipedia.org/emoji-zwj-sequence/
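
To make the flag case concrete, here is a small Python sketch that builds a flag sequence from a two-letter code by offsetting into the regional indicator block (the helper name `flag` is made up for illustration):

```python
def flag(country_code: str) -> str:
    """Map each ASCII letter to its regional indicator symbol."""
    base = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
    return "".join(chr(base + ord(c) - ord("A")) for c in country_code.upper())


us = flag("US")
print([f"U+{ord(c):X}" for c in us])  # ['U+1F1FA', 'U+1F1F8']
print("\u200d" in us)                 # False: no zero-width joiner involved
```

Both code points are above U+FFFF, so a BMP-only rotation would have to land outside its own output range to produce them.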

jfk13 · on Sept 22, 2021

> Almost all combining rules (including skin tone modifiers) require a zero-width joiner character

No, the skin tone modifiers apply directly to eligible person emojis; no ZWJ is involved. (Unless other modifiers that require ZWJ are also present, such as the gender signs.)

https://unicode.org/emoji/charts/full-emoji-modifiers.html

kapp_in_life · on Sept 22, 2021

I don't think it would matter, right? The output might have fewer characters, but inverting it would still show the original text. Like, hypothetically:

ab => frowning face + brown texture = brown frowning face => ab

_abox · on Sept 22, 2021

True, but some apps don't have the ability to show all these variations and may leave them out (simplify to just a face icon) when copying/pasting. Unicode interpretation is a really complex bundle of quirks these days so I'm pretty sure things will start going wrong.

contravariant · on Sept 22, 2021

If this always happens one way sure, but what if you also happened to include the symbol that gets translated to "brown frowning face"?

omershapira · on Sept 23, 2021

Daniel Temkin's work makes me laugh out loud every time. If you haven't already, look at his website. It brings so much joy. http://danieltemkin.com

ajanuary · on Sept 22, 2021

Bar bs gur avpr dhnyvgvrf bs ebg13 vf gung vf fgvyy cerfreirf fbzr fgehpgher. Nf jryy nf n pregnva nrfgurgvp nccrny, pbzzba jbeqf va pbzzhavgvrf gung hfr vg urnivyl orpbzr erpbtavfnoyr. Juvyr gung qbrf fyvtugyl qvzvavfu vg'f hfr nf n fcbvyre-grkg zrpunavfz, vg qbrf nqq gb gur phygher.

类籸籽籁簹簹簹簵籸籷籽籱籮籸籽籱籮类籱籪籷籭簵籹类籮籼籮类籿籮籼籵籮籼籼籼籽类籾籬籽籾类籮簱米籾籼籽籼籹籪籬籮籼簲簷籝籱籲籼籶籪籴籮籼籲籽籿籲籼籾籪籵籵粂籶籸类籮籭籮籷籼籮簵籪籷籭籵籮籼籼籽籪籷籽籪籵籲籼籲籷籰籪籼籪籼籹籸籲籵籮类簶籽籮粁籽籶籮籬籱籪籷籲籼籶簷籋籾籽粀籸类籴籲籷籰粀籲籽籱籷籸籷簶籵籪籽籲籷籼籬类籲籹籽籼籨籲籼籨籪籹籹籮籪籵籲籷籰簷籒籽籱籲籷籴籽籱籮类籮籲籼籪籾籼籮籬籪籼籮籯籸类籸籷籵粂类籸籽籪籽籲籷籰籬籱籪类籪籬籽籮类籼籲籷籽籱籮籵籮籽籽籮类籬籪籽籮籰籸类粂簷

dash2 · on Sept 22, 2021

Who are these nutcases? https://www.master-list2000.com/ and what is pl41nt3xt?

ditherstudies · on Sept 22, 2021

Hi, I created rot8000 for The Wrong, an online biennial of digital art -- specifically for the pl41nt3xt pavilion, which included text-only works. The pavilion was taken down when the biennial ended, and it looks like that link is no longer valid.

chris_st · on Sept 22, 2021

> what is pl41nt3xt?

Leet-speak for "plaintext".

genewitch · on Sept 22, 2021

1337 4 "plaintext" nub

kevinmgranger · on Sept 22, 2021

Where did you find this / how is it relevant? Was this in the OP but then removed?

DonHopkins · on Sept 23, 2021

What is the name of the non-reversible encoding scheme that translates "internationalization" to "i18n", "localization" to "l10n", "kubernetes" to "k8s", and other abbreviations like "f2k" "y1u"?

LordDragonfang · on Sept 23, 2021

According to Wikipedia, a numeronym, or alternatively a numerical contraction.

https://en.wikipedia.org/wiki/Numeronym

rsj_hn · on Sept 22, 2021

This is cool, but I wish more people would use a more aesthetically pleasing cipher, like morse code:

https://onlineasciitools.com/convert-ascii-to-morse

jhvkjhk · on Sept 22, 2021

By extending rot13 to Unicode characters, it supports encrypting emoji message automatically!

fer · on Sept 22, 2021

No. Emojis go from U+10000 to U+1FFFF, and this rotates chars U+0 through U+FFFF (hence the U+8000 middle point).

aidenn0 · on Sept 22, 2021

Right, you'd need rot 0x88000 to cover all 17 planes. The downside would be that the space is not fully packed, so you'd get a lot of invalid characters.
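
The 0x88000 figure is half of the 17 × 0x10000 code point space. A hedged Python sketch of such a naive rotation (no skipping of any kind, `rot_all` is a hypothetical name), showing where a plain ASCII letter would land:

```python
import unicodedata

PLANES = 17
TOTAL = PLANES * 0x10000  # 0x110000 code points in all of Unicode
HALF = TOTAL // 2


def rot_all(cp: int) -> int:
    """Rotate a code point halfway around the full Unicode range."""
    return (cp + HALF) % TOTAL


target = rot_all(ord("A"))
print(hex(HALF))    # 0x88000
print(hex(target))  # 0x88041

# Plane 8 has no assigned characters, so the result is category Cn (unassigned).
print(unicodedata.category(chr(target)))  # Cn
```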

tragomaskhalos · on Sept 22, 2021

Pedantic point: some (early) emoji have codes below this

Gormisdomai · on Sept 22, 2021

I'm curious why it successfully rotates some emoji but not others.

E.g. stars and hearts get rotated but sunglasses do not

(EDIT: rewrote my example to use words because HN doesn't render emoji, duh)

OskarS · on Sept 22, 2021

From the article:

> While rot13 is the self-inverse for a 26-character system, and rot47 for ANSI, the Basic Multilingual Plane of Unicode requires rot32768 (or 8000 in hex) for a reciprical cypher

Not all emoji are in the BMP; at least some are in the Supplementary Multilingual Plane.

It's weird to me that if you're gonna do this dumb "rot13 but for Unicode", you'd only do it for the BMP, and not ALL of Unicode.

jsjohnst · on Sept 22, 2021

Technical answer:

Star = U+2B50 which is less than U+FFFF

Sunglasses = U+1F576 which is greater than U+FFFF

The detail you might be missing is that some emoji existed in Unicode before color graphic "emoji" were actually a thing. The stars (and hearts) are examples of ones which used to be just a basic shape in the font but now are commonly full color graphical "images".
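
The same comparison in Python, using the two code points given above. The UTF-16 unit count is included because a code point above U+FFFF needs a surrogate pair there:

```python
BMP_MAX = 0xFFFF

star = "\u2b50"            # U+2B50
sunglasses = "\U0001F576"  # U+1F576

for name, ch in (("star", star), ("sunglasses", sunglasses)):
    utf16_units = len(ch.encode("utf-16-le")) // 2
    print(name, hex(ord(ch)), ord(ch) <= BMP_MAX, utf16_units)
# star 0x2b50 True 1
# sunglasses 0x1f576 False 2
```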

maerF0x0 · on Sept 22, 2021

This makes me wonder how many of the craigslist (or other channel) posts all in Asian language characters are actually secret messages?

genewitch · on Sept 22, 2021

There are a couple of browser plugins that do this with a password: so long as someone else knows the password, it will decode. I'm not near my machine that has it, but I know it does Korean, Japanese, and Chinese characters - you choose which set you want. And it doesn't back-translate to anything useful, it's just encoding.

perl4ever · on Sept 23, 2021

I remember having some sort of encoding problem on Windows where text files would open as apparently all Chinese characters even though they weren't.

azhenley · on Sept 22, 2021

Is there any benefit to make it so that the function must be applied X times to restore the original text? E.g., ROT2000.

Tepix · on Sept 22, 2021

Yes, you can use the same operation to encrypt and decrypt it if X == 2.
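
For a plain rotation, the number of applications needed to get back to the start is the alphabet size divided by its greatest common divisor with the shift. A small Python sketch; reading "ROT2000" as a hex shift over all 0x10000 BMP code points, with nothing skipped, is an assumption made here for illustration:

```python
from math import gcd


def applications_to_restore(shift: int, alphabet_size: int) -> int:
    """How many times a rotation by `shift` must run to become the identity."""
    return alphabet_size // gcd(shift, alphabet_size)


print(applications_to_restore(13, 26))           # 2
print(applications_to_restore(0x8000, 0x10000))  # 2
print(applications_to_restore(0x2000, 0x10000))  # 8
```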

DonHopkins · on Sept 23, 2021

How about ROT180, that turns each character upside down, and back to the original orientation when applied twice?

stavros · on Sept 22, 2021

ˈrō-ˌtāt is pronounced ROH-tat, by the way.

roamerz · on Sept 22, 2021

I saw this and thought yay a new version of Rise of the Triad!