
Base65536 encoding - rahiel
https://github.com/qntm/base65536
======
ChuckMcM
256 byte packet and a 192 bit authentication hash: why use fast-flux DNS to
run C&C on your botnet when you can just make them Twitter followers?

EDIT: In case that isn't clear, imagine you have a botnet, and all of the
individual members create a twitter account. All of the twitter botnet
accounts follow the 'master'. Who can tweet a command (and corresponding
authentication key) to the botnet to say "follow chuck and do up to n things
for him, here is his public key". Now Chuck suddenly has all these followers
and when the time is right he tweets out his command, "ddos my greatest enemy"
and adds his 'proof'. Off they go and blast his enemy. If he was only allotted
one command then they all un-follow him.

Basically it's social media for botnets.

~~~
heinrich5991
Because Twitter would be able to shut down your botnet.

~~~
ChuckMcM
Conceptually I agree with you, Twitter should be able to shut down a botnet
like this by simply 'identifying bogus twitter accounts' (aka twitter bots)
and 'deleting those accounts en masse.'

The part where it gets weird though is that twitter already has massive
botnets which run around in it retweeting things and what not. Which they do
_not_ shut down. So is that because they don't want to? Because they can't? Or
simply because it isn't worth their time? That is still an unresolved question
for me.

~~~
luminiferous
I suspect it's because the Twitter bots inflate the user count of Twitter. It
seems that even though Twitter's monetization is not the greatest, their
valuation mostly comes from the number of users on Twitter. The reasoning
process seems to be something like, "Twitter doesn't make a lot of money right
now, but they have the ear of some 300 million users, and that's going to be
valuable in the future when they get their monetization story correct, right?"
And so TWTR prices are kept afloat. Bots probably degrade the user experience
of Twitter quite a bit, but if Twitter aggressively bans all bots, their user
count goes down maybe 15%[1], and what does that do to their stock price?

[1]
[https://arxiv.org/pdf/1703.03107.pdf](https://arxiv.org/pdf/1703.03107.pdf)

~~~
dibstern
Good observations. However, user experience matters more in growing the
company (it helps user growth and retention), rather than user numbers and
short-term stock price.

~~~
jacobush
Yes - but who is calling the shots?

------
pfraze
Yeah but what you _really_ want is base-emoji.
[https://github.com/pfrazee/base-emoji](https://github.com/pfrazee/base-emoji)

~~~
chocolatebunny
Sometimes you don't know what you're missing in life until you find it.

~~~
mjburgess

        Before you came into my life
        I missed you so bad
    

\- Carly Rae

------
anderskaseorg
See also:

[https://blogs.oracle.com/ksplice/the-1st-international-
longe...](https://blogs.oracle.com/ksplice/the-1st-international-longest-
tweet-contest)

Twitter characters can actually store up to nearly 31 bits each, if you’re
using the JSON API. (Or at least, this was true in 2010. I don’t know whether
this is still true.)

[http://blog.kevinalbs.com/base122](http://blog.kevinalbs.com/base122)

[https://news.ycombinator.com/item?id=13049329](https://news.ycombinator.com/item?id=13049329)

Base-122 encoding is 87.5% efficient in UTF-8, better than anything listed in
the base65536 repository’s comparison table.

~~~
livingparadox
From the conversation in the hacker news link, it looks like base122 gets the
increased efficiency from using unprintable control characters, which is
incompatible with base65536's explicit goal of only using printable non-
whitespace characters in its output.

------
matt_wulfeck
I think they missed a great opportunity to call it "base64k" encoding.

------
girst
I'm the one who made the C / UNIX Shell implementation - it was a fun and
quick thing to make.

[https://github.com/girst/base65536](https://github.com/girst/base65536)

I'd appreciate some feedback.

------
Asdfbla
I don't seem to get the efficiency table (or how efficiency is defined here?).
Since Base65536 encodes 16 bits, why can't it encode UTF-16 with 100%
efficiency? It says the efficiency is 64% instead.

I'm sure it's true, just curious why.

~~~
amptorn
Not all of the code points used are (or can be) in the Basic Multilingual
Plane. This means that when encoded in UTF-16, they come out to 32 bits, not
16. This skews the average number of output bits per input bit upwards.
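
To make the arithmetic concrete, here is a rough Python sketch (not taken from the base65536 repository). It assumes every output code point carries exactly 16 bits of payload, and it uses two arbitrary sample code points, U+3400 (inside the BMP) and U+20000 (outside it). The helper name `efficiency` is made up for illustration.

```python
def efficiency(text: str, encoding: str) -> float:
    """Payload bits divided by encoded bits, assuming 16 payload bits per code point."""
    payload_bits = 16 * len(text)
    encoded_bits = 8 * len(text.encode(encoding))
    return payload_bits / encoded_bits

bmp = "\u3400"         # sample code point inside the Basic Multilingual Plane
astral = "\U00020000"  # sample code point outside the BMP (surrogate pair in UTF-16)

# "utf-16-le" is used so that no byte order mark is counted.
print(efficiency(bmp, "utf-16-le"))           # one 16-bit code unit per code point
print(efficiency(astral, "utf-16-le"))        # two 16-bit code units per code point
print(efficiency(bmp + astral, "utf-16-le"))  # a mix lands in between
```

A BMP code point comes out at 100%, one outside the BMP at 50%, and any mix of the two falls somewhere in between, which is how a table entry below 100% can arise.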

~~~
jacobush
Basic Multilingual Plane always makes me think of Cthulhu... it lives beyond

------
bcoates
At first I thought this was going to be a joke, then I thought it was going to
be stupid, but it's actually brilliant.

~~~
bytecodes
I think it's all of those.

------
carry_bit
You could expand the encoding further if you didn't restrict yourself to a
whole number of bits per character.
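
One way to read this, as a hedged Python sketch: treat the whole input as a single large integer and rewrite it in a base that is not a power of two. The alphabet size of 100000 below is purely hypothetical, the function names are invented for illustration, and the final step of mapping each digit to an actual character is left out.

```python
def to_digits(data: bytes, base: int) -> list:
    """Rewrite bytes as big-endian digits in an arbitrary base."""
    # A 0x01 sentinel byte is prepended so leading zero bytes survive the round trip.
    n = int.from_bytes(b"\x01" + data, "big")
    digits = []
    while n:
        n, d = divmod(n, base)
        digits.append(d)
    return digits[::-1]

def from_digits(digits: list, base: int) -> bytes:
    """Inverse of to_digits: rebuild the integer, then strip the sentinel byte."""
    n = 0
    for d in digits:
        n = n * base + d
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]

BASE = 100000  # hypothetical alphabet size, not a power of two
payload = b"\x00\x00hello world"
digits = to_digits(payload, BASE)
print(len(payload), "bytes ->", len(digits), "digits")
```

The cost of this approach is that the whole message has to be processed as one number, so it cannot be streamed in fixed-size chunks the way a power-of-two base can.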

~~~
amptorn
[https://github.com/qntm/base65537](https://github.com/qntm/base65537)

~~~
andrewflnr
> These shortcomings are expected to be fixed in Base65538.

I fricking love qntm.

------
fiatjaf
I hate this game.

Managed to make 1 point:
𤄻𣺻𣼋耈𣺻興𣼫兊𠨋𢪄𡚻𡢁𢙌𢚻𠛀𣪻栌𤄋𤯄𤆻𤆠𠞠𤪇𤆻𠙀𤅴𤆧𣪤𡚻𥪹炌𤆀㶸聙𡊰𠨌𡪻𤇅𤆀薠嫊䂔𔔌𥩋㲼耈𠊁繈倘𤨸𣾔㼬𤚱𢩋𣿋𡉌膹敃ꎹ𡩋肐𠝒𠚬醸聛㰩

[https://qntm.org/files/hatetris/hatetris.html](https://qntm.org/files/hatetris/hatetris.html)

~~~
efficax
Four points:
邇𤆻肹㾸𣾻㰈㼈𤆴僃𣊻𤆄肗𠪠𤆄㾻𠢻𤆻𤅶綻𤅋𣺻𠨰𤆄𤦴𤄫疐𠶐𤅴肹𤆰䂸㼈䂺𤄋𤅴肺𥆐䂹𤆄栌紌𤇀𤶔𣽛𤅌畜𤇂沃𠫄㲕脋𤅵𤄳𤶄𢩛𤇄昤㲺耈𠘱膀㢹𠷄絋𣝌𠥀𠘕𤪰炳𤶐腺䁋䀄𤨄𡈋ᖠ

~~~
fiatjaf
You're a genius.

------
jxy
Since when did people start to label C implementation as "Unix shell"?

~~~
stouset
Even if it's technically incorrect, I don't think it's unreasonable. When
they're saying "a Ruby implementation", "a Node implementation", etc., they
really mean an implementation callable from within those languages.

The C implementation is not a callable library, it's a binary. So in the same
sense, it's a "Unix shell -accessible implementation".

~~~
throwaway91111
I don't follow where the shell enters this conversation at all.

~~~
stouset
How exactly do you expect to invoke the C implementation? It's not a library,
it compiles to a binary.

IMHO, the environment a utility is natively accessible from is more relevant
than the language it was written in.

~~~
throwaway91111
Same way you invoke a Ruby script, Python script, JVM call, whatever: with
exec. Scripts are just the easiest.

------
prophesi
So, anyone got any good HATETRIS replays? I'll edit my post if I find myself
getting a good one.
[https://qntm.org/files/hatetris/hatetris.html](https://qntm.org/files/hatetris/hatetris.html)

Edit: If you didn't look at the repo, this encoding was made to post HATETRIS
replays on Twitter.

Edit: Only 3 points so far
𤆂𤆻𡚻𤆥㲺着遈𥮸㼉𤄛皲𤆻孈𤇆𡊾缎𓍌𤂻职𢪻郇膻𤅋𠅌傺𢊰䡪𤇄𤪤𡪻ꋇ𥆸𤶹膺𢡋聜𠆬𤪄膹𠬋㿄𠘬臀㾤冹𣾻𡈰𠭀䂹𤄔㼌𤚐𤢰𢢻𤇀𤞁䂺㬅𢉋𤮹㼆𣛄𡫀𤚒㡋𤢀ᖠ

~~~
crunksht
I managed 4 by stacking vertically as quickly as possible and then trying to
fill out horizontally.

ꋇ𤆻𠇇𢊻𤅷𓎼𢊻𤅵𓎬𤶸𤶸𤞑𤇋邼𣊻𤅛綌𣽋𤇅邌邬𤪀𠆃𣚻胈𡚻𤇄䂸𤁋而䂸𢙋𣻄𤄜𠢻𤇠𠪻㰄耬𢢻𣻄缨𠠋𤇁𤢱𡢻𤇂值𤞒𥆱冻ꎜ𤭺𤪰𤪲𡢻㿅𠟛𔗀邼而耬𥆑㼼翇

~~~
prophesi
Dang, I thought for sure I could get more than 4 with that tactic.
ꉌ𤆻遜𣪻𤇇ꎷ𣹋郇ꎹ𠱋𣻅𤅡𡚻𤆻𤆒𠚻絀㬌ꊂ𣺻㣅𤮻𤪺𤂻𤇄肬𤆬𥆺𥆺𤪠𤄋𤆣𠚻胁𣚻𡫁𣚻𤇀𤙛𡚻𣛋𡛋𤇂㾺𣾔弌𣽋脧𡉫灛𤶻炼点冸𣈛䁛𠝋𠟋𤋁ᖠ

~~~
crunksht
Me too! I thought I had it beat and then it threw me a square:
𤆌𡊻𣾻𤇋𤆌𤆼𣾌𠳋𤇅𡊻𤅻𠆬𡊻聩𤶸𡊻𣉻𠝻𤇁𡊻𤄻炬𡊻𤀻倜𤦸䂺𠨫𤇄𠚻𤇀𡊻𤅛邬𡊻𤁛𤇅邌邬𤚁𤆄䲼𤅑𠆜邌𤪀憸𡈋𣸛𠯁𣈛㰈𤴊𤮸𤚸𣢄𤿠𡧅𢘍

~~~
lultimouomo
I got to 8 with a very similar tactic:
惃𣺻怌𢚻𤇂䂸𤀫䀌㼌䂹𢈫𤦀𤄫悸𤆔纹㴈聘𢚻𠆌𣚻𓋇𣾻𔗀𤊻ꈇ䂹𤅷𔒌膸𤅻羱繨𢘋傺𢪻𤆻𣺻𤄛𤆔㾺傺肻𤆻𤆡𤆤𤆐𤆑暌䂻𡘫𣯂𤃅𣿇𤅋𤆱𢙋傺𠆜𤢀𤆴𣺡𣫇𣢰𤾳

------
eponeponepon
Crikey, I thought this was going to be a joke project, but it isn't. Is it..?

Either way, it's a neat piece of thinking.

~~~
sqeaky
It is a solution for posting HATETRIS scores, so I think it is still a joke. Just
a joke with a valid technical solution and an above-average level of
dedication.

~~~
jrochkind1
Maybe it's art? I'd say HATETRIS is.

------
dheera
If you really want to put binary data on Twitter, why not encode it in an
image? You could probably get several tens of kilobytes of binary data
reliably encoded in a JPEG of the maximum size Twitter allows.

~~~
shujito
Aren't JPEGs lossy? I can't think of a reliable and performant way of
retrieving data from a JPEG.

~~~
oh_sigh
On compression, yes, but you can craft a JPEG image which has exactly the pixel
data you need. It will fail if someone recompresses your image, but that would
be unlikely.

~~~
jpttsn
Not unlikely at all, if you're going to post it on social media

------
supernintendo
Neat. I see a lot of mention of Twitter but the first thing I thought of was
packet compression. A ~50 byte packet shaves off around 20 bytes with this.
Those are good savings although I haven't looked into the encoder / decoder
enough to know if it's worth the tradeoff of having to translate every packet
on both ends. I can also see UDP datagrams being a pain in the ass to work
with when you're throwing around streams of Unicode characters.

Overall though, I like it and look forward to Base131072 being possible!

~~~
jeremyjh
I didn't see any efficiencies that exceeded 100%. Are you counting code points
rather than bytes in the unicode output size? The code points still have to be
serialized as bytes after that is done, but since Twitter limits you by the
number of characters, not the number of bytes, this amounts to a sort of
compression of data into Twitter. On the wire it would still be an expansion
of course.

~~~
piaste
For raw packets obviously not, but if:

\- you're sending big byte arrays over json/xml

\- you cannot switch to a more efficient medium

\- (but you _can_ make your remote counterpart adopt a different encoding)

\- and you _still_ need to maximize your throughput

then I guess you might consider base85 for utf-8 or base32k for utf-16?

~~~
jbg_
JSON is UTF-8 by definition, so at best you'd be using a format that looked
similar.

------
shemnon42
All those stats and one lingering question: what's the Weissman Score?

~~~
bbcbasic
What do you propose as the standard for comparison? Base64?

------
gvx
Last year, I did a similar project:
[https://github.com/gvx/base116676](https://github.com/gvx/base116676)

It had a feature where it automatically would try a couple of compression
algorithms on the text to be able to cram even more into a single tweet.

I don't think it has a practical use, but it was fun to make.

------
stcredzero
Base 32768 has a very sexy 93.75% efficiency! Maybe I should use that with my
browser game?

~~~
amptorn
Only if your browser game requires UTF-16 strings.

~~~
duskwuff
More specifically: only if you will be storing/transmitting data in UTF-16. If
you're using UTF-8, base85 is _much_ better.
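
As a quick illustration of the UTF-8 case, here is a small Python sketch using the standard library's `base64.b85encode`. Note that this is only one of several base85 variants, and the test payload is arbitrary. The output is plain ASCII, so each character costs one byte in UTF-8 and every 4 input bytes become 5 output bytes.

```python
import base64

data = bytes(range(256)) * 4       # 1024 arbitrary bytes
encoded = base64.b85encode(data)   # ASCII output: 1 byte per character in UTF-8

print(len(data), len(encoded))     # 1024 1280
print(len(data) / len(encoded))    # 0.8
```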

~~~
sratner
Javascript strings are UTF-16 (or maybe it was UCS-2?)

~~~
duskwuff
Javascript strings _behave_ like UCS-2, but they can be stored in memory
however the interpreter likes, and they're typically written to disk or the
network as UTF-8.

------
comboy
Do not try playing this game. You're welcome.

~~~
virgil_disgr4ce
OH GOD THE PAIN THE HURTING

~~~
sbarre
The 30-rows replay is pretty impressive..

~~~
krick
I wonder if it is mathematically optimal.

~~~
artursapek
Someone spin up AlphaGo on this game

------
marcosdumay
Looks like we should create some other 20k emoji.

------
pbhjpbhj
"See a need, fill a need" (Bigweld).

------
dzuc
Enantiomorphic tetris!

------
terminado
What, no Java?

------
jheriko
A sad sign of our times... what nonsense.

------
bitwize
What surprises me is that this encoding was developed to allow people to share
replays of an illegal, and very pathological, Tetris variant. Hackers gonna
hack.

~~~
JoshTriplett
> illegal, and very pathological

Pathological, certainly; "illegal"?

~~~
bitwize
Most aspects of Tetris's appearance and gameplay are protected by copyrights
and trademarks owned by The Tetris Company LLC. This includes trademarks on
the shape of the tetrominoes, the suffix "-tris", and the use of the Russian
folksong _Korobeiniki_ in a video game.

In their copyright-infringement case against Xio Interactive, a judge ruled
that aspects of the game such as the dimensions of the game board (which the
HATETRIS developer took pains to replicate) are protected by copyright.

Tetris is one of the most aggressively defended game IPs out there. _Any_
recognizable clone of it is potentially infringing.

~~~
JoshTriplett
There are _thousands_ of clones out there, both for-profit and otherwise, and
the vast majority have not been pursued. Trademarks on the shapes of
"tetrominoes" are questionable (you can't trademark a purely functional
element, though you could potentially trademark a particular stylization of it
or use of it in a logo). Copyright on game rules is questionable as well, and
whether it'd apply in any particular case would depend on both jurisdiction
and any potential fair use claims; there's a _long_ history of cloning games,
without using any of the original art or other assets. Pretty much the only
bit that's more clearly problematic is the use of "-tris" as a suffix; GNOME's
version was renamed from "gnometris" to "quadrapassel" for exactly that
reason.

I certainly wouldn't give such aggressive behavior any unwarranted credence by
presumptively calling a clone "illegal"; on the contrary, I cheer on creative
developments like this.

~~~
bitwize
Atari v. Philips. Look it up.

Games which are "substantially similar" to other games infringe copyright. It
doesn't matter if the assets and code are all original, etc. This is settled
copyright law.

~~~
JoshTriplett
Already familiar with it; doesn't make it right, or justifiable. And there's
plenty of case law in both directions on reverse engineering and cloning; the
exact boundary would depend heavily on the details of a specific case.

In any case, it'd also be much harder to prove any harm caused by a variant
like this, which is designed specifically to be un-fun, as difficult as
possible, look nothing like the original, and not have any commercial interest
at all. Which makes the presumption of illegality entirely inappropriate.

