Show HN: Ecoji, a new base1024 emoji encoding (github.com)
99 points by kturner 5 months ago | 43 comments



I made an almost identical program back in 2015 (even the same name!), though only using base64:

https://github.com/andrewhamon/encoji

Great idea, op ;)

Edit: this one definitely has had more thought put into it; specifically, I like that the sort order of different inputs is preserved in the output.


Almost the same name!


Ah, my bad!


This seems half-jokey, but I think it'd actually be really great if places where we usually see hex or base64 hashes used emoji instead.

Much easier to compare at a glance, and have a chance of remembering (by sight -- "is this the same hash", not "let me write down the hash from nothing").


Someone has already done this[1]. It works okay (they picked 256 emoji that were very distinct) but you still have problems like mistakenly reading two emoji as being swapped.

[1]: https://github.com/emojisum/emojisum


Telegram uses emojis to verify phone calls

https://www.engadget.com/2017/03/30/telegrams-voice-calls-ar...


A more serious attempt to solve that problem is base58: https://en.m.wikipedia.org/wiki/Base58


Plus you can actually type base58 using a phone or non-native keyboard.


Plus it's the right size for one cryptocat gene per character.


I prototyped that very idea at https://dmitri.shuralyov.com/projects/hash-emoji/.


I doubt that, if you get twenty emojis thrown at you, you'll see the difference much faster.

The easiest way to check if a hash matches is to use ==. Or === if your language has problems with numbers in strings.


My current strategy is checking the first few characters visually, hoping the odds for a collision even on those are rather low.


I would say even with emojis, people will continue doing exactly that forever.


Citymapper run buses in London that use emoji to show at which stop you have to get off.


Base64 encodes each group of 6 bits as one of 64 8-bit characters. OP is doing the same, but with each character encoding 10 bits.

The only problem is that these emojis probably require two bytes each (more?), so we’re encoding 10 bits into 16 bits (60% overhead) instead of 6 bits to 8 bits (33% overhead)

It’s a nice idea though. These are clearly easier to remember and compare. Just don’t expect emojis to take less space than base64.


You would hypothetically use this for a file where your communication channel is limited to emojis and text so... Twitter?


Here's a serious attempt at encoding arbitrary data into Unicode characters:

https://github.com/qntm/base65536


Any platform that treats emojis as a single character would be a good fit.


Most emoji use 4 bytes, so it's using 32 bits to encode 10 bits. Base64 is way more efficient, with 8 bits to encode 6 bits.
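
To put rough numbers on the overhead figures in this subthread, here is a small Python sketch. The `overhead` helper is made up for illustration, and it assumes every emoji in the alphabet takes the same number of bytes as the sample one when encoded as UTF-8.

```python
def overhead(payload_bits: int, symbol_bytes: int) -> float:
    """Extra size, as a fraction of the payload, for one encoded symbol."""
    encoded_bits = symbol_bytes * 8
    return encoded_bits / payload_bits - 1


# Base64: 6 payload bits carried by one 1-byte ASCII character.
base64_overhead = overhead(6, 1)

# Base1024 if each emoji took only 2 bytes (the guess made upthread).
two_byte_overhead = overhead(10, 2)

# Base1024 with an emoji measured in UTF-8 (assumption: the whole
# alphabet is as wide as this sample emoji).
emoji_bytes = len("😀".encode("utf-8"))
emoji_overhead = overhead(10, emoji_bytes)

print(f"base64:         {base64_overhead:.0%}")
print(f"2-byte symbols: {two_byte_overhead:.0%}")
print(f"UTF-8 emoji:    {emoji_overhead:.0%}")
```

So the 33% and 60% figures are consistent with each other, but the 60% one only holds for 2-byte symbols; with 4-byte emoji the overhead is much larger.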


I'd expect the encoding to be much shorter. Like, 1 emoji per 4 bytes.


There are about 2600 emoji, so it could plausibly be 1 emoji per 3 bytes. I wonder why it was implemented so much less compactly.


For one emoji to represent 3 bytes in an encoding, you'd need 2^24 ≈ 16.7M unique emojis. With 2600 emojis, the best you could do is have each one represent 11 bits (which requires 2048 unique emojis).

Am I missing something here...?
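
The arithmetic above can be checked with a couple of lines of Python. The function names are hypothetical, and 2600 is just the emoji count quoted upthread, not a verified figure.

```python
def bits_per_symbol(alphabet_size: int) -> int:
    """Largest whole number of bits one symbol can carry: floor(log2(n))."""
    if alphabet_size < 2:
        raise ValueError("need at least two symbols")
    return alphabet_size.bit_length() - 1


def symbols_needed(bits: int) -> int:
    """Alphabet size required for one symbol to carry `bits` bits."""
    return 2 ** bits


print(symbols_needed(24))     # symbols needed to carry 3 bytes per emoji
print(bits_per_symbol(2600))  # whole bits available with ~2600 emoji
print(bits_per_symbol(1024))  # what a base1024 alphabet carries
```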


Nope, I just didn’t think that through very well.


No.


One of the emojis in the examples, 🥴, doesn't work in Firefox 60 on Windows 10. It's the emoji "Woozy Face" [0], which was introduced in Unicode 11.0 in 2018, so it will probably have limited support for a while. There are probably better alternatives.

[0] https://emojipedia.org/face-with-uneven-eyes-and-wavy-mouth/


I also made something very similar, except it used hex-encoded hashes as input. [0]

[0] https://dmitri.shuralyov.com/projects/hash-emoji/


How about a base-N project? You configure it with your safe alphabet, it adapts to consume as many bits as it can for your 'encoding', outputs efficiency along with encoded data.
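
A minimal sketch of what such a base-N tool might look like, in Python. Everything here (`make_codec`, `utf8_efficiency`, passing the byte count to `decode`) is invented for illustration and is not how Ecoji works. It assumes each symbol in the alphabet is a single code point, and it uses only the largest power-of-two prefix of the alphabet.

```python
def make_codec(alphabet: str):
    """Build (bits, encode, decode) for a caller-supplied safe alphabet."""
    symbols = list(alphabet)  # assumption: one code point per symbol
    if len(symbols) < 2 or len(set(symbols)) != len(symbols):
        raise ValueError("alphabet needs at least 2 distinct symbols")
    bits = len(symbols).bit_length() - 1  # floor(log2(N)) bits per symbol
    used = symbols[: 1 << bits]           # power-of-two prefix actually used
    index = {s: i for i, s in enumerate(used)}
    mask = (1 << bits) - 1

    def encode(data: bytes) -> str:
        n_bits = len(data) * 8
        pad = (-n_bits) % bits            # zero bits appended to fill the last symbol
        value = int.from_bytes(data, "big") << pad
        count = (n_bits + pad) // bits
        return "".join(
            used[(value >> (bits * (count - 1 - i))) & mask] for i in range(count)
        )

    def decode(text: str, n_bytes: int) -> bytes:
        # The original length is passed in explicitly; a real format would
        # need padding symbols or a length prefix instead.
        value = 0
        for ch in text:
            value = (value << bits) | index[ch]
        pad = len(text) * bits - n_bytes * 8
        return (value >> pad).to_bytes(n_bytes, "big")

    return bits, encode, decode


def utf8_efficiency(alphabet: str) -> float:
    """Payload bits per encoded bit, sized by the widest symbol in UTF-8."""
    bits = len(alphabet).bit_length() - 1
    widest = max(len(ch.encode("utf-8")) for ch in alphabet)
    return bits / (8 * widest)


hex_bits, hex_encode, hex_decode = make_codec("0123456789abcdef")
print(hex_bits, hex_encode(b"\xde\xad"), utf8_efficiency("0123456789abcdef"))

# A 10-symbol alphabet only yields 3 bits per symbol (8 symbols get used).
dec_bits, dec_encode, dec_decode = make_codec("0123456789")
print(dec_bits, dec_encode(b"\xff"), dec_decode(dec_encode(b"\xff"), 1))
```

Rounding down to a power of two keeps the bit-slicing simple at the cost of wasting symbols; treating the input as one big number in base N would use the whole alphabet but makes streaming harder.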



How many bytes can we dump into a tweet with this?


You want to insert xxd -r -p before your program in:

cat encode.go | openssl dgst -binary -sha1 | ecoji


Base 64 = 256 Ascii represented in 64. Base64 < 256 Base 1024 = ?

Wouldn't "A" be "A", because it is a subset of 1024? Looks more like obfuscation code for me.


Base encodings don't always start at "A" (or even include it).

They just mean that the alphabet they use has N glyphs in it to represent the numbers 0 to N-1. For example, base10 can be:

0123456789 or qwertyuiop or abcdefghij or ...
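
To make that concrete, here is a small Python sketch that writes the same number with each of those three alphabets. The `to_base` helper is made up for this example and only handles non-negative integers.

```python
def to_base(n: int, alphabet: str) -> str:
    """Write non-negative n in base len(alphabet), using alphabet as the digits."""
    if n < 0:
        raise ValueError("n must be non-negative")
    base = len(alphabet)
    if n == 0:
        return alphabet[0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


for alphabet in ("0123456789", "qwertyuiop", "abcdefghij"):
    print(alphabet, "->", to_base(2018, alphabet))
```

All three outputs are the same base10 number; only the glyph standing for each digit changes.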


Base64 is groups of six bits of arbitrary binary data, being interpreted as 6 bit integers (valued 0 to 63) which index into a table of 64 printable characters. That coding then gives the binary data a printed representation that we can store in strings and plain text files.

The ASCII datum "A" doesn't turn to "A" in Base64, and in fact it doesn't turn to any specific Base64 code. In a stream of 8 bit characters where "A" occurs, that "A" byte will be split into two 6-bit groups in one of three ways: 2-6, 4-4 or 6-2. In those groups that get only 2 or 4 bits of that "A", there will likely be other non-zero bits from adjacent characters/data which help determine the code. In base 1024, we have 10 bit units that can be packed with data.

The Base64 code letter "A" is the first entry in the Base64 table; it corresponds to zero: a sequence of six zero bits.
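
A quick way to see the three alignments is to put the byte "A" at each position of a 3-byte group, with zero bytes around it, and look at the output of Python's standard `base64` module. The variable names are just for illustration.

```python
import base64


def b64(data: bytes) -> str:
    """Standard Base64 of data, returned as text."""
    return base64.b64encode(data).decode("ascii")


all_zero = b64(b"\x00\x00\x00")   # six zero bits four times
a_first = b64(b"A\x00\x00")       # "A" split 6-2 across two codes
a_middle = b64(b"\x00A\x00")      # "A" split 4-4
a_last = b64(b"\x00\x00A")        # "A" split 2-6
three_as = b64(b"AAA")            # neighbouring data changes every code

for label, text in [
    ("zeros", all_zero),
    ("A at offset 0", a_first),
    ("A at offset 1", a_middle),
    ("A at offset 2", a_last),
    ("AAA", three_as),
]:
    print(f"{label:14} {text}")
```

The same input byte lands on different output letters depending on its offset, and the all-zero input comes out as the letter "A" repeated, which matches the description of "A" as table entry zero.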


I think it's meant to be a joke library. There's no sensible reason to encode binary data using Unicode emoji. I also don't see any mention of the Unicode encoding being used. Is it UTF-8, UTF-16 LE or BE?



I'm curious if there are strings whose emoji encoding still contains a readable form of the original string, but in emoji.


Our language is evolving back to hieroglyphics


For sure! I think it's pretty funny.

An advantage of emoji is that it makes it so easy to communicate a non-verbal expression. Looking at my frequently used list: there's the thinking, facepalm, sly look, and so on... they make text messaging so much more human.


I don't exactly know what the goal here is, but if the goal was to reduce the number of symbols a human needs to look at (e.g. for confirmation codes etc.) then this is abysmal. 48 ASCII characters reduce to 40 emoji, even though each emoji occupies 2-4x as many bits as an ASCII character?


The goal is it's funny.


Waiting for NodeJS version.


Not published as an npm module, but I did make a JS version for the web to go along with the Golang version I made: https://github.com/andrewhamon/encoji/blob/master/js/index.j...


+1 from me. Thanks!



