Representing SHA-256 Hashes as Avatars (francoisbest.com)
198 points by franky47 on April 19, 2021 | 72 comments



The problem with hash avatars in general is that people want to use them for identity verification -- and humans are wired to do so automatically -- but technologically, they cannot provide this. The space of possible avatars (2^256, in this case) is far, far larger than the number of distinct objects that humans can distinguish between. Which means that there will invariably be "collisions:" two avatars that are not identical, but appear identical to humans. As a result, if an attacker can brute-force an avatar that looks very similar to, say, Elon Musk's avatar, they can trivially scam people.

It follows that, since avatars do not provide any proof of identity, there is actually no harm in greatly truncating the hash space when generating them! That is, rather than trying to encode all 256 bits into the avatar, you can use a much more manageable number, like 16. But isn't this too small? Won't there be lots of collisions? Yes -- but that's a feature! If collisions are common, then the average user will be aware that avatar != identity, which makes them less susceptible to scamming. But 16 bits is still enough to meet the real goal of avatars: quickly distinguishing between different people in a conversation (or transaction, or whatever).

(This also shows why making avatars more costly to generate, e.g. with scrypt, can do more harm than good: doing so makes collisions less likely, but still not impossible. Meaning that if a collision does occur, whether accidental or malicious, you are less likely to notice it.)
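A minimal sketch of the truncation idea above, assuming the avatar is seeded from the top bits of a SHA-256 digest. The names `avatar_seed` and `collision_probability` are hypothetical, and the collision estimate assumes seeds are uniformly distributed:

```python
import hashlib


def avatar_seed(identifier: str, bits: int = 16) -> int:
    """Return the top `bits` bits of SHA-256(identifier) as an integer."""
    if not 1 <= bits <= 256:
        raise ValueError("bits must be between 1 and 256")
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    # Shift away everything except the leading `bits` bits.
    return int.from_bytes(digest, "big") >> (256 - bits)


def collision_probability(users: int, bits: int = 16) -> float:
    """Birthday probability that at least two of `users` share a seed."""
    space = 1 << bits
    if users > space:
        return 1.0
    p_all_distinct = 1.0
    for i in range(users):
        p_all_distinct *= (space - i) / space
    return 1.0 - p_all_distinct


print(avatar_seed("alice"))
print(collision_probability(300))
```

Under these assumptions, a group of roughly 300 users already has close to even odds of containing one colliding pair at 16 bits, which is the "collisions are common" property the comment argues for.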


You might get more mileage if the avatars are unique to the user viewing them rather than identical between users. If the nonce/salt used in generation is itself secure, then it'd be prohibitively difficult for adversaries to force a collision without obvious detection, doubly so in communities.
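One way the per-viewer idea could be sketched is with a keyed hash: each viewer holds a secret, and the avatar seed for any subject is derived from an HMAC over the subject's ID. This is only an illustration; `viewer_specific_seed` and its parameters are hypothetical, and how the secret is stored and protected is out of scope here.

```python
import hashlib
import hmac


def viewer_specific_seed(viewer_secret: bytes, subject_id: str, bits: int = 16) -> int:
    """Derive an avatar seed that depends on both the viewer and the subject."""
    if not 1 <= bits <= 256:
        raise ValueError("bits must be between 1 and 256")
    mac = hmac.new(viewer_secret, subject_id.encode("utf-8"), hashlib.sha256).digest()
    # Keep only the leading `bits` bits of the MAC.
    return int.from_bytes(mac, "big") >> (256 - bits)


alice_secret = b"a" * 32  # placeholder secrets, for illustration only
bob_secret = b"b" * 32
print(viewer_specific_seed(alice_secret, "some-user"))
print(viewer_specific_seed(bob_secret, "some-user"))
```

Without the viewer's secret, an attacker cannot predict what any given ID will look like to that viewer, so they cannot search offline for a look-alike.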


"This is what it would look like if your randomart avatar conceived a child with the repo owner's randomart avatar."


That's a good idea! Although, it could still potentially backfire for the same reason as scrypt; now if an adversary is able to obtain your nonce, they are much more likely to fool you.

I guess my broader point here (which I neglected to mention in the OP) is that we already have an excellent means of verifying identities: cryptographic signatures. Avatars are fine, but our interfaces need to make it clear that an avatar is just a costume, not a fingerprint. The Hard Problem, as we all know, is tying real-world people (and objects) to virtual-world pubkeys. If we can manage that, the rest is moot.


There might not be 2^256 distinguishable objects but maybe someone can come up with 2^16 distinguishable objects and just string 16 of them together. If there is one character off in a string of 40 hexadecimal characters it is hard to notice but that would be easier to detect in a set of 16 symbols.


Memorizing order in a set of 16 similar objects is too difficult for me


I don't think you would need to memorize the order, just compare them side by side to check that they are the same.


If you have the other avatar available for comparison, you don't need to do so visually -- the computer can compare the raw bytes directly.


On a related note, I've been experimenting with using a simple word list (like the eff diceware list) to generate strings of words encoding data. Trickiest part is figuring out how to encode padding, and the eventual size of the word list, and how complicated the final solution should be (eg using word lists that are not even binary numbers and leftover bits and all that). The diceware word list is nice since the words are not ambiguous and don't have homophones.

I assumed there would be existing implementations of something similar but have not found one that fits criteria other than some that use very small word lists. Diceware has 7776 words and pushing that to 8192 should be feasible and is a bit easier to work with.


BIP-39 uses 2048 words, all of which can be distinguished from each other using the first four characters of each word. It is used to encode raw binary entropy, but adapting it to arbitrary amounts of data is straightforward. For padding I would suggest either pre-encoding the length at the start or using classic block cipher padding (https://en.wikipedia.org/wiki/Padding_(cryptography) )

For BIP-39, see the wordlists in a folder under https://github.com/bitcoin/bips/blob/master/bip-0039.mediawi...
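A rough sketch of the "pre-encode the length" option, assuming a word list whose size is a power of two. The list here is a synthetic placeholder (`w0000` to `w2047`), not the real BIP-39 list, and this is not the BIP-39 encoding itself (which adds a checksum); the 2-byte length prefix is an arbitrary choice for illustration.

```python
from typing import List, Sequence


def placeholder_wordlist(size: int = 2048) -> List[str]:
    """Synthetic stand-in for a real word list such as BIP-39's."""
    return [f"w{i:04d}" for i in range(size)]


def _bits_per_word(words: Sequence[str]) -> int:
    bits = len(words).bit_length() - 1
    if bits < 1 or 1 << bits != len(words):
        raise ValueError("word list length must be a power of two (at least 2)")
    return bits


def encode(data: bytes, words: Sequence[str]) -> List[str]:
    """Prefix a 2-byte length, zero-pad to a whole number of words, map to words."""
    if len(data) > 0xFFFF:
        raise ValueError("data too long for a 2-byte length prefix")
    width = _bits_per_word(words)
    payload = len(data).to_bytes(2, "big") + data
    bitstring = "".join(f"{byte:08b}" for byte in payload)
    bitstring += "0" * (-len(bitstring) % width)
    return [words[int(bitstring[i:i + width], 2)] for i in range(0, len(bitstring), width)]


def decode(phrase: Sequence[str], words: Sequence[str]) -> bytes:
    """Invert encode(): read the length prefix, then drop the padding bits."""
    width = _bits_per_word(words)
    index = {word: i for i, word in enumerate(words)}
    bitstring = "".join(f"{index[word]:0{width}b}" for word in phrase)
    length = int(bitstring[:16], 2)
    body = bitstring[16:16 + 8 * length]
    return bytes(int(body[i:i + 8], 2) for i in range(0, len(body), 8))


words = placeholder_wordlist()
phrase = encode(b"hello", words)
print(phrase, decode(phrase, words))
```

With 2048 words each word carries 11 bits, so a 32-byte hash plus the 2-byte prefix comes out at 25 words in this scheme.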


Thanks for the reference. This is pretty much what I'd be going for!


Many years ago I created

https://github.com/luke-clifton/memorable-bits

(Also on hackage [0] but readme is missing)

It lets you define a pattern for generating "sentences" from data.

It deals with padding, lets you join word lists, or use multiple word lists in a single pattern.

Word lists can be any power of 2 long, and the library comes with a few different word lists.

[0] https://hackage.haskell.org/package/memorable-bits


There's no need to distinguish between every object at every comparison. In most applications, you'll only be comparing a few dozen avatars with each other.


In the adversarial case, yes, there is. I agree that avatars help you distinguish among ~a dozen users; what they don't do is provide strong guarantees that the person you're talking to actually is who they claim to be.


That's probably a good point. Forging an avatar that is visibly close to the target I am pretending to be is challenging-but-doable.


> The space of possible avatars (2^256, in this case) is far, far larger than the number of distinct objects that humans can distinguish between.

That sounds intriguing to me. Are you aware of any research into this?


To be honest, I have no idea how many distinct objects humans can distinguish between, but I am 99% confident that it is fewer than 2^128, much less 2^256.

I suppose it's a somewhat nuanced question, though. For example, if I were shown every avatar in sequence, I'm quite sure I would always notice the "diff" between two consecutive avatars. But the bar that I have in mind is much, much higher: given a sequence of avatars, can I recognize my friend's avatar with 100% accuracy? Given that we can't even do this within the set of <8 billion human faces (we occasionally accost a stranger as though they were a friend), I have to conclude that doing so within a set of 2^256 abstract shapes is entirely hopeless.


It is definitely a nuanced question. The definition of object is surely up for debate as well. A silly example: if one were to define an object as a string of 64 hex characters, even non-literate people could distinguish between any two distinct objects.

echo "hello" | sha256sum

>> 5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03

echo "world" | sha256sum

>> e258d248fda94c63753607f7c4494ee0fcbe92f1a76bfdac795c9d84101eb317

But were I to briefly glance at a computer screen, I'd probably confuse these next two:

e258d248fda94c63753607f7c4494ee0fcbe92f1a76bfdac795c9d84101eb317

e258d248fda94c63753607f7c4494ee0fcbe92f1a76bfaac795c9d84101eb317
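For comparison, a machine has no trouble with the last two strings. A small sketch (the function names are made up for this example) that reports where two hex strings differ and how many bits separate them:

```python
from typing import List

FIRST = "e258d248fda94c63753607f7c4494ee0fcbe92f1a76bfdac795c9d84101eb317"
SECOND = "e258d248fda94c63753607f7c4494ee0fcbe92f1a76bfaac795c9d84101eb317"


def differing_positions(a: str, b: str) -> List[int]:
    """Return the zero-based character positions at which two strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def bit_distance(a: str, b: str) -> int:
    """Hamming distance in bits between two hex strings of equal length."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return bin(int(a, 16) ^ int(b, 16)).count("1")


print(differing_positions(FIRST, SECOND))
print(bit_distance(FIRST, SECOND))
```

The two strings differ in a single hex character out of 64, which is exactly the kind of change a brief glance tends to miss.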


FOLLOW-UP: It occurs to me that, actually, plenty of humans could definitely clear my "higher bar." All you need to do is memorize a 64-character hex string, which is difficult but completely doable using a memory palace or similar technique.

Practically speaking, though, this isn't something that the average person is capable of doing. Even memorizing one hex string, let alone multiple (for each of your friends), requires a lot of effort for little benefit.

Furthermore -- if you can memorize the hex directly, you don't need the avatars in the first place! :P


People memorize digits of pi. An outlier for sure, but a 7 year old memorized over 1k digits a few years ago. Pretty crazy!


Yeah, I should've been more clear. There's no doubt that humans can't distinguish/memorize 2^256 abstract shapes. However, only a small subset of those would have to be memorized - those which are relevant to the individual. I'd agree that this particular pattern doesn't have enough variability for each pattern to be unique enough to reliably identify it, but I'd conjecture that it's possible to make such a pattern, which has enough variability and unique characteristics to be recognizable (ignoring the fact that an adversary could make a very similar pattern to mislead the individual - I'm not curious about it for verification.)

Your example of the fact that we can't reliably recognize every face on this planet is very interesting. Let's imagine I know n faces which I can reliably distinguish from one another, but now there is a (n+1)th face which I mix up with one of the previous ones. Now let's assume this face would instead have a very unique characteristic, unlike all previous faces - let's imagine, for example, that the nose on this face is upside down. Surely I'd be able to differentiate it from the previous n faces, hence the issue of identifying it might've been the limited variability/characteristics in the various previous instantiations of a face.

So there are a number of characteristics in a face, which have a certain degree of variability, which enable us to distinguish from a number of them. I've been pondering on how many of those characteristics could be combined in an object, and how high the variability could be; to create uniquely identifiable patterns. It probably depends a lot on the meaning we attribute to the pattern, different associations we have to it.

I apologize - quite the tangent, I guess. I've just been pondering a lot on this for a project I've been working on for some time.


It's a simple order of magnitude calculation. 2^256 is greater than one billion to the eighth power, times 100,000. There are probably several possible ways you could estimate how many different objects a person could distinguish between, but I think it's unlikely you'd come up with even a single billion.
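The comparison is easy to check with exact integer arithmetic. A quick sketch (variable names are illustrative):

```python
avatar_space = 2 ** 256

# "One billion to the eighth power, times 100,000"
lower_bound = (10 ** 9) ** 8 * 100_000

print(avatar_space > lower_bound)   # the claim in the comment
print(len(str(avatar_space)))       # number of decimal digits in 2^256
```

The bound works out to 10^77, and 2^256 is a 78-digit number, so the claim holds with a little room to spare.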


> but I think it's unlikely you'd come up with even a single billion.

One billion is really, really tiny, though. Let's just play a game:

- Unless you're colorblind, you can easily tell ten hues apart. Let's pick two colors, one with a saturated hue, and the other with a pastel one. That's 100 possibilities.

- I'm pretty sure you can easily recognize pictures of a hundred people you've met at some point in your life. Let's pick two of them, that's 10 thousand combinations.

- Can you recognize ten different road signs? Ten country shapes? Ten animals? Ten fictional characters? Ten book covers? Ten celebrities? Just pick three categories, and you've got a thousand combinations.

Now I'm pretty sure you can tell your grandma sitting under a vivid pink UK shape next to your 9th grade math teacher staring at Bruce Willis holding a giant light blue stop sign apart from any other imaginable combinations.

An untrained[1] human brain probably cannot distinguish between 2^256 items, but it's still able to do it for massive numbers.

[1]: but maybe it's possible with training: for instance, chess professionals might be able to do it.
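Tallying the game above, taking the counts exactly as stated (two hues out of ten, two people out of a hundred, three categories of ten items each), gives a feel for how many bits such a composite image carries:

```python
import math

colour_choices = 10 * 10          # one saturated hue, one pastel hue
people_choices = 100 * 100        # two recognizable people
category_choices = 10 * 10 * 10   # one item from each of three categories

total = colour_choices * people_choices * category_choices
bits = math.log2(total)

print(total)
print(round(bits, 1))
```

That is one billion combinations, or a little under 30 bits: a large number, but still far from the 256 bits of the full hash.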



I still like snowflakes for this: https://levien.com/snowflake-explain.html is a half-finished blog post explaining the motivation and algorithm I came up with. I never did careful user testing, but suspect that the answer would be that some people can reliably distinguish the patterns, others won't be able to.

In any case, there are a lot of variations on this "visual hash" idea, including the original fractal one, and I heard of more recent work to use the hash to seed StyleGAN face generation.


This is a great idea! Trying random ones, I couldn't find two I thought looked confusable.


Try changing just a few letters in the hash. If you change letters at the start, it changes drastically. But letters at the end change just small bits


As a warning, this would not be good for colorblind people (such as myself).

In the "Hello, Hacker News!" hash, half of the middle ring looks identical to me, and unless I looked carefully, that entire ring looked the same to me.


What would you suggest as a solution ? I considered swapping Hue for Lightness in order to increase contrast changes. Would you be interested in testing out some variants ?


Given the nature of what you're trying to do, arguably, there is very little for you to do. As you've already observed, "normal" human visual acuity is already wildly incapable of perceiving 2^256 different possibilities as distinct anyhow. If normal humans are 200 bits short of the desired 256 (and I'm still feeling generous claiming we could distinguish 2^56 different images of this type, but it's a nice round number to make my point here), color blind people are 203-ish bits short or so. It's not that materially different.

Normally when discussing being color-blind sensitive we're discussing not embedding 2 or 3 bits of information into colors that can't be distinguished by those who are color blind, but in this case, we're trying to jam massively more bits than anyone can handle into an image, so it's not clear that much is called for other than tweaking where the bits get lost a bit.

Or, to put it another way, relative to the desired goal, we're all already massively "colorblind". Those who are what we humans would call colorblind are, in relative terms, hardly at a disadvantage at all for once, because we're all so many orders of magnitude short of the mark.


The primary issue in colorblindness is:

1) One confuses two types of colors as the same one (e.g. red-green, blue-yellow, etc. colorblind) or

2) Colors that are close together appear to be identical (like the case that I saw: half of the row looked exactly the same to me, and the entire row looked the same until I looked closely).

Perhaps a mix of shapes and colors would make it more obvious? Or contrasting the border colors too, to highlight close differences (for example, if you have "f8" and "f0", which have a Hamming distance of 1, you make the border somehow highlight the difference).

Don't get me wrong, I think it is a neat idea! I just want you to be aware.


Thanks for your feedback. Are the color-blindness simulators in Firefox devtools good enough to reproduce your experience? Or do you have tools that you'd recommend?


You're welcome! I am happy to help.

> Are the color-blindness simulators in Firefox devtools good enough to reproduce your experience?

I have honestly not used them, sorry.... (I don't make GUIs).

I am looking at the documentation, and I am a bit disappointed though. By far the most common issue is contrast loss, like they say: https://developer.mozilla.org/en-US/docs/Tools/Accessibility...

The condition of not seeing one color at all is incredibly rare, and not seeing any colors even more so. By and large the issue is contrast loss.


I am also colorblind. The gold standard is to use a color palette that is engineered for color blindness, which uses a suite of color-blind-friendly colors and heavily utilizes lightness. Here's a good article on an example from Tableau: https://public.tableau.com/en-us/s/blog/2013/10/choosing-col...


I didn't know about this, thanks!


Your Hue choices are selected from a pool of 16 with various mutators applied. Hue alone isn't a viable path forward, so finding a translation of Hue to a non-Hue representation that doesn't worsen the diagram is essential.

You could apply repeating surface textures inside each slice, rather than showing a solid color, so that Hue 1 shows repeating dots, Hue 2 shows repeating lines, Hue 3 shows repeating triangles.

You could use a Braille-like 2x2 grid to represent the 4-bit Hue space as circles and lines within each slice.

If you imagine that each slice has 4 walls, replacing the missing fourth wall of the innermost slices with the innermost corner, then you could map the binary representation of 2^4 hue (such as 0110) onto "bites" out of the walls. For example, given 0110, map the 0s onto "bites" and punch a small hole into two adjacent walls of the slice; given 0000, punch a small hole into all four walls.

ASCII art of what I mean by "punch a hole into the wall", for Hue 0111 (one zero, so one hole punched). This is an uncurved slice, because ASCII art.

     ____________
    |            |
    |     __     |
    |____/  \____|
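A small sketch of that mapping, assuming each of the four bits is assigned to one wall and a 0 bit means "punch a hole". The wall order below (bottom, left, top, right) is an arbitrary choice made so that 0111 matches the ASCII art; the name `punched_walls` is hypothetical.

```python
from typing import List

# Assumed bit-to-wall assignment, most significant bit first.
WALLS = ("bottom", "left", "top", "right")


def punched_walls(hue: int) -> List[str]:
    """Return the walls that get a hole for a 4-bit hue value (0 bits = holes)."""
    if not 0 <= hue < 16:
        raise ValueError("hue must fit in 4 bits")
    bits = f"{hue:04b}"
    return [wall for wall, bit in zip(WALLS, bits) if bit == "0"]


for hue in (0b0111, 0b0110, 0b0000):
    print(f"{hue:04b}", punched_walls(hue))
```

With this ordering, 0110 punches the bottom and right walls, which are adjacent, and 0000 punches all four.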


That's a great suggestion, I love the "hole punching" idea (although it'll probably end up looking like Swiss cheese).

The key is to find a solution that looks good enough when zoomed out to ~64px square, which is tricky for details, especially in the inner ring where sections are packed so close to one another.


I imagine that's why square hashes are more common than circles: the raw information density problem.


You can use fewer hues/shades, and make the shape change instead. That's also easier to commit to memory. If a friend's avatar is a circle with shades of green/purple, and another friend's is a gem with shades of green/purple, the hue doesn't matter as much.


Strange that neither the article nor the comments mention https://gravatar.com/

It hashes the user's email http://en.gravatar.com/site/implement/hash/ and creates an "identicon" from the hash http://scott.sherrillmix.com/blog/blogger/wp_identicon/ or loads a user-defined image.


I'd recommend the open and compatible Libravatar over Gravatar

https://www.libravatar.org/


I really like the former article method over the gravatar identicon because the circular shape is not going to end up with „accidental swastikas“


I discovered robohash from Gravatar actually, but forgot to mention it, thanks for the reminder.


Suggestion: shave off two bits, and switch between the variants in the "A bit of fun" section: https://francoisbest.com/posts/2021/hashvatars#a-bit-of-fun


+1, this will create much more distinct results than subtle colour variations.


What do you mean by "shave off 2 bits"?


I think he means use 2 bits to decide the variation
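Read that way, the suggestion could look something like the sketch below: two bits of the hash select one of four shape variants, and the remaining 254 bits drive the rest of the avatar. The variant names are placeholders, not the ones used in the article, and taking the lowest two bits is an arbitrary choice.

```python
from typing import Tuple

VARIANTS = ("variant-0", "variant-1", "variant-2", "variant-3")  # placeholder names


def split_hash(digest: bytes) -> Tuple[str, int]:
    """Use the lowest 2 bits to pick a variant; return it with the other 254 bits."""
    if len(digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    value = int.from_bytes(digest, "big")
    variant = VARIANTS[value & 0b11]
    remaining = value >> 2
    return variant, remaining


example = bytes.fromhex(
    "5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03"
)
variant, remaining = split_hash(example)
print(variant, remaining.bit_length())
```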


Urbit also developed a solution for turning a number into an avatar, although theirs only have 32 bits of entropy, and to be honest there are many that are difficult to tell apart:

https://urbit.org/blog/creating-sigils/


These are pretty. Do you have any idea if there is a way to use the library with other data (hashes) than the Urbit 'names', or whatever they are?


There is a JS sigil generator: https://github.com/urbit/sigil-js#basic-usage And a Figma plugin: https://github.com/urbit/sigil-figma-plugin

Urbit names are just another representation of a 32-bit number, like the sigils. You can use any 32-bit number as the "seed" for a sigil.


You should check out this paper where they tested different representations on humans to see what they could tell apart, and came up with a novel representation called Moji.

https://exascale.info/assets/pdf/students/MSc_Thesis_-_Micha...


One of the prettiest identicons I've seen.

Since it doesn't seem to be lossy, I was wondering if it could be somehow adapted to something that could be scanned as a QR code. I guess the minor color shifts might be hard to get right, but maybe combined/replaced with some form of symbol inside rings to help, a dot/dash combination?


I'll also leave here this very nice list of identicon implementations: https://github.com/drhus/awesome-identicons


It would be a lot more work, but it might work better if you picked something which humans are particularly tuned to notice subtle details such as faces.


Using the hash as a seed for an AI face generator like thispersondoesnotexist would be pretty powerful. Free idea for anyone who wants to give it a shot.


Look at that, you reinvented NFTs such as CryptoPunks :) https://www.larvalabs.com/cryptopunks


OpenSSH's randomart was too visually indistinctive for me so I've patched it to draw TrueColor images of cats. I wanted to actually seed a GAN to generate consistent images, but that turned out to be too much of a bother so I'm just keeping a local cache on a machine. Works nicely for that use-case as I'm able to associate a particular image with a particular location when working at a particular box. Good enough.

https://github.com/ilammy/homebrew-ssh


I did this one some time ago, allows custom visual effects using hash and seeded random: https://www.blankjs.com/


Wow, those are beautiful. Starred.


Thanks. I got tired of gravatar, wanted something customizable and to be used not only for avatars.


Despite the issue where it would be trivial to brute force similar looking but not identical 'avatars', I think this still has a few good uses for non-identification.

1. Creating at least some default avatar. Not to be used to verify identity but just somewhat better than having a very limited set of default images. Having rate limits on account creation would prevent most brute force methods.

2. Avatar suitable for partial-identification for very small populations. Imagine a matrix/Element room that has <100,000 people. The hash/math could be modified to drastically trim down the space of the hash (e.g. 2^256) to something similar to the size of the room.

#2 sounds pretty interesting. It could be expanded by making parts of the image/avatar dependent on some input other than the user ID, like the user's role in the chat group. Another segment/ring could be something more short-lived and relative, like just identifying users in recent chat messages.


I always thought ssh randomart representations were visually unique enough; maybe combine smaller, simpler shapes with color too?

The rings are neat, but I found many to be too similar based on color alone, and with the segments, too, it is really hard to pick up on a pattern or something memorable.


How hard would it be to instead generate faces with random facial features? Humans are already hardwired to be able to detect subtle differences between faces.

That would obviously not make it suitable for generating avatars to identify humans, but it would make this really useful to eg identify git commits or hash signatures.


Just feed the hash into the thishumandoesnotexist Neural Network? Boom, human avatar.


That would be so creepy. Imagine a random face you've never seen in your life appearing as your avatar


As a sidenote, your website breaks in Vivaldi with cookies denied and several ad-blockers. It keeps on reloading, making it impossible to close the tab or the browser. Please fix your site.


Would be more awesome if it could export as an image! Right now I'm just manually inspecting it and copying the entire <g> section.


Someone actually built exactly that while I was writing the article :)

https://github.com/wzulfikar/hashvatar


How about using the variants as well so the avatars also structurally look different from each other (and adding even more variants)?



Makes me wonder if you could effectively apply Chernoff faces (https://en.wikipedia.org/wiki/Chernoff_face) to make different hashes easier for humans to recognize. TLDR map parts of the hash to modify aspects of a face (position, size, orientation of eyes, ears etc.) and you can take advantage of all the in-built circuitry in the human brain which can identify very small differences in facial appearance.

The idea is explored a bit in Peter Watts' novel Blindsight - not for hashes, but visualizing high dimensional multivariate data via clouds of tormented faces :)
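As a very rough illustration of the mapping step only (no drawing), one could scale individual hash bytes into face parameters. The feature names below are invented for this sketch, and it uses just the first five bytes of the digest, so most of the hash is ignored:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceParams:
    """Hypothetical Chernoff-style features, each normalized to the range 0..1."""
    eye_size: float
    eye_spacing: float
    nose_length: float
    mouth_curve: float
    face_width: float


def face_from_hash(digest: bytes) -> FaceParams:
    """Map the first five bytes of a digest onto five face features."""
    if len(digest) < 5:
        raise ValueError("need at least 5 bytes")
    scaled = [byte / 255 for byte in digest[:5]]
    return FaceParams(*scaled)


print(face_from_hash(hashlib.sha256(b"Hello, Hacker News!").digest()))
```

A real design would need to choose features and ranges so that small numeric changes produce visibly different faces, which is the hard part this sketch skips.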


Still, these images are kind of hard to compare/remember.

Why not convert a hash to a correct horse battery staple? https://xkcd.com/936/



