
Consistent Hashing Explained for High Schoolers - AkshatM
https://akshatm.svbtle.com/consistent-hash-rings-theory-and-implementation
======
Stratoscope
This is a really nice writeup! Very clear and easy to understand.

I did notice one conceptual error near the beginning:

> _Computers can pull this trick off too: they take a value, and store it at a
> location in memory. Then, given a key, they somehow use that key to figure
> out the address of that memory location, go to that location, and return
> that value._

> _Figuring out the address is called hashing, and maps that work like this
> under the hood are called hash tables._

That definition isn't right. Looking up a key to get a value isn't called
"hashing". Hashing and hash tables are just one out of a number of ways that a
program could do this.

For example, given the data objects in the first illustration that have Key
and Value properties, you could simply use a linear array of these objects.
Then you loop through the array looking for the matching Key, and there you
have your Value. Hashing is not involved here.

Of course, as the array gets longer, this simple loop gets slower and slower.
That's where you start looking for faster algorithms and data structures such
as hash tables. But the hash table isn't _fundamental_ to doing a name/value
lookup, it's just one way to do it.
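
To make the contrast concrete, here is a minimal Python sketch of the two approaches described above. The sample data and the `linear_lookup` name are made up for illustration; they are not from the article.

```python
# Hypothetical sample data: a plain list of (key, value) pairs.
pairs = [("apple", 1), ("banana", 2), ("cherry", 3)]

def linear_lookup(pairs, key):
    """Scan the pairs in order until the key matches. No hashing involved."""
    for k, v in pairs:
        if k == key:
            return v
    raise KeyError(key)

# The same data in a hash table: Python's dict hashes the key to find
# where the value is stored, instead of comparing against every entry.
table = dict(pairs)

print(linear_lookup(pairs, "banana"))  # found by scanning the list
print(table["banana"])                 # found via the key's hash
```

Both calls return the same value; the difference is only in how much work the lookup does as the collection grows.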

Another example is a trie, which is very different from a hash table but can
be used for the same purpose. Tries are often used where you may want to get a
list of values that match a prefix (like an autocomplete), but they can also be
used for a full match, just like a hash table or a linear array.
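
A rough sketch of that idea in Python, assuming string keys. The `Trie` class and its method names are hypothetical, chosen only for this example. Each node keeps its children in a small dict for brevity, but the lookup walks the key one character at a time rather than hashing the whole key.

```python
class Trie:
    """A minimal trie sketch mapping string keys to values."""

    def __init__(self):
        self.children = {}      # next character -> child Trie node
        self.value = None
        self.has_value = False  # distinguishes "no entry" from a stored None

    def insert(self, key, value):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.value = value
        node.has_value = True

    def _find(self, prefix):
        """Walk down one character at a time; None if the path is missing."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def get(self, key):
        """Full-match lookup, like a hash table or a linear array."""
        node = self._find(key)
        if node is None or not node.has_value:
            raise KeyError(key)
        return node.value

    def with_prefix(self, prefix):
        """Return every (key, value) pair whose key starts with prefix."""
        start = self._find(prefix)
        if start is None:
            return []
        results = []
        stack = [(prefix, start)]
        while stack:
            key, node = stack.pop()
            if node.has_value:
                results.append((key, node.value))
            for ch, child in node.children.items():
                stack.append((key + ch, child))
        return sorted(results)
```

With keys such as "car", "cart" and "dog" inserted, `get("cart")` does the full match, while `with_prefix("car")` covers the autocomplete case.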

~~~
AkshatM
Thank you! And all true, but I wanted to be able to motivate why we generate a
hash at all for a hashtable. Disambiguating the purpose of hashing outside of
its narrow use case here fell outside the scope of my writeup, and would have
involved a needless digression - that is why I inserted a hyperlink for anyone
to follow if they wanted to learn more. :)

------
jstrieb
I am currently a high schooler near the top of my class at what I consider to
be a pretty normal public school. This explanation was quite clear for me, but
I am already familiar with the topic.

Not to detract from the write-up, but I think the claim that the explanation
is "accessible to an ordinary high-schooler" is an overly bold one.

~~~
AkshatM
That's a pity. What would you suggest could be improved?

~~~
dboreham
You'd have to improve the ordinary high schooler. A challenging task.

------
metaphorm
I thought this was an unusually good article. It covered the topic very
clearly, with relevant code samples written with excellent and idiomatic style
(and useful inline comments too), as well as covering the process of going
from theory to implementation.

This is also a great reminder of how incredibly elegant and powerful the basic
data structures (hashmaps and binary trees in this case) really are. It
doesn't take a fancy implementation to make this stuff work. Just the basics
gets it done.

~~~
AkshatM
As the author: Thank you! That was my intention. :)

------
jedberg
This is great, although I think these are some pretty advanced high schoolers!
It's cool that they are learning this stuff.

However, rendezvous hashing is superior to consistent hashing because
consistent hashing gets hotspots if you lose a node, unless you are using many
virtual nodes (which then adds a large overhead). Also, rendezvous hashing
doesn't require all the clients to know the state of all the nodes.
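
For readers who have not seen it, the core of rendezvous (highest-random-weight) hashing can be sketched in a few lines of Python. This is an illustration only, not code from the article or from the talk linked in this thread; the `score` and `pick_node` names and the choice of SHA-256 are assumptions.

```python
import hashlib

def score(node, key):
    """Deterministic pseudo-random weight for one (node, key) pair."""
    digest = hashlib.sha256(f"{node}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def pick_node(nodes, key):
    """Rendezvous choice: the node with the highest score for this key wins."""
    return max(nodes, key=lambda node: score(node, key))

# Hypothetical node names, for illustration only.
nodes = ["node-a", "node-b", "node-c"]
owner = pick_node(nodes, "user:42")
```

If a node is removed, only the keys it owned get a new winner, and each of those keys moves to its own second-highest scorer, so the load is spread over the survivors rather than dumped on a single neighbour.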

So maybe it would be good to update the lesson for rendezvous hashing. :)

~~~
AkshatM
Thank you! Re rendezvous hashing: I would have to understand it better first,
and write out an implementation. :) But, yes, maybe in a followup post.

~~~
jedberg
Here's a video from a talk I did about rendezvous hashing:
[https://youtu.be/x-zwxuIb1lY?t=20m38s](https://youtu.be/x-zwxuIb1lY?t=20m38s)

Might at least give you a high level overview. Then you can read this Python
which is pretty straightforward:

[https://github.com/nikhilgarg28/rendezvous/blob/master/_rend...](https://github.com/nikhilgarg28/rendezvous/blob/master/_rendezvous.py)

I think you'll like it! :)

------
Glyptodon
In a weird way I think it's almost easier to explain by looking at what
a Distributed Hash Table is and why you might need/want one (in general terms)
first, and then looking at algorithms rather than starting from a basic Hash
Table and just jumping to Consistent Hashing. (I think consistent hashing is
not ideally named, personally.)

That said, I think the author does a good job of doing exactly that.

~~~
AkshatM
As the author: thank you! The reason why I built it bottom-up is because I
understand novitiates learn to reason bottom-up, not yet having had
familiarity with the forest to appreciate the trees, whereas experts (those
brave woodland explorers) know the lay of the land enough to notice the
variations. That's not to say it's not suitable - just maybe not the audience
I had in mind.

------
Hydraulix989
Those are some unusually smart high schoolers.

EDIT: I am a teacher.

~~~
casion
Also a teacher, and pretty sure there's maybe 1-5 high schoolers a year that
would understand this.

~~~
dj-wonk
You say "1-5 high schoolers" based on what population? Students you've taught
in your school? How many students? What's something comparable you've tried
teaching?

I tend to think students are capable of learning many more things than people
will sometimes give them credit for. Maybe they have to do more background
preparation, but I would be reluctant to say someone couldn't learn this.

Now, when it comes to applying it in a novel way, I'd be more inclined to
think fewer students get to that point.

~~~
jstrieb
This is not a topic that would ever come up in class at my public high school.
Within my school district, there is very little emphasis on technology
education in middle and high school; I don't know of any of my peers who would
possess the background to understand this article just from reading it.
Perhaps the other commenters were referring to students possessing the
necessary context/background to understand the article.

I don't know where you went to school, but I've also come to learn that the
vast majority of high school students can be spectacularly disappointing with
regard to their motivation to learn topics like this, even if they're fully
capable of doing so.

~~~
pirocks
Imho the lack of motivation is a function of the system. The vast majority of
the students are taking classes because they have to and not because they want
to. It's not surprising that students aren't motivated in a class that they
didn't pick. Disclaimer: I am a high school student.

------
bogomipz
This is a nice post, I wanted to point out one minor error:

> "A glossary, too, is a map - given a word, it can take you to the exact page
> the word is referenced."

I think the author means index, not glossary. I know it's a nitpick, but it's
such a well-written post otherwise that I thought I would mention it. Cheers.

~~~
AkshatM
This has been fixed. Thanks!

------
known
[https://en.wikipedia.org/wiki/Consistent_hashing](https://en.wikipedia.org/wiki/Consistent_hashing)

------
fjdlwlv
Why go through a long discussion of a CHT, and then end with an irrelevant
implementation of a binary tree instead of a simple list or library data
structure?

~~~
AkshatM
> "In this post, I have chosen to implement it as a binary search tree. There
> are a few reasons, most prominently because I wanted to eschew using
> anything more sophisticated than basic Python, and using a sorted list
> efficiently would mean resorting to Python’s bisect."

Other reasons:

- Other blog posts about consistent hashing fall prey to favouring 'clever'
  implementations, which is great for advanced Pythonistas, but serves to
  obfuscate for anyone else. Similar arguments hold for many existing BST
  implementations.
- Implementing the custom lookup I sought for this implementation was easier
  if I went with something hand-rolled, and was fine since I don't expect
  anyone to actually use this implementation in production (I mean, they can,
  but there are better implementations out there).

Essentially, my goal was transparency and clarity. This was the best way to do
it, all things considered.
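
For comparison, the sorted-list alternative mentioned in the quote might look roughly like the sketch below. This is not the article's implementation: the `SortedListRing` class, the `ring_hash` helper and the use of MD5 are assumptions made for illustration, and virtual nodes are left out to keep it short.

```python
import bisect
import hashlib

def ring_hash(value):
    """Map a string to a position on the ring (hash choice is an assumption)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

class SortedListRing:
    """Consistent hash ring kept as a sorted list of node positions."""

    def __init__(self, nodes=()):
        self._positions = []  # node positions, always kept sorted
        self._owners = {}     # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        pos = ring_hash(node)
        if pos in self._owners:
            raise ValueError(f"position already taken: {node}")
        bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def remove(self, node):
        pos = ring_hash(node)
        if self._owners.get(pos) != node:
            raise KeyError(node)
        del self._positions[bisect.bisect_left(self._positions, pos)]
        del self._owners[pos]

    def lookup(self, key):
        """Return the first node at or after the key's position, wrapping around."""
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._positions, ring_hash(key))
        return self._owners[self._positions[idx % len(self._positions)]]
```

The binary search in `lookup` does the job of the tree traversal in a BST version. The trade-off is that the reader has to know what `bisect` does, which is exactly the dependency the quote says the author wanted to avoid.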

------
asafira
It's a bit of a stretch to say this is for high schoolers. Do most high
schoolers even know how to program?

------
TheVip
Awesome...it really helped!

~~~
AkshatM
Thank you!

