
Python Hash Tables: understanding dictionaries - mastro35
http://thepythoncorner.com/dev/hash-tables-understanding-dictionaries/
======
smabie
I suppose this isn't a bad introduction to hash tables, but I would assume
that most HN readers know how to implement a hash table. It seems like talking
about fundamental CS concepts in.. _Python_ is a perennial source of quick and
easy blog posts for a lot of people.

~~~
labster
Anecdata: I don’t know how to implement a hash table, but I know how to use
them. Not everyone here has a CS education background.
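
For anyone in the same position, here is a minimal sketch of the idea, assuming separate chaining (a list of buckets, each holding key/value pairs). The class name `ChainedHashTable` and its methods are made up for illustration, there is no resizing, and this is not how CPython's dict actually works:

```python
class ChainedHashTable:
    """Toy hash table using separate chaining (illustrative only)."""

    def __init__(self, num_buckets=8):
        # Hypothetical fixed bucket count; a real table would grow as it fills.
        self._buckets = [[] for _ in range(num_buckets)]
        self._size = 0

    def _bucket_for(self, key):
        # Reduce the key's hash to a bucket index with modulo.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))
        self._size += 1

    def get(self, key, default=None):
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        return default

    def __len__(self):
        return self._size


table = ChainedHashTable()
table.put("a", 100)
table.put("b", 200)
table.put("a", 101)  # same key again: replaces the old value
```

Lookups only scan one bucket, which is why they stay fast as long as the keys spread across the buckets.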

~~~
tmh88j
If you're interested in implementing a hash table I would assume you're also
generally interested in data structures and algorithms. I'd highly suggest the
Algorithms, Part 1 class on Coursera. It's totally free and offered through
Princeton. It covers hash tables in the last segment, albeit implemented in
Java, but still very thorough and is a great introduction to common data
structures/algorithms and analysis, like linked lists, union find, trees,
quick sort and time/space complexity.
[https://www.coursera.org/learn/algorithms-part1](https://www.coursera.org/learn/algorithms-part1)

~~~
prad9104
What level of math does one need to understand to take this course?

~~~
smabie
I've ignored math prereqs my entire life and turned out fine. If you're
genuinely interested, it's not too hard to learn the math as needed. It might
actually be a better way to learn, because your learning is always in
service of other learning.

I haven't taken that Coursera course, but in college, I believe only calc I
and II are the prereqs for intro to data structures and algorithms. Though I
know solid engineers who understand applied CS who don't have a good grasp of
math at all.

~~~
labster
> It might actually be a better way to learn, because your learning is
> always in service of other learning.

Me in freshman calculus: Why are we learning these Taylor series? It's so
boring, and I'll never need to use it

Me three years later in upper-division meteorology: It turns out 90% of what
we do is numerical methods because fluid dynamics is hard with limited data.

It's definitely easier for me to learn things when I have an application for
them.

------
formerly_proven
Raymond Hettinger, Modern Python Dictionaries - A confluence of a dozen great
ideas:
[https://www.youtube.com/watch?v=npw4s1QTmPg](https://www.youtube.com/watch?v=npw4s1QTmPg)

------
l0b0
FYI, the formatting of several of the Python code blocks is messed up,
starting with lines like this:

    ```python linenums=”21” def get_value(self, input_key):

Also, might be worth noting that the "The Pythonic Implementation of Python
Hash Tables" section seems to cover either Python 3.6 or 3.7, and that the
implementation seems to have changed a bit in 3.8: now `sys.getsizeof({}) ==
64` and `sys.getsizeof({"a": 100}) == 232`. Does anyone have the low-down on
how it's changed and possibly how it's likely to change again in future
versions?
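
A quick way to check these figures on whatever interpreter you have at hand is sketched below. The variable names are just for illustration, and the numbers will differ between Python versions and platforms, so no particular values are assumed:

```python
import sys

# Size in bytes of the dict object itself; objects it refers to are not counted.
empty_size = sys.getsizeof({})
one_item_size = sys.getsizeof({"a": 100})

print(sys.version_info[:2], empty_size, one_item_size)
```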

~~~
mastro35
I tried to fix it, let me know if it’s broken again... It’s not super easy to
check by myself because I'm not at home and I'm working with my phone... :D

------
fernly
Formatting of several of the code segments is borked.

It should be mentioned that a Python set type is basically a naked hash table,
or a dictionary with all keys and no data.

As noted, in 3.6 dicts are ordered, but in 3.8 they can be reversed(); and in
3.9 the new merge and update operators are added:

[https://docs.python.org/3.9/whatsnew/3.9.html#dictionary-mer...](https://docs.python.org/3.9/whatsnew/3.9.html#dictionary-merge-update-operators)
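
A short sketch of those additions, assuming Python 3.9 or later (the dict contents are made-up example data):

```python
defaults = {"host": "localhost", "port": 8080}
overrides = {"port": 9090, "debug": True}

# 3.8+: reversed() walks the keys in reverse insertion order.
newest_first = list(reversed(defaults))

# 3.9+: | builds a new merged dict; the right-hand side wins on duplicate keys.
merged = defaults | overrides

# 3.9+: |= updates a dict in place, like dict.update().
defaults_copy = dict(defaults)
defaults_copy |= overrides
```

Note that `|` leaves both operands untouched, while `|=` mutates the left-hand dict.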

~~~
masklinn
> It should be mentioned that a Python set type is basically a naked hash
> table, or a dictionary with all keys and no data.

Python's set and dict are actually completely different codebases. Notably,
the naturally ordered hash table of 3.6 dicts is _not_ used by or for sets.
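
One way to see the difference from the outside, as a small sketch with made-up data: dict keys come back in insertion order (a language guarantee since 3.7), while a set makes no promise about iteration order at all.

```python
words = ["pear", "apple", "fig", "apple"]

# dict keys keep first-insertion order, so this de-duplicates without reordering.
as_dict_keys = list(dict.fromkeys(words))

# A set also de-duplicates, but its iteration order should not be relied on.
as_set = set(words)
```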

~~~
viraptor
In case anyone is curious, sets were dicts in the original implementation:
[https://github.com/python/cpython/blob/a690a9967e715663b7a42...](https://github.com/python/cpython/blob/a690a9967e715663b7a421c9ebdad91381cdf1e4/Objects/setobject.c)

It was changed in 2005 for python 2.5
[https://github.com/python/cpython/commit/9f1a6796eb83a2884df...](https://github.com/python/cpython/commit/9f1a6796eb83a2884df5fd93487634e46d8830a7)

------
asdflke
So not the best article and advocates for md5? Also full of ads. Why is this
getting traction?

~~~
nightfly
This is not advocating for md5. :/

~~~
danpalmer
This does introduce hash functions, list MD5 and other cryptographic hash
functions as examples, and then go on to explain how hash functions are used
in hash tables.

Not advocating for MD5, but it does strongly imply the use of cryptographic
hash functions for hash tables, which is almost never the case, and a useful
distinction to explain.

Cryptographic hash functions are trying to make the original data
unrecoverable in any way, and essentially distributing as evenly throughout
their output space as possible. They are also often trying to be slow to
compute to prevent brute forcing.

On the other hand, hashing for a hash table doesn’t need to be secure in the
same way (hence Python’s small integers just returning themselves as their
hashed values). They also don’t need to distribute themselves evenly. They
need to be fast, unlike a cryptographic hash, and they need to restrict the
data to a fixed size.
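
To make the contrast concrete, here is a rough sketch, assuming CPython and using SHA-256 as a stand-in for the cryptographic functions mentioned above. The bucket count and key are arbitrary illustration values:

```python
import hashlib

# CPython's built-in hash() maps a small int to itself: fast, but not secure.
small_int_hash = hash(42)

# A cryptographic hash produces a fixed-size digest that looks unrelated to the input.
sha_digest = hashlib.sha256(b"42").digest()

# A hash table only needs the hash reduced to a slot index, e.g. with modulo.
num_buckets = 8  # hypothetical table size
bucket_index = hash("some key") % num_buckets

print(small_int_hash, sha_digest.hex(), bucket_index)
```

String hashes are randomized per process by default, so `bucket_index` can change between runs; only its range is fixed.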

~~~
mastro35
Yes, maybe I wasn't clear. I was only trying to explain what a hash is
before introducing hash tables. If anyone came away thinking that cryptographic
hash functions have to be used for hash tables, it's probably because my
explanation wasn't clear enough... :(

Bear in mind that I'm not a native speaker, so sometimes it's hard for me to
get across exactly what I mean... I will try to explain it better as
soon as I get somewhere I can use a computer :)

Thanks!

------
mastro35
I'm really sorry for the ads guys, but this is a small site that lives thanks
to them... I wish I could avoid them and live just with donations but it isn't
possible yet.

Moreover, I don't know anything about SEO or stuff like that so the ads are
configured automatically (I went to the AdSense dashboard and clicked on the
“do whatever you want” button!!! :D )

However, I will try to limit the ads in the next few days, using the AdSense
reports to work out which ones are not bringing in earnings and can be removed.

Thanks for the feedback!

~~~
solarkraft
On mobile the ads are a bit too close to be completely comfortable, but
they're definitely not overly annoying either.

The article is a nice introduction, the rest of the website looks great and
you don't seem to have included any of the modern stupid annoyances that will
make me hate you (in-page popups, notification request, app install banners,
chat bubbles).

Thanks for the article!

------
pansa2
Ugh. Full of ads and full of inaccuracies.

To learn about Python’s dictionaries, I’d recommend watching the talks by
Brandon Rhodes and Raymond Hettinger.

~~~
osn9363739
Any talks you recommend in particular?

~~~
pansa2
The talks by Brandon Rhodes are these two:

* The Mighty Dictionary (PyCon 2010): [https://www.youtube.com/watch?v=oMyy4Sm0uBs](https://www.youtube.com/watch?v=oMyy4Sm0uBs)

* The Dictionary Even Mightier (PyCon 2017): [https://www.youtube.com/watch?v=66P5FMkWoVU](https://www.youtube.com/watch?v=66P5FMkWoVU)

There's also one by Raymond Hettinger, which seems to have been given first at
a meetup and then, in shorter form, at PyCon 2017:

* Modern Dictionaries: [https://www.youtube.com/watch?v=p33CVV29OG8](https://www.youtube.com/watch?v=p33CVV29OG8)

* Modern Python Dictionaries - A confluence of a dozen great ideas (PyCon 2017): [https://www.youtube.com/watch?v=npw4s1QTmPg](https://www.youtube.com/watch?v=npw4s1QTmPg)

