It can also mean "We have not changed this because we don't dare to do that, or it is too much work and we just have to live with the bad decisions made 25 years ago". And that is the last code you want to copy.
No, because then you end up reading old C code that is an IFDEF maze and think that is good code. No, to see good code you usually have to look at what experienced people write when they get to greenfield something new.
What makes it good is the tradeoff between how well it solves the problem and how easy it is to maintain. And people learn how to write better code as they get more experienced, but old projects are seldom rewritten to apply those lessons - it is often just easier to start over from scratch.
First of all, this is what? A month or two of posts? Spreading the time to read out over that makes the cost almost go away. You can do it while drinking coffee or whatever, and when reading in better formats (say, in your inbox), you will see what a mail is about and then skip it if you are not interested in this particular tangent.
But also, don't expect this kind of flame war to be a regular thing. Most discussions are a lot smaller and involve few people.
What you are missing is how the hash table behaves when it is almost full. If there is one empty spot left in the whole table, how do you find it when you insert a new entry?
I know that's probably a naive answer; I honestly don't even know how a hash table works. I do know how a hash map works, or at least how some implementations work: they use a linked list as a bucket. So the hash gives you the bucket, then you linearly search the bucket for the element. Buckets should be small, so the time to search them is negligible, giving O(1) lookup and insert performance.
Obviously this is different from what's being discussed here; this data structure doesn't even really get "full". But it's also the only implementation I know is in practical use. Not sure why one might use a hash table instead.
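For what it's worth, a minimal sketch of that buckets-with-linked-lists scheme (separate chaining) could look something like the C below; the fixed bucket count, the toy string hash, and the names are illustration choices only, not taken from any particular implementation:

    #include <stdlib.h>
    #include <string.h>

    /* Minimal separate chaining: each bucket is a singly linked list of
       entries.  NBUCKETS and the toy string hash are arbitrary choices. */
    #define NBUCKETS 64

    struct node {
        const char *key;          /* sketch stores the caller's pointer;
                                     a real map would copy the key */
        int value;
        struct node *next;
    };

    struct map {
        struct node *buckets[NBUCKETS];   /* zero-initialize before first use */
    };

    static unsigned bucket_of(const char *s)
    {
        unsigned h = 5381;                /* djb2-style toy hash */
        while (*s)
            h = h * 33 + (unsigned char)*s++;
        return h % NBUCKETS;
    }

    void map_put(struct map *m, const char *key, int value)
    {
        unsigned b = bucket_of(key);
        for (struct node *n = m->buckets[b]; n; n = n->next)
            if (strcmp(n->key, key) == 0) { n->value = value; return; }
        struct node *n = malloc(sizeof *n);   /* new key: prepend to bucket */
        n->key = key;
        n->value = value;
        n->next = m->buckets[b];
        m->buckets[b] = n;
    }

    int *map_get(struct map *m, const char *key)
    {
        for (struct node *n = m->buckets[bucket_of(key)]; n; n = n->next)
            if (strcmp(n->key, key) == 0)
                return &n->value;
        return NULL;                          /* key not present */
    }

Lookup walks a single bucket's list, so it stays fast only as long as the lists stay short; the pointer chasing per node is exactly the indirection the replies below complain about.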
So using buckets with linked lists like that is neither space efficient nor fast. A strategy that is more common (and faster) nowadays is to store the colliding entries in the table itself, using some rule to find another slot for the new entry. The simplest (but not optimal) way to do this is to just take the next one that isn't used yet.
This means a linear scan, which approaches O(n) once the table gets close to full. To avoid this, better strategies for choosing the next place to look are used, as well as automatic resizing of the hash table at some occupancy percentage to keep the probe chains short. Other strategies in use will also approach O(n), they just require resizing at different occupancy percentages. What is new in this approach is that they manage to stay faster than O(n) even at almost full occupancy.
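Concretely, open addressing with linear probing might look like the sketch below; the fixed capacity and toy hash are illustration-only, and a real table would resize well before it gets anywhere near full:

    #include <stdbool.h>
    #include <string.h>

    /* Open addressing with linear probing: collisions live in the table
       itself, found by scanning forward to the next free slot. */
    #define CAP 128

    struct slot  { const char *key; int value; bool used; };
    struct table { struct slot slots[CAP]; size_t count; };

    static size_t home_slot(const char *s)
    {
        size_t h = 5381;                  /* toy string hash */
        while (*s)
            h = h * 33 + (unsigned char)*s++;
        return h % CAP;
    }

    bool table_put(struct table *t, const char *key, int value)
    {
        size_t i = home_slot(key);
        for (size_t n = 0; n < CAP; n++, i = (i + 1) % CAP) {
            if (!t->slots[i].used) {                 /* free slot: claim it */
                t->slots[i] = (struct slot){ key, value, true };
                t->count++;
                return true;
            }
            if (strcmp(t->slots[i].key, key) == 0) { /* existing key: update */
                t->slots[i].value = value;
                return true;
            }
        }
        return false;                                /* every slot is taken */
    }

    int *table_get(struct table *t, const char *key)
    {
        size_t i = home_slot(key);
        for (size_t n = 0; n < CAP && t->slots[i].used; n++, i = (i + 1) % CAP)
            if (strcmp(t->slots[i].key, key) == 0)
                return &t->slots[i].value;
        return NULL;     /* an empty slot (or a full scan) ends the search */
    }

The probe chains are contiguous runs in one array, which is what makes this so cache friendly compared to chasing bucket pointers - and also why the chains blow up as the table approaches full.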
>The simplest (but not optimal) way to do this is to just take the next one that isn't used yet.
Linear probing is by far the most efficient way to build a hash table on any modern hardware; nothing else comes close. Everything else leads to cache thrashing on misses. As for the nearly full table: that is a mistake in itself - the table should not go above a specific fill factor, e.g. the notorious 75% for large tables.
Yes, of course. In practice it still outperforms pretty much anything else. Even with the lower fill factor, the memory footprint is still cheaper than having buckets and indirection.
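As a rough sketch of that fill-factor rule, the insert path can check the load before probing and grow the table once an insert would push it past 75%; the struct layout, growth factor, and hash here are placeholder choices rather than any specific library's:

    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    /* Resize trigger sketch: grow and rehash once an insert would push
       occupancy past 75%.  Duplicate keys are not handled; the point is
       only the load check and the rehash loop. */
    struct entry { const char *key; int value; bool used; };
    struct table { struct entry *slots; size_t cap, count; }; /* cap > 0,
                                                     slots calloc'd to cap */

    static size_t home_slot(const char *key, size_t cap)
    {
        size_t h = 5381;                     /* same toy string hash as above */
        while (*key)
            h = h * 33 + (unsigned char)*key++;
        return h % cap;
    }

    static void insert_no_grow(struct table *t, const char *key, int value)
    {
        size_t i = home_slot(key, t->cap);
        while (t->slots[i].used)             /* linear probe to a free slot */
            i = (i + 1) % t->cap;
        t->slots[i] = (struct entry){ key, value, true };
        t->count++;
    }

    void table_put(struct table *t, const char *key, int value)
    {
        if (4 * (t->count + 1) > 3 * t->cap) {   /* would cross 75% fill */
            struct table bigger =
                { calloc(2 * t->cap, sizeof *t->slots), 2 * t->cap, 0 };
            for (size_t i = 0; i < t->cap; i++)  /* rehash every live entry */
                if (t->slots[i].used)
                    insert_no_grow(&bigger, t->slots[i].key, t->slots[i].value);
            free(t->slots);
            *t = bigger;
        }
        insert_no_grow(t, key, value);
    }

Keeping a quarter of the slots empty is the price for short probe chains; as noted above, that overhead is usually still smaller than per-entry nodes and pointers in a chained table.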