

Show HN: Simple Hash Table Implementation for C - watmough
https://github.com/watmough/jwHash

======
watmough
I know this is pretty simple, but I never ever did at any point implement a
hash table, just went more or less straight to programming in Assembler in a
semi-mature system, then to MFC on Windows since about 1998.

Performance-wise, my single-threaded test runs about 6 times faster than the
equivalent JS, which puts it in the C ballpark.

I _was_ really pleased about the performance under multi-threading. With 4+
threads, it's near to being 4 times faster, and the bucket lock only spins
occasionally (prints a few '.' but still works).
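
For readers wondering what a per-bucket spin lock looks like, here is a minimal sketch using C11 `<stdatomic.h>`. It is not jwHash's actual locking code. The `bucket` type, `NBUCKETS`, and all function names are assumptions made up for illustration, and the `count` field stands in for a real chain of entries.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NBUCKETS 64  /* hypothetical table size */

/* One lock flag per bucket, so threads touching different buckets
   never contend with each other. */
typedef struct {
    atomic_flag lock;
    int count;       /* stand-in for the bucket's chain of entries */
} bucket;

static bucket table[NBUCKETS];

/* Put every bucket into a known unlocked, empty state. */
static void table_init(void) {
    for (size_t i = 0; i < NBUCKETS; i++) {
        atomic_flag_clear(&table[i].lock);
        table[i].count = 0;
    }
}

/* Spin until the flag is acquired; returns how many times we had to retry. */
static unsigned long bucket_lock(bucket *b) {
    unsigned long spins = 0;
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        spins++;
    return spins;
}

static void bucket_unlock(bucket *b) {
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Lock only the bucket the hash maps to, update it, and release. */
static void bucket_add(size_t hash) {
    bucket *b = &table[hash % NBUCKETS];
    bucket_lock(b);
    b->count++;
    bucket_unlock(b);
}
```

The retry count returned by `bucket_lock()` plays the role of the '.' output mentioned above: with many more buckets than threads, it should be zero most of the time.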

------
aninteger
I'm currently using
[https://github.com/attractivechaos/klib/blob/master/khash.h](https://github.com/attractivechaos/klib/blob/master/khash.h)
in professional projects. I know it's pretty much full of macro "magic" (well
it's readable so not really..) so that makes some people scared. I really like
these header file "libraries" and am also a big fan of BSD's sys/queue.h

~~~
watmough
Yeah thanks, I'm on OS X, and stuff like timersub is coded in sys/time.h

I'll check out sys/queue.h

~~~
watmough
Thanks a bunch for that, khash looks really interesting, and it does what I
kinda wanted to do, but didn't really realize was possible.

Suddenly, the C preprocessor doesn't seem quite as far distant from C++
templates as I'd thought...
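
As a rough illustration of the template-like trick, the sketch below uses one macro to stamp out a typed map per value type. `DEFINE_MAP` and the generated names are hypothetical and far simpler than khash's real macros: keys are always `unsigned`, capacity is fixed, and there is no deletion.

```c
#include <assert.h>
#include <stddef.h>

/* DEFINE_MAP(NAME, VAL_T, CAP) generates:
     - a struct type NAME holding CAP slots,
     - NAME_put(): insert or overwrite, returns 0 on success, -1 if full,
     - NAME_get(): returns a pointer to the stored value, or NULL if absent.
   Collisions are resolved by linear probing. */
#define DEFINE_MAP(NAME, VAL_T, CAP)                                  \
    typedef struct {                                                  \
        unsigned      keys[CAP];                                      \
        VAL_T         vals[CAP];                                      \
        unsigned char used[CAP];                                      \
    } NAME;                                                           \
                                                                      \
    static int NAME##_put(NAME *m, unsigned key, VAL_T val) {         \
        for (size_t i = 0; i < (CAP); i++) {                          \
            size_t slot = (key + i) % (CAP);                          \
            if (!m->used[slot] || m->keys[slot] == key) {             \
                m->used[slot] = 1;                                    \
                m->keys[slot] = key;                                  \
                m->vals[slot] = val;                                  \
                return 0;                                             \
            }                                                         \
        }                                                             \
        return -1;                                                    \
    }                                                                 \
                                                                      \
    static VAL_T *NAME##_get(NAME *m, unsigned key) {                 \
        for (size_t i = 0; i < (CAP); i++) {                          \
            size_t slot = (key + i) % (CAP);                          \
            if (!m->used[slot]) return NULL;                          \
            if (m->keys[slot] == key) return &m->vals[slot];          \
        }                                                             \
        return NULL;                                                  \
    }

/* Two "instantiations", each with its own type-checked functions. */
DEFINE_MAP(int_map, int, 16)
DEFINE_MAP(str_map, const char *, 16)
```

Passing a `const char *` to `int_map_put()` now draws a compiler diagnostic, which is the part that feels like templates.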

------
AntiRush
Cool! Hashtables make for a fun project, to be sure.

Also worth checking out is uthash[1], which has been around forever. It's a
single header file include, full featured, and pretty bulletproof.

[1] [https://troydhanson.github.io/uthash/](https://troydhanson.github.io/uthash/)

~~~
watmough
Ahhh, thanks for that. I will take a look.

Definitely interested to see how other people handle some of the issues with
threading, hash-collisions etc., and also how to best build code that'll run
out of a single header file.

~~~
watmough
Yes, there's definitely some stuff in there I hadn't thought of like how to
handle strings as char[xxx] vs string ptrs.

Thanks.

------
pjscott
Looks fun. A couple of minor comments:

* The copystring() function looks like you re-implemented strdup() from string.h.

* The call to malloc() followed by a call to memset() to zero it can be replaced with a call to calloc(). This is faster on some platforms, and makes for cleaner code.
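
To make the calloc() suggestion concrete, here is a minimal sketch of both ways to allocate a zeroed bucket array. The `entry` type and the function names are hypothetical, not taken from jwHash.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chained-bucket entry, for illustration only. */
typedef struct entry {
    char *key;
    int value;
    struct entry *next;
} entry;

/* The two-step pattern: allocate, then zero.
   Note that n * sizeof(entry *) is not checked for overflow here. */
static entry **buckets_alloc_memset(size_t n) {
    entry **b = malloc(n * sizeof(entry *));
    if (b != NULL)
        memset(b, 0, n * sizeof(entry *));
    return b;
}

/* The one-step equivalent: calloc() zeroes the block itself and takes the
   count and element size separately, so it can reject an overflowing product. */
static entry **buckets_alloc_calloc(size_t n) {
    return calloc(n, sizeof(entry *));
}
```

Both versions rely on all-bits-zero being a null pointer, which holds on mainstream platforms but is not strictly guaranteed by the C standard.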

~~~
infradig
Because strdup() is not standard _enough_ to rely on. You have to test all
sorts of magic values to pull it in, then you have to define it yourself for
those cases where it isn't defined. Why bother? Just write your own equivalent.

~~~
cperciva
If you're not running on POSIX.1-2001, you're not likely to have pthreads or
GCC builtin functions.

~~~
infradig
If you specify -std=c11 you won't get strdup().
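
For context, strdup() comes from POSIX and only entered ISO C with C23, so a strict `-std=c11` build on glibc hides its declaration unless a feature-test macro such as `_POSIX_C_SOURCE 200809L` is defined before the first include. A minimal sketch of the "write your own" route follows; `copy_string` is a hypothetical name, not necessarily what jwHash uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate a NUL-terminated string using only ISO C functions.
   Returns NULL if allocation fails; the caller must free() the result. */
static char *copy_string(const char *s) {
    assert(s != NULL);
    size_t len = strlen(s) + 1;   /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```

This compiles under `-std=c11` without any feature-test macros, at the cost of a few lines that duplicate what the platform's strdup() already does.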

------
necessity
I use uthash:

[https://github.com/troydhanson/uthash](https://github.com/troydhanson/uthash)

You can compare performance with it and other C hash table implementations.

------
roye
It seems like all the hash implementations I've seen have an insertion rate of
~Million/sec. I wonder if it's possible to get at least an order of magnitude
faster (single threaded). Are we close to the input string parsing at this
rate? Would sorting/caching help a lot?

------
accatyyc
Neat looking code! I've been looking for something like this. Will try it! I
find it a lot more readable than the usual "header-libs"

------
ryanmarsh
I've never implemented one but the origin of Ruby's won me over with
nostalgia.
[https://github.com/ruby/ruby/blob/trunk/include/ruby/st.h](https://github.com/ruby/ruby/blob/trunk/include/ruby/st.h)

------
AaronIG
Nice.

I'll just add, any sane implementation of make should already define $(CC) for
you.

~~~
watmough
There are some issues on my copy of OS X with getting a working gdb, which I
needed on something else I was working on ... hence the manual def.

------
SamReidHughes
It is bad that you can insert an integer and then if you try to read the value
back as a string, it'll happily interpret the integer value as a char *.

~~~
cperciva
Not really. If you don't know if you're dealing with integers or strings, you
shouldn't be writing C anyway.

~~~
SamReidHughes
[First of all, what's the deal with the anonymous upvoters bringing my
previous comment, which _is_ against the guidelines for Show HN, back into the
black? Bad comments should go to the hell where they belong!]

If you're dealing with integers or strings on a key-by-key basis and can
statically predict as a function of key whether it's an integer or a string,
you shouldn't be using a hash table, you should be using a struct or two hash
tables, one with the integers and one with the strings.

If you're dealing with integers or strings on a key-by-key basis and can't
statically predict whether it's an integer or string, you'll need some
mechanism for ascertaining that and the best way is for the hash table to
store and expose that information, because you'd be storing it elsewhere
anyway. You _could_ expose the underlying tagged union.

If you're not dealing with integers or strings on a key-by-key basis (I guess
you're dealing with it on a table-by-table basis, which is a more likely usage, I
hope), you just want to avoid duplicating code. There's little harm in having
a tag and doing a run-time check anyway. Also, if you don't want to do that,
you can make one underlying implementation, and then wrap it in types "struct
hash_string_int" and "struct hash_string_string," and the like, and permit
only the right kind of methods to be called on the right structure type.

It's never the right thing to do to make needlessly risky code -- if that's
how you like to code, then _you_ shouldn't be writing C.
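
The tag-plus-run-time-check idea above can be sketched roughly as follows. The `value` type and accessor names are assumptions for illustration and do not reflect jwHash's internal layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The tag records which union member is currently valid. */
typedef enum { VAL_INT, VAL_STR } val_tag;

typedef struct {
    val_tag tag;
    union {
        long        i;
        const char *s;
    } as;
} value;

static value value_int(long i) {
    value v;
    v.tag = VAL_INT;
    v.as.i = i;
    return v;
}

static value value_str(const char *s) {
    value v;
    v.tag = VAL_STR;
    v.as.s = s;
    return v;
}

/* Checked accessors: refuse to reinterpret one member as the other.
   On a tag mismatch they return false and leave *out untouched. */
static bool value_get_int(const value *v, long *out) {
    if (v->tag != VAL_INT) return false;
    *out = v->as.i;
    return true;
}

static bool value_get_str(const value *v, const char **out) {
    if (v->tag != VAL_STR) return false;
    *out = v->as.s;
    return true;
}
```

With accessors like these, reading an integer back as a string fails visibly instead of handing the caller an integer disguised as a `char *`. The cost is one enum per stored value and one comparison per read.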

------
joepvd
That is excellent documentation in the README. Even though I do not know C,
I immediately have a clue of how to use this lib.

------
perdunov
Would be nice to see a comparison with std::unordered_map<>, both on the
performance and usability side.

------
dopeboy
Very cool. Reminds me of my grad school days when I was implementing cuckoo
hashing on a cell processor.

------
tapirl
repost:
[http://www.reddit.com/r/programming/comments/3692g7/simple_h...](http://www.reddit.com/r/programming/comments/3692g7/simple_hash_table_implementation_for_c/)

