Converting objects to hash signatures in JavaScript (kuby.ca)
23 points by ypkuby 8 days ago | 17 comments





If you’re doing this in a serious setting, use a stable JSON stringification (like json-stable-stringify) that sorts object keys, and then take a SHA hash (e.g. SHA-256) of the resulting string.

Better collision properties, easier to implement, and a cryptographic hash to boot, just as long as you don’t have circular references and only care about the serializable “contents” of an object.
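A minimal sketch of that approach in Node, assuming JSON-like input (plain objects, arrays, strings, numbers, booleans, null). `stableStringify` and `hashObject` are hypothetical helpers written for illustration, not the json-stable-stringify package's code; the hashing uses Node's built-in `crypto.createHash`.

```javascript
const crypto = require('crypto');

// Like JSON.stringify, but object keys are emitted in sorted order, so
// {a, b} and {b, a} serialize identically. Assumes JSON-like input:
// no cycles, Dates, Maps, Sets or BigInts.
function stableStringify(value) {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) {
    // JSON.stringify writes unserializable array slots as null; do the same.
    return '[' + value.map((v) => stableStringify(v) ?? 'null').join(',') + ']';
  }
  const entries = Object.keys(value)
    .filter((k) => value[k] !== undefined && typeof value[k] !== 'function')
    .sort()
    .map((k) => JSON.stringify(k) + ':' + stableStringify(value[k]));
  return '{' + entries.join(',') + '}';
}

// SHA-256 of the canonical string, as a 64-character hex digest.
function hashObject(obj) {
  return crypto.createHash('sha256').update(stableStringify(obj)).digest('hex');
}
```

Sorting keys is what makes the string deterministic; the cryptographic hash then only has to deal with a plain string.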


It would be nice to have a library with ES6 data structures such as Set that supported object hash codes using a technique like this. That’s a weird thing to be missing from the JS standard library.
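The built-in Set compares objects by reference, so two structurally equal arrays count as two members. A minimal sketch of a set with value semantics, assuming every value can be reduced to a canonical string key. `ValueSet` is a hypothetical name, and its default key function, `JSON.stringify`, is sensitive to key order, so a key-sorting stringify would be the safer choice for objects.

```javascript
// Hypothetical ValueSet: members are stored in a Map under a canonical
// string key, so structurally equal values collapse into one entry.
class ValueSet {
  constructor(keyOf = JSON.stringify) {
    this.keyOf = keyOf; // value -> canonical string (must be deterministic)
    this.map = new Map(); // canonical key -> first value stored under it
  }
  add(value) {
    const key = this.keyOf(value);
    if (!this.map.has(key)) this.map.set(key, value);
    return this;
  }
  has(value) {
    return this.map.has(this.keyOf(value));
  }
  get size() {
    return this.map.size;
  }
}

// The built-in Set keeps both arrays because they are different objects.
const builtIn = new Set([[1, 2], [1, 2]]);
const byValue = new ValueSet().add([1, 2]).add([1, 2]);
// builtIn.size is 2, byValue.size is 1, and byValue.has([1, 2]) is true.
```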

To make stuff like this work in all scenarios, you have to implement a comparator function in your class, and every member of that class needs that comparator defined as well. Then, when you call the comparator on the object you want to compare, it calls the comparators of its members, which call the comparators of their own members, and so on recursively until you reach the leaves.

In addition, the containers need these same comparators, so all your data structures (arrays, hashmaps, etc.) will need to be wrapped or redefined.
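A sketch of that pattern with hypothetical `Point` and `Segment` classes: each class defines its own `equals()`, which delegates to the `equals()` of its members, while plain arrays need a separate helper (`arrayEquals`, also hypothetical) because they have no such method.

```javascript
// Hypothetical classes: each defines equals() and delegates to its members.
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
  equals(other) {
    return other instanceof Point && this.x === other.x && this.y === other.y;
  }
}

class Segment {
  constructor(start, end) {
    this.start = start; // a Point
    this.end = end; // a Point
  }
  equals(other) {
    // Recurse into the members' own comparators.
    return other instanceof Segment && this.start.equals(other.start) && this.end.equals(other.end);
  }
}

// Built-in containers have no equals(), so they need a wrapper or helper too.
function arrayEquals(a, b) {
  return a.length === b.length && a.every((item, i) => item.equals(b[i]));
}
```

Every new class, and every container that holds such objects, has to take part in this chain for the comparison to stay correct.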

It's much harder than it appears on the surface. You can also get infinite recursion when objects reference each other. For a quick and dirty comparison, you can convert both objects to JSON and compare the strings. In fact, the JSON string, if made deterministic, can serve exactly the role of the "hash" used here to compare two objects.
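Both caveats are easy to see with plain `JSON.stringify`: it emits keys in insertion order, so equal objects built in a different order serialize differently, and on mutual references it throws a `TypeError` instead of recursing forever.

```javascript
// Same contents, different insertion order: the plain JSON strings differ.
const first = JSON.stringify({ a: 1, b: 2 }); // '{"a":1,"b":2}'
const second = JSON.stringify({ b: 2, a: 1 }); // '{"b":2,"a":1}'

// Mutual references: JSON.stringify detects the cycle and throws.
const left = { name: 'left' };
const right = { name: 'right', left };
left.right = right;

let cycleError = null;
try {
  JSON.stringify(left);
} catch (err) {
  cycleError = err; // a TypeError ("Converting circular structure to JSON" in V8)
}
```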


It's a good practice to have a way to compare objects anyway, and it should be standardized, like __hash__ in Python or hashCode in Java.

Even a basic comparison between two nested arrays takes work in JS, so we need articles like this for things we use day to day. It's weird that the orgs building the standard think a common promise API is a priority, but a decent namespace system or a way to manipulate built-in data structures without a third-party library isn't.


I would hate to have to debug a scenario where this type of comparison gives an incorrect result.

I seem to remember the early days of Java, when hashCode for String only looked at the first few characters, which had amusing results when storing URLs.


Well, I'm sure it can't be worse than debugging code that uses == instead of ===: at least you can look at the implementation :)

I was waiting for people to say something first; I was sure I was missing something. But for the love of God, that's horrible! Not only what ermir said, but what about collisions in 32 bits? That's only about 4 billion possible values. I keep thinking I should build something in Node to learn, then I read stuff like this. Is this JavaScript: The Good Parts, or the other part? I can't tell.
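The 4 billion figure actually understates the problem. Assuming hash values are spread uniformly over 32 bits, the birthday bound puts the probability of at least one collision among n values at roughly 1 - e^(-n(n-1)/2^33), which already reaches 50% at about 77,000 values. A small sketch of that approximation (`collisionProbability` and `halfChanceSize` are hypothetical helpers):

```javascript
// Birthday-bound approximation: probability that n uniformly distributed
// hashes of the given bit width contain at least one collision.
function collisionProbability(n, bits = 32) {
  const space = 2 ** bits;
  return 1 - Math.exp(-(n * (n - 1)) / (2 * space));
}

// Smallest n with roughly a 50% collision chance: n ≈ sqrt(2 * 2^bits * ln 2).
function halfChanceSize(bits = 32) {
  return Math.ceil(Math.sqrt(2 * 2 ** bits * Math.LN2));
}
```

For 32 bits, `halfChanceSize()` comes out at 77163, far below the size of the hash space.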

Additionally, if any of your class's members contain the separator character, you can end up with collisions.
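A hypothetical illustration, assuming the members are concatenated with `','` via `Array.prototype.join` before hashing (`naiveKey` is a stand-in, not the article's code): two different member lists produce the same string, so they get the same hash no matter how good the hash function is. `JSON.stringify` avoids this particular case because it quotes and escapes each string.

```javascript
// Hypothetical naive key: member values joined with a separator.
const naiveKey = (members) => members.join(',');

const keyA = naiveKey(['a,b', 'c']); // "a,b,c"
const keyB = naiveKey(['a', 'b,c']); // also "a,b,c": a collision before any hashing

// JSON.stringify quotes each string, so the member boundaries survive.
const jsonA = JSON.stringify(['a,b', 'c']); // '["a,b","c"]'
const jsonB = JSON.stringify(['a', 'b,c']); // '["a","b,c"]'
```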

And it treats null, undefined, and the empty string as the same value.
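That is exactly what `Array.prototype.join` does if it is used to build the key, since it converts both `null` and `undefined` to the empty string. `JSON.stringify` keeps `null` and `""` apart, but it has its own blind spot: `undefined` becomes `null` inside arrays and is dropped from objects, which is why the stringify approach only captures the serializable contents of an object.

```javascript
// Array.prototype.join turns null and undefined into "", so all three collapse.
const joined = [[null], [undefined], ['']].map((members) => members.join(','));
// joined is ["", "", ""]

// JSON.stringify separates null from "", but not null from undefined in arrays,
// and it omits object properties whose value is undefined.
const asJson = [[null], [undefined], ['']].map((members) => JSON.stringify(members));
// asJson is ["[null]", "[null]", "[\"\"]"]
const dropped = JSON.stringify({ a: undefined }); // "{}"
```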

Don't blame an entire language for one person's poor implementation of a feature.

I don't. My litmus test is that I couldn't tell whether that implementation was poor or not, given JavaScript as the context.

So if it were your preferred language and someone implemented the feature the same way, you still couldn't tell? It's not as if what was done is part of the language.

JavaScript certainly lacks maturity compared to other languages.

I tend to just use json-stable-stringify [1] to get a consistent representation of JS objects. Then I can hash or compare that.

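Browser:

    const crypto = window.crypto || window.msCrypto; // msCrypto: IE11, which lacks Promise-based digest() and TextEncoder
    const bytes = new TextEncoder().encode(stableStringify(obj)); // digest() takes bytes, not a string
    return crypto.subtle.digest('SHA-256', bytes); // resolves to an ArrayBuffer

Node:

    const stableStringify = require('json-stable-stringify'); // [1]
    require('crypto')
      .createHash('sha256')
      .update(stableStringify(obj))
      .digest('base64');
[1] https://www.npmjs.com/package/json-stable-stringify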

This feels like cargo-culting of Java's hashCode (or similar) to me. As I understand it, hashCode is used as an optimization in the implementation of data structures like hash maps and sets, not as the primary equality check, since collisions can occur.
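A minimal sketch of that division of labor, using a hypothetical `HashSet` class: the hash only selects a bucket, and the equality function makes the final decision, so colliding values are still told apart. The length-based hash in the usage lines is deliberately weak to force a collision.

```javascript
// Hypothetical HashSet: the hash picks a bucket, equals() decides membership.
class HashSet {
  constructor(hash, equals) {
    this.hash = hash; // value -> bucket key; collisions are allowed
    this.equals = equals; // (a, b) -> boolean; the real equality test
    this.buckets = new Map(); // bucket key -> array of stored values
  }

  add(value) {
    const key = this.hash(value);
    const bucket = this.buckets.get(key) || [];
    if (!bucket.some((stored) => this.equals(stored, value))) bucket.push(value);
    this.buckets.set(key, bucket);
    return this;
  }

  has(value) {
    const bucket = this.buckets.get(this.hash(value)) || [];
    return bucket.some((stored) => this.equals(stored, value));
  }
}

// Deliberately weak hash (string length) so that "ab" and "cd" collide.
const set = new HashSet((s) => s.length, (a, b) => a === b);
set.add('ab');
const hasAb = set.has('ab'); // true
const hasCd = set.has('cd'); // false: same bucket, but equals() rejects it
```

A collision only costs an extra equality check here; it never produces a wrong answer.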

It would be simpler, faster, and more correct (no chance of collisions) to just implement an isEqual method.
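A minimal deep-equality sketch for JSON-like values (primitives, arrays and plain objects). `isEqual` here is a hypothetical function, not a built-in or a specific library's API, and it deliberately ignores cycles, `Map`/`Set`, `Date` and class identity.

```javascript
// Deep equality for JSON-like values; no hashing, so no collisions.
function isEqual(a, b) {
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    // Different primitives are unequal, except that NaN should equal NaN.
    return Number.isNaN(a) && Number.isNaN(b);
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  // Same own keys on both sides, and every value deeply equal.
  return keysA.every((k) => Object.prototype.hasOwnProperty.call(b, k) && isEqual(a[k], b[k]));
}
```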


I'd prototyped this solution as well for deep object comparisons, storing each hash in a WeakMap keyed by the object itself, so the hashes can be garbage-collected along with the objects. This also solves any potential infinite recursion: the same object key always points to the same value, so when you reach an object that is already in the map, you can stop there.
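A sketch of that idea, assuming objects are not mutated after they are hashed (the cache would go stale otherwise). `objectHash` and the `'<cycle>'` marker are hypothetical; the "hash" here is just a key-sorted canonical string, which could be fed into SHA-256 afterwards.

```javascript
// Per-object cache; entries disappear when their key objects are collected.
const cache = new WeakMap();
const IN_PROGRESS = '<cycle>'; // stored while an object is being hashed

function objectHash(value) {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  // Either a finished hash, or IN_PROGRESS if we are inside a cycle: stop here.
  if (cache.has(value)) return cache.get(value);
  cache.set(value, IN_PROGRESS);
  const body = Array.isArray(value)
    ? '[' + value.map((v) => objectHash(v)).join(',') + ']'
    : '{' +
      Object.keys(value)
        .sort()
        .map((k) => JSON.stringify(k) + ':' + objectHash(value[k]))
        .join(',') +
      '}';
  // Note: inside a cycle, cached strings depend on which object was hashed first.
  cache.set(value, body);
  return body;
}
```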

That said, why are we doing this and where is our native isEqual method?
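The language itself still has no deep-equality built-in, but Node.js ships one in its `util` module: `util.isDeepStrictEqual` (Node only, not available in browsers). A short usage sketch:

```javascript
const util = require('util');

// Structural comparison; property order does not matter.
const same = util.isDeepStrictEqual({ a: [1, 2], b: 1 }, { b: 1, a: [1, 2] }); // true
// Strict mode: 1 and '1' are different values.
const different = util.isDeepStrictEqual({ a: 1 }, { a: '1' }); // false
```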


Fine, but please don’t hang new methods off Object.prototype (or other built-ins). We used to do that and it made a terrible mess.
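One concrete reason it made a mess: assigning a method to `Object.prototype` creates an enumerable property, so it shows up in every `for...in` loop over every object. The patch below is for demonstration only and is removed right away; `hashCode` is a hypothetical placeholder that just returns `JSON.stringify`.

```javascript
// Demonstration only: patch a built-in prototype...
Object.prototype.hashCode = function () {
  return JSON.stringify(this);
};

// ...and every for...in loop now sees the new method as a key.
const seenKeys = [];
for (const key in { a: 1 }) seenKeys.push(key);
// seenKeys is ["a", "hashCode"]

delete Object.prototype.hashCode; // undo the patch

// The less invasive alternative: a standalone function.
function hashCode(obj) {
  return JSON.stringify(obj);
}
```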


