Hacker News new | past | comments | ask | show | jobs | submit login
Data Structures in JavaScript (benoitvallon.com)
147 points by shawndumas on March 9, 2016 | hide | past | favorite | 24 comments



It looks like this is a teaching exercise more than anything, but it should be noted that the implementations are not great. For instance, the set uses an array based implementation instead of hash based.

In JavaScript the hash based implementation is pretty trivial to write and some quick testing in node shows that it is dramatically faster.


Unless by "hash based" you mean just using `new Set()`, I'm not aware of a way to do this generically. Using a `{}` for this would only work for a set of strings and things that can be uniquely coerced into strings. Otherwise, looping over arrays is your only option.

Care to elaborate?


Google Closure Library does this in goog.structs.Set with an interesting generic solution.

A string key for the underlying map is generated based on the data type of the element being added in the set. If you're adding an object into the set, the object is mutated so that it stores a unique ID. The object's unique ID is used to form the string key that will identify this object in the set.

Note how goog.structs.Set.getKey_ is used in the add() method: https://github.com/google/closure-library/blob/v20160125/clo...

This is how the library obtains the unique ID of an object: https://github.com/google/closure-library/blob/v20160125/clo...


I wonder if that's a problem with stuff like immutibleJS since the object itself is permanently mutated as part of being added to the set.

Also, if this was being done for ES5+ code, then either setting that property to non-enumerable or using something like Symbols would be cool since it would hopefully have a reduced impact on other code.


Depending on the implementation, yup. Since both of the most popular Immutable libraries that I know of (Immutable.js, mori) have their own API for setting properties via a `.set` method, it would not be unreasonable for them to use `Object.freeze` in order to make raw mutations throw an error.

Both libraries provide their own implementation of a Set though, so in practice you would just use the library provided one.


> Using a `{}` for this would only work for a set of strings and things that can be uniquely coerced into strings.

The code they use has the same limitation. It adds / removes / ensures the uniqueness of elements in the set by using Array.indexOf(...), which only works for strings and things that can be uniquely coerced into strings.

So, given this limitation, using an Object-based set would be vastly more performant.


  var set = []
  var foo = {foot: 1}
  var bar = {bart: 2}
  set.push(foo)
  set.indexOf(foo) //0
  set.indexOf(bar) //-1
"indexOf() compares searchElement to elements of the Array using strict equality (the same method used by the ===, or triple-equals, operator)."[0]

MDN has the link to the ES5 specification, but the key part is 9(c)(ii):

"Let same be the result of applying the Strict Equality Comparison Algorithm to searchElement and elementK."

[0] - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...


Hmm, and yet:

   var elem = {foo:"bar"}
   var set = [elem]
   set.indexOf(elem) // 0
   set.indexOf({foo:"bar"}) // -1
So I guess the strict equality comparison is using some kind of internal objectId rather than the property name and values of the objects? I guess depend on how you want to define "uniqueness" for your set...


Objects in JavaScript use referential equality when checking for strict equality, which indexOf uses. In your code, 'elem' and '{foo:"bar"}' are different objects which is why your code doesn't work as expected.

So

  var x = {foo:"bar"}
  var y = {foo:"bar"}
  var z = x
  x === y //false
  x === z //true
This is specified in the ES3 document[0] at 11.9.6(13):

"13. Return true if x and y refer to the same object or if they refer to objects joined to each other (see 13.1.2). Otherwise, return false."

[0] - http://www.ecma-international.org/publications/files/ECMA-ST...


Don't forget to mangle the strings, otherwise inserting things with names meaningful to Javascript ('hasOwnProperty', '__prototype__') can cause great suffering.


You can do it the way a lot of libraries do where you require that anything you want in a set implement an interface for hashing and equality testing. Then you use an object with hashes for keys and arrays for values.


And the Queue has an O(n) remove operation, because it just uses Array.unshift. There are plenty of implementations with O(1) remove, like http://code.stephenmorley.org/javascript/queues/


Could you share your test code?


Yeah, I thought that was strange to use a list for the set implementation too. Average set addition and search should be O(1) if using a hash table. O(n) is a problem. Glad they put that clearly at the top though.


It seems like cheating to do a Hash Table using JavaScript objects, since it's basically giving you all the hashing magic for free. It'd be cool if the example actually used an array and a hash function -- even a black box one -- to show how collisions and lookups are handled. (When I've implemented Hash Tables in JavaScript in the past, I've also included functionality to rehash the entire table into a larger array as it gets more than 25-50% full... but can't remember off the top of my head what's the mathematically optimal point to rehash.)


> When I've implemented Hash Tables in JavaScript in the past, I've also included functionality to rehash the entire table into a larger array as it gets more than 25-50% full... but can't remember off the top of my head what's the mathematically optimal point to rehash.

The load factor (entries/bucket before you resize) varies greatly depending on the hashmap's addressing and collision resolution. Robin-hood hashing allows ridiculously high open-addressing load factors for instance (above 0.9 on an open-addressing linear-probed map, an open-addressing hash table can't go beyond 1), meanwhile separate-chaining allows load factors way above 1 (buckets are sequences and conflicting entries are appended to the sequence)


I've been toying around with the idea in my head of building a small OS kernel in Javascript. I'd use grub to boot into it, ductape [0] to start an os.js or something, and then just implement everything from there.

These data-structure implementations make me feel that, even though it is crazy, it might just work. This seems to have come out very clean and easy to implement.

[0] - http://duktape.org/


You should check out runtime.js [1] if you haven't. Instead of duktape it uses v8. I'd like to see an equivalent project using spider monkey so that C/C++ code could be compiled into efficient JS. Check out this speech if you haven't already: The Birth & Death of JavaScript [2]

[1] https://github.com/runtimejs/runtime [2] https://www.destroyallsoftware.com/talks/the-birth-and-death...


Maybe implementing a virtual machine would be more suitable? You can practice a lot of the same essentials.

- Memory management can be simulated by have a fixed length array, representing your total RAM, and somehow marking index ranges as "reserved". - You can simulate some hardware devices by exposing browser capabilities. Each, you could print a character, by writing to a virtual port or register (actually implemented as an array), then later in your VM update loop, reading the values in the array, interpreting the instructions, and updating the DOM as approprite.

And all from the comfort of your browser!

You could check out some emulators that have been ported to JavaScript; these are simulating hardware like the NES, and have the capability to run operating systems.


But why?


It would be simple to get anyone into OS development. The only easier language I can find to teach someone is Python, and that doesn't have such an easily portable runtime.

JavaScript is great because you can learn it in the afternoon and master it in a few years and has a nice subset of lisp-like features.

It would be a toy to prototype in. Easy to use, and easy to modify. Nothing serious.



I just spent a few minutes looking around that and it seems like it has a lot of extra stuff piled on to make it easier for experienced developers, but that would inversely effect someone like me.

I looked around in there and have no idea how any of that works or how any of the modules relate.


For me it'd be a great way to learn about OS with something I'm already really familiar with, and also something that I can easily hack on. I.e. Similar reasons why I much more prefer stumpwn to ratpoison.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: