Hacker News new | comments | ask | show | jobs | submit login
Algorithms and Data Structures Explained and Implemented in JavaScript (github.com)
483 points by deckermann 8 months ago | hide | past | web | favorite | 43 comments



The documentation and code quality is all good.

The implementation choices leave some things to be desired. The Queue and Stack implementations are Linked Lists instead of array backed, the hash table is closed instead of (the only barely more complicated) open Robin hood hash table scheme, the union-find/disjoint-set implementation doesn't have path compression or rank unions.

Overall very good, but it could be Great (tm) with just a little bit of work.


Wait are linked lists not better for first-in first-out data structures? I thought arrays were not preferred in those cases because shift/unshift are highly inefficient.

Am I wrong? I thought array-based queues resulted in O(n) queue/dequeue. Now for stacks, which are last-in first-out, it makes sense to use an array.


Only if you eagerly shift all elements on every operation. If you allocate extra space, and only shift every N operations (N being the max size of the list), it’s O(n/n), eg O(1) amortized. As mentioned already, you can represent it as a circular buffer to save a bit of memory. But this cost analysis also means that a simple array can be much faster than a linked list for almost all operations in theory (excludes random insert given an insert token, which is not an index). I believe there’s also experimental evidence of this showing that an array is almost always preferable to a linked list - even in the cases were a linked list is often taught as the obvious choice (such as a queue). Yep, surprised me quite a bit too.


> I thought array-based queues resulted in O(n) queue/dequeue.

Not necessarily. A circular buffer[1] can be used as a queue with O(1) queue/dequeue. C++ implementations (gcc?), IIRC, uses an interesting array-of-arrays approach; it also has O(1) queue/dequeue. I'm not sure why the array-of-arrays approach is better than a circular buffer, though.

Array based designs can result is less allocations, and maybe less overhead. For example, if you have a circular buffer with space for 16 items, it only needs to allocate space if you need more room, whereas a linked list queue would allocate for each and every item placed into it. Linked lists also require space for the pointer to the next link, for each link in the list. (And, if you keep them, back pointers, though these aren't necessary for just a queue.) Arrays might have some unused slack space, however.

[1]: https://en.wikipedia.org/wiki/Circular_buffer


Resizing a fully contiguous circle buffer would cause a copy every element as well forcing you to make a single contiguous memory section. Array of arrays just needs to resize the top level array.


But the copy only has to move n items, but was constructed with m items, where n < m (and usually n << m). Where n is the max size of the queue and m is the total number of items that will ever be enqueued.

On the other hand, an array of arrays (no recursion) doesn’t change the big-O complexity cost, just the constant multiplier. That should definitely improve performance of the uncommon operation (the copy), but hypothetically might slow down the actual queuing operations (and drastically reduce throughout)

But maybe you meant the array of arrays to be recursive? That seems like it would alter the big-O (from n+m to log(n)+m). But typically m>>n, so the net result is the same.


You use an array with pointers to the front and back of the queue, essentially a circular buffer. You might need to resize, but that happens for any array backed structure and is one (or two) memcpys.


This is something that I feel continually comes up when dealing with implementing data structures in a higher level language, and repeatedly confused me when I was learning this stuff.

I always found it misleading for example, to use an existing language implementation of a particular data structure (in this case a JS array) to represent a queue, and then simply provide methods that mock enqueue / dequeue but don't provide the same performance guarantees. All you're doing is mimicking the queue API but actually performing O(N) shift/unshift, which defeats the point of even having the data structure.

Not saying you can't do it properly with an Array, but I can't count the number of times I've seen not only blogs, but published educational materials advocate for using an existing Array-like structure as a starting point for a queue, and totally ignore the fact that that structure was not designed to remove from the front.

Just my personal opinion, but I think a linked list more clearly illustrates how to properly implement stacks/queues in a higher level language since you're forced to handle the pointers (nodes) yourself.


What book would you recommend to learn about these implementation improvements.


Really any algorithm book should cover these (excluding hash table optimizations). The Sedgewick book is often recommended as a good starter book, though I'm not sure how much of what I said is explicitly covered, it's been a while since I read it.

The array vs. linked list thing is sort of just knowledge you pick up, I suppose? Linked lists have their place, just fairly rarely. For a more in depth listing of the pros and cons, https://stackoverflow.com/questions/393556/when-to-use-a-lin... is a good discussion. I consider use cases for linked lists to be rather niche (if someone comes in and mentions the kernel - that is niche). a & b are both basically on embedded systems (real time or low memory), c is rare outside of, well, queues and stacks, which are better in arrays anyway, and d is reasonable but with a bad example (a heap-based priority queue is better - and your heap should be array backed). They're flexible and easy but rarely the best solution.

The union-find thing should be fairly standard. I mean, the optimizations are on the wiki page for union find. Pretty sure Sedgewick and online course cover those optimizations too. Adding those optimizations is equivalent from transforming a naive binary search tree into AVL or a Red-Black tree, which is a pretty huge improvement.

The hashtable stuff I learned mostly from lurking here on HN. ;)


This repo is beautiful. The README is detailed and clear, and the contents seem pretty exhaustive. As a native JS guy I'm especially excited to check out the implementation of non-tree graphs!

Having said all that: The algorithms I've had to tackle at work tend to be fairly trivial or total one-offs involving scheduling events in human time.

How much algorithmic work do you other JS folks end up doing?


> How much algorithmic work do you other JS folks end up doing?

I recently built a questionnaire/form editor/builder which relies heavily on tree structures. I ended up writing a lot of tree walking code. It was the first time I've ever needed to implement interview-like questions in an actual project.

I make heavy use of functional programming in Javascript, which lends itself to really clean implementations of algorithms like tree walking, e.g.: https://hastebin.com/qapawepavi.js


As someone new to JS I find vanilla JS tedious for FP, what library(ies) do you use to ease this?


RamdaJS is my go-to for most new projects. Other contenders in the space include lodash/fp and Elm.


I have to admit I’m far from a FP guru, very little experience with true functional languages like Haskell etc. For me it’s more a question of style and applying functional paradigms when possible, eg treating functions as first class citizens, making heavy use of .bind() and passing functions as arguments. I use ES6 via Babel and actually find it to be quite expressive and powerful when used correctly. I just try to keep my functions very small with tight separation of concerns. There are lots of libraries to make JS “more functional,” as the sibling comment alludes to, but I find vanilla JS (or at least ES6) already has many of the features necessary for functional programming, as long as you make the conscious choice to use it that way.


It requires use of Babel, but you might find the function bind operator an interesting alternative to .bind().

https://babeljs.io/docs/plugins/transform-function-bind/


That’s very cool! But I’d be hesitant to introduce something non-standard into the codebase. Sure the toolchain can handle it but I’d be worried other devs won’t immediately recognize the syntax.


Is it just me or does the hastebin site load almost instantly? On iOS. Was nice.


This is a really great repo!

> How much algorithmic work do you other JS folks end up doing?

I find JS to be the ideal language to implement an algorithm, despite its quirks. Why?

1. Because JS has syntax similar to most other class based languages, or can be used in ways that match functional languages, even if the syntax is different.

2. It is the lingua-franca of the web, so it is easy for others to also understand, use as a reference implementation (into other languages), or literally visualize it in the browser.

3. Along the lines of visualization, browser's have some of the best and state-of-the-art live debugging tools. So often it is easier to implement and test in JS (despite lack of types) than other languages.

4. For better or for worse, a pure JS implementation (no NPM or other packages) has very little standard library tooling, so you are forced to write some pretty low level logic that you might not otherwise write in real low level languages simply because they have good built-in standard libraries.

We also took the time to write some algorithm and explainer articles, but they are largely focused at the high-level conceptual layer, and explaining for kids:

- GIT explained for kids http://gun.js.org/explainers/school/class.html

- Stream sorting interactive explainer http://gun.js.org/explainers/basketball/basketball.html

- Cartoon cryptography http://gun.js.org/explainers/data/security.html

Most all the work we do is algorithmic, not app focused. But that is unusual for JS devs.


Had the same idea and got sad you got there faster. But then I looked at the code and realized you did it better than I would have done it so good job!


Me too! I wanted to refresh myself on implementations and expose the JS community to some stuff they may not have been exposed to... TypeScript is still fair game!


I like it! Something I've been thinking about recently is using ES6 proxies to visualize algorithms like this. Lay out the arrays and objects as blocks on a canvas, and highlight the blocks as the algorithm reads / writes to the corresponding slots of the arrays / objects.



https://visualgo.net/en is super cool! Thanks for posting it


Hehe—I'm working on something very similar (video): http://symbolflux.com/projects/avd —check out the colored sorting algorithm toward the end of the video.

And I'm a good bit into the javascript client which even uses proxies :) https://github.com/westoncb/JS-Watcher

(Mostly I've been using a Java client which is much more advanced, but I had an idea to do a visualized sandbox for a 'code tutor' kind of app in javascript/Electron, so I started the js client.)

Edit: I should also point out that it's meant to easily hook up to any algorithm you write, so it's as much for debugging as instructional algorithm visualization.


This is a terrific idea! The code I reviewed so far looks very well written. Thanks for this... folks looking to interview for a software developer position (especially at Google or with any group that emphasizes fundamentals) would do well to get acquainted with this repo!


I would be interested in hearing feedback on this book [1]. I bought it for a friend who's getting started with programming, coming from a different field. It seems a good book overall, but I haven't had the time to take a very deep look at it.

[1] https://www.packtpub.com/web-development/learning-javascript...


Tangential, but is there a name for a Trie that isn't for string searching/matching? I've had a need for this in the past but each node needed to store a non-character value. This was to facilitate a sort-of path finding where the input was matched against a known set of paths.

Every library I looked at had implementations revolving around nodes that stored chars (you know, for auto-complete scenarios) but nowhere in the definition of a Trie is it bound to such an implementation (so I thought).


Here's mine: https://github.com/superlopuh/SuperTrie

I've implemented at least 5 different generic Tries since this one. The idea is almost always the same, but different tradeoffs can require different implementations. While I don't recommend that you use this version, the contents of [1] should give you all the inspiration you need to roll your own.

[1] https://github.com/superlopuh/SuperTrie/blob/master/SuperTri...


I think that is just the common use case.

It shouldn’t be too hard to roll your own here, it’s just a set of tree traversal functions.


Oh I did, was just surprised every formal language had one tied to string searching/matching, including this demo.


I'd love to see something like this for every language... similar to the to do app which has become the de facto standard to compare web frameworks..


If anybody wants to build it into a library I threw together an index file and included the rollup run to allow you to import everything as one library or as separate modules:

https://gist.github.com/robertfairley/11ba23640578650671dbde...


Along similar lines, does anyone know of something similar that includes multigraphs? (Graphs with parallel edges).

There's a python library called networkx, but it is so heavily object-oriented (multi-inheritance) as to make it impossible to get any benefit from reading the code.


Damn I’d just started doing something similar on the side. Looks like I’ll have to rethink my efforts. :)

Nicely done


Seems like a good reference manual to use before attending an interview at Google/Amazon/Facebook etc


They don't explain lock-free data-structures?


Bookmarked! Well done.


is there a similar thing with python too ?


There's this that appeared on HN a year ago. https://github.com/OmkarPathak/pygorithm



I have mixed feelings about that site.

There are people posting verbatim interview questions, which is a breach of interview NDA.


Gold!




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: