Hacker News new | past | comments | ask | show | jobs | submit login
Advanced Data Structures (2017) (csail.mit.edu)
1176 points by rjammala on May 29, 2019 | hide | past | favorite | 85 comments



Hey! I’m a coauthor with this guy :^). Incredibly fascinating (and extremely smart) dude. His interest in origami is what drew me to MIT in the first place, since I was interested in it as well from a young age. A few years ago I took his class on the computability/complexity theory of folding, and we ended up solving an open problem in the field and published a paper out of it [0]. If you had told highschool me something like that would happen, I never would have believed it :)

[0]: https://arxiv.org/abs/1901.08564


Hmm,

Scanning the topics here, what seems really interesting is how basic data structure are, naturally, containers for data but these advanced data structures are more like "mix-ins" or effects, qualities that can be added to a container. Time travel most obviously but also the various optimizing tree containers.

Not sure how to make this an interesting question. But it's interesting. How many effects can we add to a given structure? Can we make these additions automatically? Can we combine all of these or are some incompatible with each other? What are the cost-benefits between these?


> If you had told highschool me something like that would happen, I never would have believed it :)

I find it interesting that a common story of people who did big things is that nobody would have predicted they'd end up doing the things they did.

Reminds me of my own reflections on things that I have done that I never would have expected I'd be doing. As well as this recent article by the creator of Entrepreneur magazine [0].

"literally nobody, myself included, would have once predicted that I’d be running a magazine called Entrepreneur"

[0]: https://www.entrepreneur.com/article/331221


Someone needs to apply Bayes theorem to determine if not believing in oneself is truly a prerequisite to achieving greatness.


Is it "not believing in oneself" or is it "life takes unexpected turns and you don't always end up where you thought you would"?


I don't think it's about not believing in oneself. I actually think the opposite is a requisite -- that you must believe in yourself, even if you don't know where you're heading. Or even if you know where you're heading, and it's not the path that you had originally charted. You can't see 20 moves ahead, but you know a good next move.


I don't think people would even try something in the first place if deep down they didn't believe it could work out.


Hi there. What would you suggest to a person who loves these kind of stuff (data structures and algorithms) but finds it hard to develop the underlying intuitions for coming up with these data-structure and algorithm? I personally just hammer down on deliberate practice to recognise patterns and underlying concepts to solve as many problems as I can on my own. I was wondering apart from general DS and Algo what topics in Mathematics might supplement this process?


I recently had someone ask me about a problem in machine learning. The algorithm for it turned out to be something I had read about 30 years earlier, where it was being used to accelerate the simulation of the motion of stars in galaxies. I had never expected to use that algorithm, but its existence, if not details, was there in my long term memory, waiting to surface.

So, I suggest you just read everything you can, getting the gist of things so if you encounter something similar you know where to look.

I also suggest looking for little tricks you can apply again and again. For example, some of the data structures in that course use a trick where you break the structure into a top part, using one algorithm, and a bunch of leaf parts, each of size maybe O(log n), that are handled using another algorithm. Often this combination does better than either algorithm by itself.


Few questions:

1. Where can I find that algorithm you are talking about?

2. The algorithm which you stated seems a bit complex from the algos taught in the undergrad level, so what are the pre-requisites for understanding it that I must be aware of?

3. What books or papers you would suggest for reading?

4. How the hell you remember something you read decades ago :)?


The ML problem involved, for a set of points on a line, computing the sum (over all the pairs of distinct points) of a function d(x_i,x_j) that is, in some sense, well behaved. The purpose was to compute the similarity of two sets of real numbers.

It turns out this can be done (to sufficient accuracy) in nearly linear time using something called the fast multipole method.

http://www.umiacs.umd.edu/labs/cvl/pirl/vikas/publications/F...

I wouldn't consider the "O(log n) chunks" trick all that complex, btw, although Tarjan did have to point it out to me when I should have used it.


> Tarjan did have to point it out to me when I should have used it.

Wow, you worked with Tarjan. How cool!!!.


Not directly, but he did point out the improvement.


Really, to quote the old "How do I get to Carnegie Hall?" it's "practice practice practice" - it's doing this stuff often enough so that the correct data structure becomes obvious is how you get good at this game


Recognized his face and name instantly ! I know him by reputation from his origami work and appearance in a documentary, seems to be a very cool guy.


The first 17 minutes of the Time Travel lecture is the best introduction to Git I've seen and he didn't even mention Git.


That's amazing. Congrats!


Got confused by the broken link too. Those aren't separate links. It's a single link, and given the name of the file is "poster-design.pdf", I don't think the PDF is very important. It's just an image that gives examples of what is probably covered. The real content is described below as the video lectures and notes of the class[1] and the github account[2]. There are also problem sets[3], and it recommends a couple of books[4][5], both of which are apparently free.

[1] https://courses.csail.mit.edu/6.851/fall17/lectures/

[2] https://github.com/6851-2017

[3] https://courses.csail.mit.edu/6.851/fall17/psets/

[4] http://epubs.siam.org/doi/book/10.1137/1.9781611970265

[5] http://opendatastructures.org/


It's too bad that the links are broken. When the links at the top of the page didn't work, I immediately assumed that this was for an old course and none of the material would still be available.

I only kept looking because I didn't think a useless site would make it to the top of HN. People who find the course site another way might not have a reason to keep looking.


What links are broken? I only know of a single link to a seemingly unimportant file that is broken.


I thought that the single image was a series of links to the different parts of the courses. It's at the top of the page and has 9 distinct icons representing different things that will be taught. It looks a lot like it's there for navigation. When clicking on each of them didn't work, I assumed I couldn't get to the course content.


Yes, this is what I tried to explain in my top-level comment.


Ah, I thought you were saying that each link was actually to the same URL. I see now what you actually meant.


As a busy self-learner, when is it time to learn data structures in depth? I feel like there are always 10 other technologies I need to know more urgently (eg, more bash, Linux, testing frameworks, deep learning, c++ libraries, linear algebra, common security mistakes/attacks, OpenGL, etc)


All time. Everyday. You'll know you understand it when you can build it, when you can see its structure.

"Smart data structures and dumb code works a lot better than the other way around." -- Guy Steele

"The Representation Principle: Once a problem is described using an appropriate representation, the problem is almost solved." -- P. H. Winston

"Show me your [code] and conceal your [data structures], and I shall continue to be mystified. Show me your [data structures], and I won't usually need your [code]; it'll be obvious." -- Fred Brooks

"I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships." -- Linus Torvalds

"Rule 5: Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming." -- Rob Pike

"What are tensors? The facts of the universe." -- Lillian Lieber


Same here. I had the same problem especially when dealing with problems that required me to go back and (un)learn stuff from scratch.

I found that the greatest barrier to deal with seemingly hard concepts was getting the right intuition on what "might" be happening under the hood. For me, this is absolutely crucial even if my intuition about a "thing" might not be completely correct.

As someone who didn't go to a classroom to be taught a lot of the basics, I dive in head-first to read papers, I mean a lot of papers, to help build this mental model EVEN though I don't understand everything in it, but it lets you pick up patterns for solving similar problems you might be having.

For things you don't understand, you can always fit in these gaps in your mental model as you dig deeper into the abstraction cake and repeat the whole "intuition" building loop.

I find that the same also applies to Math and other domains which build on a bunch of primitives that has to be really well known to be able to reason in higher abstractions.

I feel software is no different... dig deep enough, there will be a hashmap of some sort.


It depends if you're learning for pleasure or for your career. ;) When I bust out my algorithms/data structures books, I find myself engrossed much like I imagine people get with a good novel. So many times I say to myself, "Whoah! That's so cool!" because of how elegantly/efficiently a problem can be solved that I had never even considered.

My favorite so far is Skiena's "The Algorithm Design Manual". I especially love the computational geometry section.


How does one read a algorithm or data structure book like an novel? most books require busting out keyboard or pen and paper and solving one thing or another.

Just genuinely curious because I love reading technical books but I can never finish them from page to page.


This book has "war stories," where he describes the problem he was faced with and his different approaches before finding his awesome solution, so that's probably what gives me that feeling. But additionally, maybe I'm a freak.


I think this is just a personal thing. kinda like some people immediately spill their brain to some cache (whiteboard, pen/paper) and others stare at a wall for minutes then write when they have a fuller thought.

If it's a technical book I probably read and enjoy it and forget most of it. If I see a similar problem it might jog my memory and then I can refer back. Only works for good books. Its a different experience than directly interacting with the examples by coding/working out yourself.


If you're learning these topics for purposes of career advancement, probably there is not much reason to go much beyond the fundamentals that you might study for a FAANG-type interview. If you know that much, you can probably recognize a situation where you're missing something and go look for it.

Even that is probably overkill for a lot of people, practically speaking.

On the other hand, if you're learning for fun, you should study the thing you're most excited about because you will do the most interesting things with knowledge and skills that you're passionate about.

Ultimately, in my personal experience, the latter course has also led to career advancement.


> As a busy self-learner, when is it time to learn data structures in depth?

From a practical perspective... The best sign is if you're spending a lot of time trying to wrangle performance or scalability and most of the bottleneck is inside the standard data structures.

E.g. if you're spending too much time or memory doing dictionary lookups, then it may be worth looking into Bloom filters. But if dictionary lookups are a trivial component of your overall performance, then there are no real gains to be had from replacing with a fancier data structure.

There's a follow-up point I'd like to make too. Skill with data structures is more about knowing the right questions to ask, rather than memorizing an encyclopedic list of a whole bunch of exotic structures. When you understand the major tradeoffs and design considerations underlying data structure design, then it's usually easy to find a solution in the literature even if you were completely unaware of it prior.

Effectively that means something like being able to translate between business requirements and academic language. "Oh, I need something that works on directed graphs, handles cycles, logarithmic in space, linear in average case time, resistant to adversaries, respects cache locality during lookup, and behaves deterministically."

The best way to get learn this stuff is to read a bunch of famous papers in data structure. Again, the point isn't to memorize a zoo. It's to get comfortable with the already well-developed framework that previous researchers have used. It's very unlikely that your business problem is truly unique at a theoretical level, so most likely someone before you has already figured out the tradeoffs and considerations that you are facing.


> "Effectively that means something like being able to translate between business requirements and academic language. "Oh, I need something that works on directed graphs, handles cycles, logarithmic in space, linear in average case time, resistant to adversaries, respects cache locality during lookup, and behaves deterministically."

Excellent comment! I'd love to see a catalog of data structures that documented those characteristics; do you know of a good resource like that?


Unfortunately, I don't have any great suggestions for a single book or resource. (Doesn't mean something good isn't out there, just not anything I've come across.)

For me personally, the best resource has been simply to work through famous papers in the space. Blogs like the Morning Paper, centered around breaking down papers, are also helpful.


Would also like to know if there's a resource like this out there.


You can have twenty years experience, but if all you learn is testing frameworks, opengl, c++ libraries, etc you never really master your profession. CS fundamentals like algorithms and data structures are much more important to learn in depth. The knowledge stays with you and is always relevant, the latest testing framework, not so much.


Given the bulk of the things you've listed here, which seem to serve very practical purposes, maybe never. I wouldn't call data structures a "technology" so much as I would label them as a theoretical underpinning to much of computer science. If your goal is to learn practical technologies (which is a good goal!) you don't really _need_ to learn data structures deeply -- they're just not a skill you need in day-to-day software life (for most people). I'm a software engineer, and the idea of implementing so much as a binary heap (which is nearly one of the first data structures you learn) at work is a preposterous idea. That said, if you want to learn about data structures, go for it! They're incredibly interesting and will give you a way deeper appreciation for computer science and the work that goes into developing complex algorithms. There's just not necessarily a "pedagogical" place to stick in learning data structures with the sorts of technologies you've listed.


>"they're just not a skill you need in day-to-day software life (for most people)."

I mostly disagree. You won't need to implement them, but a working knowledge of the basics is the basis of any performance analysis and can make a huge difference in system performance in day to day programming. At some point all the small inefficiencies also add up.

I would go so far as to say that data structures and basic algorithms (e.g. Dijkstra) are more important to become a competent engineer than any of the practical tools he listed. It's like the difference between a mechanic and a mechanical engineer.

There is also a reason they are taught very early in a computer science degree.


I think we should remember that the vast majority of computer developers and engineers will not, and probably never, work on stuff that requires deep understanding of things like data structures and algorithms, even those with advanced degrees. Most are just, how to put this, IT janitors.

To use your mechanic and mechanical engineer analogy: They're not even mechanics, they simply change the oil and light bulbs from time to time, along with cleaning the interior.

Not saying that it's not useful - you'd get a much better understanding of things...but this obsession with learning everything under the sun just to be "ready" for the job is overkill


I guess it depends what you want to work on and what your career goals are. I think my mechanic vs mechanical engineer example is actually a pretty good analogy of actual conditions. You have (many) people learning how to use tools and fix known problems (like mechanics, or e.g. CRUD developers) versus the others that actually need to solve problems with unknown solutions or optimize existing systems (the engineers). Maybe also with known solutions, but with a far deeper theoretical knowledge as a base for doing that.

Yes, the majority of work is on solved problems and we need far more mechanics than engineers. But it's not such a clear cut and a lot of actual software engineers work on solved problems most of their time and only spend a small amount of time on the engineering part (which is probably similar to other engineering disciplines). But when you spend the small amount of time on engineering work these basics are important and I realized multiple times that I could make better decisions because of it. And I'm kind of certain that's the case for most engineers with an academic background or self-taught basics though they might not even realize it.


From this perspective, which people/roles would you consider to be the software engineering equivalent of a mechanic? How about the equivalent a mechanical engineer?


> the idea of implementing so much as a binary heap

In my experience knowing how to implement such things isn't very useful in daily practical programming. Knowing that they exist, what problems they solve, the pros and cons that is incredibly useful day to day. Especially if you can recognize when you've accidentally strayed into implementing one of them under the guise of solving a business problem. You can then apply known patterns or have the foresight to say "Yo this is a solved problem, let's find a library"


Not to disagree, but even if one visits restaurants for their "day to day" meals, one might begin wondering, Hmm.. How would I cook that?


The only time this information is really helpful is if you care about optimization. If most of the data you deal with is within 1000 items, or there's no time/hardware constraint, then this isn't super helpful. But if you're trying to take a function that costs you tons of CPU time down, there may be some value in knowing how it all works under the hood, and how to improve it.


If I were you I'd send an email to prof. Erik Demaine about this. If you write your email thoughtfully enough, then I think he's the kind of person to answer. I would in his position send at least a quick reply.

I have the same question, except I already have a partial answer (i.e. less interested in going after it, I sometimes do these things if I'm interested enough). I'm currently working my way up 6.006, but I wonder why I'd have to do advanced data structures. My guess is: I think I don't, because it's more of a research area than anything else.

I would need to take a quick look to see if there are some interesting data structures solving some interesting problems.

Personally, I'd be interested in time travel and memory hierarchy (caching) as topics. I think they might have some immediate practical utility in those areas.


As someone in the same situation as you, I can advice that it's better to hold off on learning such things until you have a good moment to make use of them.

I studied a lot of Intel's 64 and IA-32 Architectures Software Developer Manuals, particularly the first and third volumes, and have forgotten a lot of details of what I read. I should have waited until I was ready to invest time in writing an OS or a first-stage bootloader or something. The broader things were useful, but I learned a lot of details that I haven't used and have forgotten. I'm sure it's still useful in the sense that I can quickly pick it back up if I ever needed to, but still. Same for all those things I read from the OSDev Wiki.


What do we mean by in-depth? Basic knowledge is pretty fundamental. Beyond that, I'd say assume you know them, and when you're stuck because you actually don't then go learn only as much about them to get you unstuck.



He's obviously incredibly smart to have achieved what he did at such a young age, but he's actually a joy to listen to as well.


I was trying to remember why his name sounded familiar. 12 years ago, when I was an 18 year old with mediocre math grades trying to figure out what to do with my life, I went to a guest lecture he was doing down the street at Dalhousie University. It was the first time I remember being excited about math, which became my major.


Aww, thanks :)


Just had to let you know I'm listening to Mélange and it's freaking great.


He is featured in this NOVA documentary. Geometry is cool.

https://www.pbs.org/video/nova-origami-revolution-episode/



Updated notes from 2017 are cool, though I was hoping the videos would be updated too. They are good, but I was hoping for newer higher resolution recordings.


Years ago, I watched all those video preparing for a Google interview. They are really great. I was told advanced data structures would be a big part of it. Not a single thing came up in the process :/


Tech interviews never go beyond the most basic data structures. I'm willing to bet that most of the interviewers would not be familiar with these ones either.


"advanced data structures" in that context means maybe trees and linked lists. But I have to admit I did something very similar too :)



I was pleasantly surprised to see that improving Wikipedia articles is an acceptable final project for this course. Is this an MIT thing?


I saw that too! What a brilliant idea, given that extremely technical articles tend to have inconsistent/incomplete coverage, and could benefit greatly from this kind of initiative.


I watched these lectures a few years ago and they were really good.

Since this is meant to be a graduate level research course, I wonder if any cutting edge topics have been added since? Were there any major advances or new ideas in fundamental data structures in the past few years?


“Grading...There are four requirements:”

And then it lists five requirements...do I get the MIT course description equivalent of a Knuth check now?


There are two hard problems in computer science...


I happened to interact with Erik Demaine via an open source project(KaTex). It was my first OSS contribution and he was very welcoming and friendly. Of course, I was also humbled to read about his work.


I am getting 404 when I click on linked images.


me too!


The data structures look really cool, but I couldn't image example to make use of them in "everyday" software. Would someone give me some example of problems that would be easier to solve with special data structures?


I haven't looked at the site properly yet, bear that in mind, but to answer your question I probably never have used anything 'interesting' in a couple of decades of working in industry, despite giving a lot of time learning about such things - I do find them interesting.

Nearest I might have got was trying to use a bloom filter in a vehicle tracker, but nope.

What learning about stuff generally has given me is indirect, the ability to dig down and optimise because I understand systems. As I read up on eg. sorting I understand that quicksort is very much not the quickest sort and fixed bad performance bugs from its misuse.

Or understanding Btrees so I can get very good performance out of DBs, even if I've never written a Btree and frankly never want to.

Stuff like that. Maybe one day I'll find the need for dynamic programming and learn it properly, until then it's just a waste of time.

But use them directly, no. Surely sux.


At my college, this class was the primary weeding-out class in junior year. Started with a full class, maybe 50? At the end of the semester there were 7 of us.


These are some of my favorite lectures, although often goes over my head. Whenever I think I've done something clever, I watch these to ground me.


Bookmarked for when I master basic data structures.



Links are broken :/


Honest question, I'm very new to this but I really want to learn. Does putting the video on repeat help with grasping the concepts?


I haven’t watched any of it, didn’t pay much attention in college and get freaked out when I don’t understand things the first time, so take this all with a grain of salt, but I would say try watching parts and then find a reason/ way to use the individual concepts.


Is the instructor using Hagoromo Fulltouch Chalk?


Ha, I remember seeing that post recently. Looks like it goes back to a 2015 gizmodo article, and was in a recent youtube clip.

https://news.ycombinator.com/item?id=19718287

https://news.ycombinator.com/item?id=9723202

https://www.youtube.com/watch?v=PhNUjg9X4g8


Having been a double major at MIT I still get anxiety spikes whenever I look at one of these course webpages.

Looks like a great course though!


Guess this might help,

https://pastebin.com/m7q1WRcw


Love that they recommend going to Tarjan -- nothing like drinking from the original well!


link to pdfs 404


Will this be good to study for a FAANG interview?


Having just spent several months preparing for FAANG interviews myself, I would say not - these go far beyond the data structures I'd expect an interview to cover at most tech companies.


Thanks. What study guides did you use?




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: