Advanced Data Structures (2017)

joebergeron · on May 29, 2019

Hey! I’m a coauthor with this guy :^). Incredibly fascinating (and extremely smart) dude. His interest in origami is what drew me to MIT in the first place, since I was interested in it as well from a young age. A few years ago I took his class on the computability/complexity theory of folding, and we ended up solving an open problem in the field and published a paper out of it [0]. If you had told highschool me something like that would happen, I never would have believed it :)

[0]: https://arxiv.org/abs/1901.08564

joe_the_user · on May 30, 2019

Hmm,

Scanning the topics here, what seems really interesting is that basic data structures are, naturally, containers for data, but these advanced data structures are more like "mix-ins" or effects: qualities that can be added to a container. Time travel is the most obvious example, but the various optimizing tree containers qualify too.

Not sure how to make this an interesting question. But it's interesting. How many effects can we add to a given structure? Can we make these additions automatically? Can we combine all of these or are some incompatible with each other? What are the cost-benefits between these?

mygo · on May 30, 2019

> If you had told highschool me something like that would happen, I never would have believed it :)

I find it interesting that a common story of people who did big things is that nobody would have predicted they'd end up doing the things they did.

Reminds me of my own reflections on things that I have done that I never would have expected I'd be doing. As well as this recent article by the creator of Entrepreneur magazine [0].

"literally nobody, myself included, would have once predicted that I’d be running a magazine called Entrepreneur"

[0]: https://www.entrepreneur.com/article/331221

ddxxdd · on May 30, 2019

Someone needs to apply Bayes theorem to determine if not believing in oneself is truly a prerequisite to achieving greatness.

zdragnar · on May 30, 2019

Is it "not believing in oneself" or is it "life takes unexpected turns and you don't always end up where you thought you would"?

mygo · on May 30, 2019

I don't think it's about not believing in oneself. I actually think the opposite is a requisite -- that you must believe in yourself, even if you don't know where you're heading. Or even if you know where you're heading, and it's not the path that you had originally charted. You can't see 20 moves ahead, but you know a good next move.

tuesdayrain · on May 30, 2019

I don't think people would even try something in the first place if deep down they didn't believe it could work out.

amrx431 · on May 30, 2019

Hi there. What would you suggest to a person who loves this kind of stuff (data structures and algorithms) but finds it hard to develop the underlying intuitions for coming up with these data structures and algorithms? I personally just hammer down on deliberate practice to recognise patterns and underlying concepts, solving as many problems as I can on my own. Apart from general DS and algo, what topics in mathematics might supplement this process?

pfdietz · on May 30, 2019

I recently had someone ask me about a problem in machine learning. The algorithm for it turned out to be something I had read about 30 years earlier, where it was being used to accelerate the simulation of the motion of stars in galaxies. I had never expected to use that algorithm, but its existence, if not details, was there in my long term memory, waiting to surface.

So, I suggest you just read everything you can, getting the gist of things so if you encounter something similar you know where to look.

I also suggest looking for little tricks you can apply again and again. For example, some of the data structures in that course use a trick where you break the structure into a top part, using one algorithm, and a bunch of leaf parts, each of size maybe O(log n), that are handled using another algorithm. Often this combination does better than either algorithm by itself.
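To make the top-part/leaf-part trick concrete, here is a minimal sketch using range-minimum queries. The choice of problem, the class name `BlockedRMQ`, and the chunk size are my own assumptions for illustration, not something taken from the course. The array is cut into chunks of roughly log2(n) elements; a sparse table (the "top" algorithm) covers the per-chunk minima, and a plain scan (the "leaf" algorithm) handles the partial chunks at either end of a query.

```python
import math


class BlockedRMQ:
    """Range-minimum queries via a two-level split (illustrative sketch)."""

    def __init__(self, values):
        if not values:
            raise ValueError("values must be non-empty")
        self.values = list(values)
        n = len(self.values)
        # Leaf parts: chunks of about log2(n) elements each.
        self.block = max(1, int(math.log2(n)))
        mins = [min(self.values[i:i + self.block])
                for i in range(0, n, self.block)]
        # Top part: sparse table over the per-chunk minima,
        # where table[k][i] is the minimum of mins[i : i + 2**k].
        self.table = [mins]
        k = 1
        while (1 << k) <= len(mins):
            prev = self.table[-1]
            half = 1 << (k - 1)
            self.table.append([min(prev[i], prev[i + half])
                               for i in range(len(mins) - (1 << k) + 1)])
            k += 1

    def _top_min(self, lo, hi):
        """Minimum over whole chunks lo..hi (inclusive) from two table lookups."""
        k = (hi - lo + 1).bit_length() - 1
        return min(self.table[k][lo], self.table[k][hi - (1 << k) + 1])

    def query(self, left, right):
        """Minimum of values[left..right], both bounds inclusive."""
        if not 0 <= left <= right < len(self.values):
            raise IndexError("query bounds out of range")
        b = self.block
        first, last = left // b, right // b
        if first == last:
            # The whole query sits inside one chunk: just scan it.
            return min(self.values[left:right + 1])
        # Scan the partial chunks at both ends.
        best = min(self.values[left:(first + 1) * b])
        best = min(best, min(self.values[last * b:right + 1]))
        # Ask the top structure about the full chunks in between, if any.
        if first + 1 <= last - 1:
            best = min(best, self._top_min(first + 1, last - 1))
        return best
```

With chunks of about log n elements, the sparse table is built over only n / log n entries, so it needs roughly linear space in total, while each query scans at most two chunks. A sparse table over the full array would need more space, and a plain scan would be slower per query.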

amrx431 · on June 1, 2019

A few questions:

1. Where can I find that algorithm you are talking about?

2. The algorithm you described seems a bit more complex than the algos taught at the undergrad level, so what prerequisites must I be aware of to understand it?

3. What books or papers would you suggest reading?

4. How the hell do you remember something you read decades ago :)?

pfdietz · on June 1, 2019

The ML problem involved, for a set of points on a line, computing the sum (over all the pairs of distinct points) of a function d(x_i,x_j) that is, in some sense, well behaved. The purpose was to compute the similarity of two sets of real numbers.

It turns out this can be done (to sufficient accuracy) in nearly linear time using something called the fast multipole method.

http://www.umiacs.umd.edu/labs/cvl/pirl/vikas/publications/F...
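To give a feel for why the quadratic loop over pairs is avoidable, here is a sketch for one special case that I am assuming purely for illustration: d(x_i, x_j) = |x_i - x_j|. This is not the fast multipole method, only a much simpler sort-and-prefix-sum trick that happens to work for this particular function, and the function names are hypothetical.

```python
def pairwise_abs_sum(points):
    """Sum of |x_i - x_j| over all unordered pairs, in O(n log n) time."""
    xs = sorted(points)
    total = 0
    prefix = 0  # sum of the points that come before the current one
    for i, x in enumerate(xs):
        # x is >= each of the i earlier points, so the distances from x
        # to all of them add up to i * x - prefix.
        total += i * x - prefix
        prefix += x
    return total


def pairwise_abs_sum_naive(points):
    """Reference O(n^2) version, used only to check the fast one."""
    pts = list(points)
    return sum(abs(a - b) for i, a in enumerate(pts) for b in pts[i + 1:])
```

For a general well-behaved d there is no such exact shortcut, which is where an approximation scheme like the one in the linked paper comes in.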

I wouldn't consider the "O(log n) chunks" trick all that complex, btw, although Tarjan did have to point it out to me when I should have used it.

amrx431 · on June 14, 2019

> Tarjan did have to point it out to me when I should have used it.

Wow, you worked with Tarjan. How cool!!!

pfdietz · on June 16, 2019

Not directly, but he did point out the improvement.

Taniwha · on May 30, 2019

Really, to quote the old "How do I get to Carnegie Hall?" joke, it's "practice, practice, practice". Doing this stuff often enough that the correct data structure becomes obvious is how you get good at this game.

Mijka · on May 31, 2019

Recognized his face and name instantly! I know him by reputation from his origami work and his appearance in a documentary. He seems to be a very cool guy.

0x445442 · on May 31, 2019

The first 17 minutes of the Time Travel lecture is the best introduction to Git I've seen and he didn't even mention Git.
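For readers who have not seen the lecture, a tiny sketch of the general idea may help. It is my own illustration rather than anything from the lecture, and the names `Node`, `push`, and `history` are hypothetical. Old versions are never modified; a new version is a new node pointing at the version it was built from, which is loosely how commits point at their parents.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """One immutable version: a value plus a link to the previous version."""
    value: str
    parent: Optional["Node"]


def push(version, value):
    """Return a new version on top of `version`, leaving `version` untouched."""
    return Node(value, version)


def history(version):
    """Walk from a version back to the start, newest value first."""
    out = []
    while version is not None:
        out.append(version.value)
        version = version.parent
    return out


v1 = push(None, "a")
v2 = push(v1, "b")   # one line of history
v3 = push(v1, "c")   # a branch from the same starting point
print(history(v2))   # ['b', 'a']
print(history(v3))   # ['c', 'a']
```

Both `v2` and `v3` share the same `v1` node instead of copying it, so keeping every old version costs only one extra node per change.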

420codebro · on May 29, 2019

That's amazing. Congrats!

jolmg · on May 29, 2019

Got confused by the broken link too. Those aren't separate links. It's a single link, and given the name of the file is "poster-design.pdf", I don't think the PDF is very important. It's just an image that gives examples of what is probably covered. The real content is described below as the video lectures and notes of the class[1] and the github account[2]. There are also problem sets[3], and it recommends a couple of books[4][5], both of which are apparently free.

[1] https://courses.csail.mit.edu/6.851/fall17/lectures/

[2] https://github.com/6851-2017

[3] https://courses.csail.mit.edu/6.851/fall17/psets/

[4] http://epubs.siam.org/doi/book/10.1137/1.9781611970265

[5] http://opendatastructures.org/

daveFNbuck · on May 30, 2019

It's too bad that the links are broken. When the links at the top of the page didn't work, I immediately assumed that this was for an old course and none of the material would still be available.

I only kept looking because I didn't think a useless site would make it to the top of HN. People who find the course site another way might not have a reason to keep looking.

jolmg · on May 30, 2019

What links are broken? I only know of a single link to a seemingly unimportant file that is broken.

daveFNbuck · on May 30, 2019

I thought that the single image was a series of links to the different parts of the courses. It's at the top of the page and has 9 distinct icons representing different things that will be taught. It looks a lot like it's there for navigation. When clicking on each of them didn't work, I assumed I couldn't get to the course content.

jolmg · on May 30, 2019

Yes, this is what I tried to explain in my top-level comment.

daveFNbuck · on May 31, 2019

Ah, I thought you were saying that each link was actually to the same URL. I see now what you actually meant.

jammygit · on May 29, 2019

As a busy self-learner, when is it time to learn data structures in depth? I feel like there are always 10 other technologies I need to know more urgently (eg, more bash, Linux, testing frameworks, deep learning, c++ libraries, linear algebra, common security mistakes/attacks, OpenGL, etc)

espeed · on May 30, 2019

All the time. Every day. You'll know you understand it when you can build it, when you can see its structure.

"Smart data structures and dumb code works a lot better than the other way around." -- Eric S. Raymond

"The Representation Principle: Once a problem is described using an appropriate representation, the problem is almost solved." -- P. H. Winston

"Show me your [code] and conceal your [data structures], and I shall continue to be mystified. Show me your [data structures], and I won't usually need your [code]; it'll be obvious." -- Fred Brooks

"I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships." -- Linus Torvalds

"Rule 5: Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming." -- Rob Pike

"What are tensors? The facts of the universe." -- Lillian Lieber

zphds · on May 29, 2019

Same here. I had the same problem especially when dealing with problems that required me to go back and (un)learn stuff from scratch.

I found that the greatest barrier to dealing with seemingly hard concepts was getting the right intuition on what "might" be happening under the hood. For me, this is absolutely crucial even if my intuition about a "thing" might not be completely correct.

As someone who didn't go to a classroom to be taught a lot of the basics, I dive in head-first and read papers, and I mean a lot of papers, to help build this mental model. Even though I don't understand everything in them, they let me pick up patterns for solving similar problems I might be having.

For things you don't understand, you can always fill in these gaps in your mental model as you dig deeper into the abstraction cake and repeat the whole "intuition" building loop.

I find that the same also applies to math and other domains which build on a bunch of primitives that have to be really well known before you can reason at higher levels of abstraction.

I feel software is no different... dig deep enough, there will be a hashmap of some sort.

knicholes · on May 29, 2019

It depends if you're learning for pleasure or for your career. ;) When I bust out my algorithms/data structures books, I find myself engrossed much like I imagine people get with a good novel. So many times I say to myself, "Whoah! That's so cool!" because of how elegantly/efficiently a problem can be solved that I had never even considered.

My favorite so far is Skiena's "The Algorithm Design Manual". I especially love the computational geometry section.

gnufied · on May 29, 2019

How does one read an algorithm or data structure book like a novel? Most books require busting out a keyboard or pen and paper and solving one thing or another.

Just genuinely curious because I love reading technical books but I can never finish them from page to page.

knicholes · on May 30, 2019

This book has "war stories," where he describes the problem he was faced with and his different approaches before finding his awesome solution, so that's probably what gives me that feeling. But additionally, maybe I'm a freak.

verall · on May 30, 2019

I think this is just a personal thing. Kinda like how some people immediately spill their brain to some cache (whiteboard, pen/paper) and others stare at a wall for minutes, then write when they have a fuller thought.

If it's a technical book, I probably read it, enjoy it, and forget most of it. If I see a similar problem it might jog my memory and then I can refer back. Only works for good books. It's a different experience than directly interacting with the examples by coding or working them out yourself.

caconym_ · on May 29, 2019

If you're learning these topics for purposes of career advancement, probably there is not much reason to go much beyond the fundamentals that you might study for a FAANG-type interview. If you know that much, you can probably recognize a situation where you're missing something and go look for it.

Even that is probably overkill for a lot of people, practically speaking.

On the other hand, if you're learning for fun, you should study the thing you're most excited about because you will do the most interesting things with knowledge and skills that you're passionate about.

Ultimately, in my personal experience, the latter course has also led to career advancement.

dcolkitt · on May 30, 2019

> As a busy self-learner, when is it time to learn data structures in depth?

From a practical perspective... The best sign is if you're spending a lot of time trying to wrangle performance or scalability and most of the bottleneck is inside the standard data structures.

E.g. if you're spending too much time or memory doing dictionary lookups, then it may be worth looking into Bloom filters. But if dictionary lookups are a trivial component of your overall performance, then there are no real gains to be had from replacing with a fancier data structure.
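For anyone who has not met a Bloom filter, here is a minimal sketch. The class name, default sizes, and the SHA-256-based hashing are all my own assumptions for illustration and are not tuned for real use. The structure answers "possibly present" or "definitely absent" using a fixed number of bits, so it can sit in front of an expensive lookup and let most misses skip it.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: false positives possible, no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # If any position is unset, the item was definitely never added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


seen = BloomFilter()
seen.add("alice@example.com")
if "bob@example.com" not in seen:
    pass  # definitely absent: the expensive lookup can be skipped
```

The tradeoff is the one the comment hints at: the filter uses far less memory than storing the keys, but a "possibly present" answer still has to be confirmed against the real dictionary.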

There's a follow-up point I'd like to make too. Skill with data structures is more about knowing the right questions to ask, rather than memorizing an encyclopedic list of a whole bunch of exotic structures. When you understand the major tradeoffs and design considerations underlying data structure design, then it's usually easy to find a solution in the literature even if you were completely unaware of it prior.

Effectively that means something like being able to translate between business requirements and academic language. "Oh, I need something that works on directed graphs, handles cycles, logarithmic in space, linear in average case time, resistant to adversaries, respects cache locality during lookup, and behaves deterministically."

The best way to learn this stuff is to read a bunch of famous papers on data structures. Again, the point isn't to memorize a zoo. It's to get comfortable with the already well-developed framework that previous researchers have used. It's very unlikely that your business problem is truly unique at a theoretical level, so most likely someone before you has already figured out the tradeoffs and considerations that you are facing.

chrisweekly · on May 30, 2019

> "Effectively that means something like being able to translate between business requirements and academic language. "Oh, I need something that works on directed graphs, handles cycles, logarithmic in space, linear in average case time, resistant to adversaries, respects cache locality during lookup, and behaves deterministically."

Excellent comment! I'd love to see a catalog of data structures that documented those characteristics; do you know of a good resource like that?

dcolkitt · on June 3, 2019

Unfortunately, I don't have any great suggestions for a single book or resource. (Doesn't mean something good isn't out there, just not anything I've come across.)

For me personally, the best resource has been simply to work through famous papers in the space. Blogs like the Morning Paper, centered around breaking down papers, are also helpful.

elevenoh · on June 1, 2019

Would also like to know if there's a resource like this out there.

snarfy · on May 30, 2019

You can have twenty years of experience, but if all you learn is testing frameworks, OpenGL, C++ libraries, etc., you never really master your profession. CS fundamentals like algorithms and data structures are much more important to learn in depth. The knowledge stays with you and is always relevant; the latest testing framework, not so much.

joebergeron · on May 29, 2019

Given the bulk of the things you've listed here, which seem to serve very practical purposes, maybe never. I wouldn't call data structures a "technology" so much as I would label them as a theoretical underpinning to much of computer science. If your goal is to learn practical technologies (which is a good goal!) you don't really _need_ to learn data structures deeply -- they're just not a skill you need in day-to-day software life (for most people). I'm a software engineer, and the idea of implementing so much as a binary heap (which is nearly one of the first data structures you learn) at work is a preposterous idea. That said, if you want to learn about data structures, go for it! They're incredibly interesting and will give you a way deeper appreciation for computer science and the work that goes into developing complex algorithms. There's just not necessarily a "pedagogical" place to stick in learning data structures with the sorts of technologies you've listed.

ascar · on May 29, 2019

>"they're just not a skill you need in day-to-day software life (for most people)."

I mostly disagree. You won't need to implement them, but a working knowledge of the basics is the basis of any performance analysis and can make a huge difference in system performance in day to day programming. At some point all the small inefficiencies also add up.

I would go so far as to say that data structures and basic algorithms (e.g. Dijkstra) are more important to become a competent engineer than any of the practical tools he listed. It's like the difference between a mechanic and a mechanical engineer.

There is also a reason they are taught very early in a computer science degree.

TrackerFF · on May 30, 2019

I think we should remember that the vast majority of developers and engineers do not, and probably never will, work on stuff that requires a deep understanding of things like data structures and algorithms, even those with advanced degrees. Most are just, how to put this, IT janitors.

To use your mechanic and mechanical engineer analogy: They're not even mechanics, they simply change the oil and light bulbs from time to time, along with cleaning the interior.

Not saying that it's not useful (you'd get a much better understanding of things), but this obsession with learning everything under the sun just to be "ready" for the job is overkill.

ascar · on May 30, 2019

I guess it depends what you want to work on and what your career goals are. I think my mechanic vs mechanical engineer example is actually a pretty good analogy of actual conditions. You have (many) people learning how to use tools and fix known problems (like mechanics, or e.g. CRUD developers) versus the others that actually need to solve problems with unknown solutions or optimize existing systems (the engineers). Maybe also with known solutions, but with a far deeper theoretical knowledge as a base for doing that.

Yes, the majority of work is on solved problems and we need far more mechanics than engineers. But it's not so clear-cut: a lot of actual software engineers work on solved problems most of the time and spend only a small amount of time on the engineering part (which is probably similar to other engineering disciplines). But when you do spend that small amount of time on engineering work, these basics are important, and I have realized multiple times that I could make better decisions because of them. I'm fairly certain that's the case for most engineers with an academic background or self-taught basics, though they might not even realize it.

jlevers · on May 30, 2019

From this perspective, which people/roles would you consider to be the software engineering equivalent of a mechanic? How about the equivalent of a mechanical engineer?

Swizec · on May 29, 2019

> the idea of implementing so much as a binary heap

In my experience, knowing how to implement such things isn't very useful in daily practical programming. Knowing that they exist, what problems they solve, and their pros and cons is incredibly useful day to day. Especially if you can recognize when you've accidentally strayed into implementing one of them under the guise of solving a business problem. You can then apply known patterns or have the foresight to say "Yo this is a solved problem, let's find a library"
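As a small illustration of the "let's find a library" point, here is a sketch that uses Python's standard `heapq` module instead of a hand-written binary heap. The helper name `k_smallest` is hypothetical, and the task (pulling out the k smallest items) is just an example I picked.

```python
import heapq


def k_smallest(items, k):
    """Return the k smallest items in ascending order using a binary heap."""
    heap = list(items)
    heapq.heapify(heap)  # builds the heap in place in O(n)
    return [heapq.heappop(heap) for _ in range(min(k, len(heap)))]


print(k_smallest([7, 2, 9, 4, 1], 3))  # [1, 2, 4]
```

Knowing that a heap gives cheap access to the smallest element is what tells you to reach for the module; the implementation details stay inside the library.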

Koshkin · on May 29, 2019

Not to disagree, but even if one visits restaurants for their "day to day" meals, one might begin wondering, Hmm.. How would I cook that?

mrbill29 · on May 29, 2019

The only time this information is really helpful is if you care about optimization. If most of the data you deal with is within 1000 items, or there's no time/hardware constraint, then this isn't super helpful. But if you're trying to take a function that costs you tons of CPU time down, there may be some value in knowing how it all works under the hood, and how to improve it.

mettamage · on May 29, 2019

If I were you I'd send an email to prof. Erik Demaine about this. If you write your email thoughtfully enough, then I think he's the kind of person to answer. I would in his position send at least a quick reply.

I have the same question, except I already have a partial answer (i.e. less interested in going after it, I sometimes do these things if I'm interested enough). I'm currently working my way up 6.006, but I wonder why I'd have to do advanced data structures. My guess is: I think I don't, because it's more of a research area than anything else.

I would need to take a quick look to see if there are some interesting data structures solving some interesting problems.

Personally, I'd be interested in time travel and memory hierarchy (caching) as topics. I think they might have some immediate practical utility in those areas.

jolmg · on May 29, 2019

As someone in the same situation as you, I can advise that it's better to hold off on learning such things until you have a good moment to make use of them.

I studied a lot of Intel's 64 and IA-32 Architectures Software Developer Manuals, particularly the first and third volumes, and have forgotten a lot of details of what I read. I should have waited until I was ready to invest time in writing an OS or a first-stage bootloader or something. The broader things were useful, but I learned a lot of details that I haven't used and have forgotten. I'm sure it's still useful in the sense that I can quickly pick it back up if I ever needed to, but still. Same for all those things I read from the OSDev Wiki.

didibus · on May 30, 2019

What do we mean by in-depth? Basic knowledge is pretty fundamental. Beyond that, I'd say assume you know them, and when you're stuck because you actually don't then go learn only as much about them to get you unstuck.

po_ · on May 29, 2019

The professor is a fascinating guy:

http://erikdemaine.org/

https://en.wikipedia.org/wiki/Erik_Demaine

EDIT: so is the coauthor! http://joebergeron.io/posts/post_four.html

jbarciauskas · on May 29, 2019

He's obviously incredibly smart to have achieved what he did at such a young age, but he's actually a joy to listen to as well.

paulgb · on May 29, 2019

I was trying to remember why his name sounded familiar. 12 years ago, when I was an 18 year old with mediocre math grades trying to figure out what to do with my life, I went to a guest lecture he was doing down the street at Dalhousie University. It was the first time I remember being excited about math, which became my major.

joebergeron · on May 29, 2019

Aww, thanks :)

a-afterglow · on May 29, 2019

Just had to let you know I'm listening to Mélange and it's freaking great.

Meandering · on May 30, 2019

He is featured in this NOVA documentary. Geometry is cool.

https://www.pbs.org/video/nova-origami-revolution-episode/

elteto · on May 29, 2019

FYI: lecture notes are here https://courses.csail.mit.edu/6.851/fall17/lectures/

ehsankia · on May 30, 2019

Updated notes from 2017 are cool, though I was hoping the videos would be updated too. They are good, but I was hoping for newer higher resolution recordings.

kgwxd · on May 30, 2019

Years ago, I watched all those videos while preparing for a Google interview. They are really great. I was told advanced data structures would be a big part of it. Not a single thing came up in the process :/

paxys · on May 30, 2019

Tech interviews never go beyond the most basic data structures. I'm willing to bet that most of the interviewers would not be familiar with these ones either.

ehsankia · on May 30, 2019

"advanced data structures" in that context means maybe trees and linked lists. But I have to admit I did something very similar too :)

anibalfernando · on May 30, 2019

22 recorded classes

https://ocw.mit.edu/courses/electrical-engineering-and-compu...

dejawu · on May 29, 2019

I was pleasantly surprised to see that improving Wikipedia articles is an acceptable final project for this course. Is this an MIT thing?

snakeboy · on May 30, 2019

I saw that too! What a brilliant idea, given that extremely technical articles tend to have inconsistent/incomplete coverage, and could benefit greatly from this kind of initiative.

arnioxux · on May 29, 2019

I watched these lectures a few years ago and they were really good.

Since this is meant to be a graduate level research course, I wonder if any cutting edge topics have been added since? Were there any major advances or new ideas in fundamental data structures in the past few years?

brikelly · on May 30, 2019

“Grading...There are four requirements:”

And then it lists five requirements...do I get the MIT course description equivalent of a Knuth check now?

meruru · on May 30, 2019

There are two hard problems in computer science...

mothas · on May 29, 2019

I happened to interact with Erik Demaine via an open source project (KaTeX). It was my first OSS contribution and he was very welcoming and friendly. Of course, I was also humbled to read about his work.

gauravphoenix · on May 29, 2019

I am getting 404 when I click on linked images.

adriansky · on May 31, 2019

me too!

neves · on May 30, 2019

The data structures look really cool, but I couldn't imagine an example that makes use of them in "everyday" software. Could someone give me some examples of problems that would be easier to solve with special data structures?

tempguy9999 · on May 30, 2019

I haven't looked at the site properly yet, so bear that in mind, but to answer your question: I have probably never used anything 'interesting' in a couple of decades of working in industry, despite spending a lot of time learning about such things. I do find them interesting.

Nearest I might have got was trying to use a bloom filter in a vehicle tracker, but nope.

What learning about this stuff has generally given me is indirect: the ability to dig down and optimise because I understand systems. Having read up on e.g. sorting, I understand that quicksort is very much not the quickest sort, and I have fixed bad performance bugs caused by its misuse.

Or understanding Btrees so I can get very good performance out of DBs, even if I've never written a Btree and frankly never want to.

Stuff like that. Maybe one day I'll find the need for dynamic programming and learn it properly, until then it's just a waste of time.

But use them directly, no. Surely sux.

cabaalis · on May 29, 2019

At my college, this class was the primary weeding-out class in junior year. Started with a full class, maybe 50? At the end of the semester there were 7 of us.

jorblumesea · on May 29, 2019

These are some of my favorite lectures, although they often go over my head. Whenever I think I've done something clever, I watch these to ground me.

gigatexal · on May 30, 2019

Bookmarked for when I master basic data structures.

rjammala · on May 29, 2019

This works: https://courses.csail.mit.edu/6.851/fall17/lectures/

cphoover · on May 29, 2019

Links are broken :/

1-6 · on May 30, 2019

Honest question, I'm very new to this but I really want to learn. Does putting the video on repeat help with grasping the concepts?

tclancy · on May 30, 2019

I haven’t watched any of it, didn’t pay much attention in college and get freaked out when I don’t understand things the first time, so take this all with a grain of salt, but I would say try watching parts and then find a reason/way to use the individual concepts.

1-6 · on May 30, 2019

Is the instructor using Hagoromo Fulltouch Chalk?

leovander · on May 30, 2019

Ha, I remember seeing that post recently. Looks like it goes back to a 2015 Gizmodo article, and it was in a recent YouTube clip.

https://news.ycombinator.com/item?id=19718287

https://news.ycombinator.com/item?id=9723202

https://www.youtube.com/watch?v=PhNUjg9X4g8

Areading314 · on May 30, 2019

Having been a double major at MIT I still get anxiety spikes whenever I look at one of these course webpages.

Looks like a great course though!

prassi · on May 30, 2019

Guess this might help,

https://pastebin.com/m7q1WRcw

gumby · on May 29, 2019

Love that they recommend going to Tarjan -- nothing like drinking from the original well!

taesu · on May 29, 2019

link to pdfs 404

person_of_color · on May 30, 2019

Will this be good to study for a FAANG interview?

orn688 · on May 30, 2019

Having just spent several months preparing for FAANG interviews myself, I would say not - these go far beyond the data structures I'd expect an interview to cover at most tech companies.

person_of_color · on May 30, 2019

Thanks. What study guides did you use?