Intro to Database Systems (Fall 2019) [video] (youtube.com)
245 points by adamnemecek on Oct 23, 2019 | 27 comments



I took a database class in college and it mostly taught SQL syntax (which IMO can be learned from StackOverflow and such) and all the different intricacies of database normalization (3rd normal form, etc).

Having worked as a programmer for the past couple of years, I can't help but feel that the stuff I learned in college isn't very relevant, and I wish I knew more about actual database internals: query execution strategies, concurrency control, and stuff like that. This seems like a good place to start.


Naming them "Database" courses is a bit of a trick; they usually don't actually study databases. They study data, models of data, algorithms where accessing data is very costly and a little bit of concurrency.

This is all useful and it does make sense to think about it at the same time, but in practice:

* Concurrency requires a lot more focus than as a side effect of one course

* Cost of disk accesses isn't necessarily important in database practice, e.g., when using an in-memory database.

I think database courses are fantastic and I enjoy talking about models of data, but anyone who goes into one thinking they are going to learn about actual databases has been misled. Even the SQL syntax is only really being taught as an example of how the relational data model can be instantiated; 'learning SQL' isn't academically interesting.

If it were me, I'd break most intro-to-databases courses up into three: one course on concurrency, one course on data models, and fold the algorithms into an existing algorithms course.


Yeah, I also agree that databases are a lot more about database internals (or DBMSs).

Not sure if I would agree with splitting databases into separate courses. I think it's a lot more useful to learn about general concepts like concurrency and memory/storage management by understanding how other systems (like OSs) handle it. For concurrency, systems courses should give students a taste of the various concurrency techniques. Then the more theoretical courses like parallel computing can give a more unified and mathematical view.

I think the current state of courses already meshes pretty well:

Real Systems       | Theory of Computer Systems
===================|
Networks          ⇘
-------------------
Operating Systems  ⇒  Concurrency/Parallel/Distributed Computing
-------------------
Databases         ⇗⇘
-------------------   Programming Languages
Compilers         ⇒⇗

My main gripe about databases courses is that they are often way too out-of-date:

* No mention of LSM trees, which are probably a bit more important than ISAM.

* Tons of time spent on 2PL, deadlocks, and strict serializability, without any mention that mainstream database systems generally default to Read Committed and use MVCC.

* ARIES: at least the course I took spent so much time on outdated cost optimization and concurrency control techniques, and then just mentioned a single, very complicated yet important durability technique at the end. This isn't even that young a technique anymore (1992), and there are tons of variants which are probably a bit more important to understand than all of the 2PL variants.
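To make the LSM point concrete, here is a toy LSM tree in Python. It is a minimal sketch under simplifying assumptions, not how any production engine does it: writes go to an in-memory memtable that gets frozen into immutable sorted runs, reads check the memtable and then the runs newest-first, and compaction merges the runs. The class name `TinyLSM`, the `memtable_limit` parameter, and the `TOMBSTONE` sentinel are made up for illustration, and there is no write-ahead log, no on-disk files, and no Bloom filters.

```python
import bisect

TOMBSTONE = object()  # hypothetical sentinel marking a deleted key


class TinyLSM:
    """Toy LSM tree: an in-memory memtable flushed to immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}  # recent writes, still mutable
        self.runs = []  # flushed runs, oldest first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def delete(self, key):
        # A delete is just another write: a tombstone shadows older values.
        self.put(key, TOMBSTONE)

    def _flush(self):
        # Freeze the memtable into a sorted, immutable run (an "SSTable" on a real disk).
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        # Check runs newest-first so newer versions shadow older ones;
        # each run is sorted, so a binary search finds the key.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                value = run[i][1]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        # Merge every run into one, keeping only each key's newest value.
        # Dropping tombstones is safe here only because all runs are merged.
        merged = {}
        for run in self.runs:  # oldest to newest, so later values overwrite
            merged.update(run)
        self.runs = [sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)]
```

The trade-off versus a B-tree/ISAM-style structure is that writes are cheap sequential appends, while reads may have to look in several runs until compaction merges them.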


The previous offering of this course had students implement some of ARIES [0].

[0] https://15445.courses.cs.cmu.edu/fall2018/project4/
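For anyone curious what that project is getting at, below is a heavily simplified Python sketch of the redo/undo idea behind ARIES-style recovery. It is not ARIES itself and not the course's project code: there are no checkpoints, no compensation log records, no dirty page table, and no per-page LSNs. It only shows "repeat history, then roll back the transactions that never committed." `LogRecord` and `recover` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int  # log sequence number, increasing in log order
    txn: str
    kind: str  # "update" or "commit"
    key: Optional[str] = None
    before: Optional[int] = None  # old value (None = key did not exist), used for undo
    after: Optional[int] = None  # new value, used for redo


def recover(log, disk):
    """Rebuild state after a crash from a write-ahead log.

    `disk` is whatever (possibly stale) state survived the crash.
    """
    state = dict(disk)

    # Analysis: transactions that started but never committed are "losers".
    started = {r.txn for r in log}
    committed = {r.txn for r in log if r.kind == "commit"}
    losers = started - committed

    # Redo: repeat history by reapplying every update in LSN order,
    # including updates from losers (they get rolled back next).
    for r in sorted(log, key=lambda r: r.lsn):
        if r.kind == "update":
            state[r.key] = r.after

    # Undo: roll back loser updates in reverse LSN order using before-images.
    for r in sorted(log, key=lambda r: r.lsn, reverse=True):
        if r.kind == "update" and r.txn in losers:
            if r.before is None:
                state.pop(r.key, None)
            else:
                state[r.key] = r.before
    return state
```

Because redo writes after-images and undo writes before-images, the result is the same whether or not the loser's dirty pages reached disk before the crash, which is the point of logging both.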


This looks awesome! This covers all of my complaints and even has some core material that I don't have but probably should have 'cached' in the back of my mind.

Props to Pavlo and the course TAs for the great work! A lot of DB courses don't really highlight how dynamic the storage & database systems area is right now but this course does.


This is a great book/website https://use-the-index-luke.com/


This is why I am grateful I spent my first five years as a DBA: SQL was just a tool, but I learned why things worked or didn't work, why queries were fast or slow, and how best to define a schema for a given use case (that's key: schema design should fit the business use case, not some arbitrary “beautiful” theoretical model).


The database course I took covered that a bit. We used https://www.amazon.com/Database-Management-Systems-Raghu-Ram.... It's not the most up-to-date text (although it was when I was in school), but it's probably still the best source for the basics of relational database theory and implementation.


They are not that complicated. You just need a record format that stores the size of each value and the value itself in byte form so you can read it back, and then it is ALL about indices. That is really it. You can try to build a DB yourself from scratch or use some key-value DB as a base. The magic is usually not in storage or indexing but in query processing, parsing, and optimization; that is essentially what makes DBs different. And then, of course, how they handle transactions. In my experience a simple write lock (mutex) is the standard, though you can get a bit more advanced and use more granular locking so that your transactions, if they do not overlap, are way faster and don't stop the world.
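A minimal Python sketch of the storage part described above, under stated assumptions: each record is a length-prefixed key and value appended to a file, an in-memory dict maps each key to the byte offset of its newest record, and a single `threading.Lock` serializes every operation (the "simple write lock" approach, with no granular locking). `LogStore`, `put`, `get`, and `rebuild_index` are hypothetical names. There is no query processing, no fsync or other durability guarantee, and no compaction, which is where much of the real work lies.

```python
import io
import struct
import threading

# Record layout: 4-byte key length, 4-byte value length (little-endian), then key and value bytes.
HEADER = struct.Struct("<II")


class LogStore:
    """Append-only record file plus an in-memory index mapping key -> byte offset."""

    def __init__(self, f):
        self.f = f  # a seekable binary file-like object opened for reading and writing
        self.index = {}
        self.lock = threading.Lock()  # one coarse lock: every operation is serialized

    def put(self, key: bytes, value: bytes) -> None:
        with self.lock:
            self.f.seek(0, io.SEEK_END)
            offset = self.f.tell()
            self.f.write(HEADER.pack(len(key), len(value)) + key + value)
            self.index[key] = offset  # the newest record for a key wins

    def get(self, key: bytes):
        with self.lock:
            offset = self.index.get(key)
            if offset is None:
                return None
            self.f.seek(offset)
            key_len, value_len = HEADER.unpack(self.f.read(HEADER.size))
            self.f.seek(key_len, io.SEEK_CUR)  # skip the stored key
            return self.f.read(value_len)

    def rebuild_index(self) -> None:
        """Recreate the index by scanning every record, e.g. after a restart."""
        with self.lock:
            self.index = {}
            self.f.seek(0)
            while True:
                offset = self.f.tell()
                header = self.f.read(HEADER.size)
                if len(header) < HEADER.size:
                    break  # end of file (or a torn trailing header)
                key_len, value_len = HEADER.unpack(header)
                key = self.f.read(key_len)
                self.f.seek(value_len, io.SEEK_CUR)  # skip the value
                self.index[key] = offset
```

For example, `LogStore(io.BytesIO())` gives a purely in-memory store to experiment with; overwriting a key just appends a new record and repoints the index, so old versions pile up until something compacts the file.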


> they are not that complicated

Yes, when you're only interested in creating a toy/proof-of-concept that doesn't consider: feature-completeness, optimization, concurrency, security, performance, transaction management, stability, etc.

I suppose you could say that about _any_ complex product. "Google Search is just a simple web search box, connected to a text search engine with a little bit of linear algebra on the back end. It's really not that complicated."


Well, it's a CS class, not a Software Engineering class.


First sentence:

“First I want to talk about how Oracle is helping us out this semester with course development.”

Closes tab


Do so at your own loss. Great lectures.


He doesn't mention IBM System R at all.



He is a great teacher. And this looks like a great course. The project, which is to create a database in C, looks most interesting to me.

I really don't have a strong background in C so I just wish I could create the project in Python since that's what I mostly use. Does anyone know a database course which uses Python to create a database?


Both languages are Turing complete, so the choice of language should be transparent to anyone taking the course.


I love that he explains what's really under the hood. It's needed in a world where the trend is the opposite.


He is giving the course from the bathtub in his hotel


Last time he pretended to get pepper sprayed: https://www.youtube.com/watch?v=m72mt4VN9ik&t=540s

Edit: fake pepper spray @ 13:00


That's how you know it's legit.


Yes, why not? It seems to work fine.


Better than from the toilet


Andy Pavlo's video lectures are very good. I watched both the introductory and advanced lectures a year or so ago. Recommended. Previous videos can be found on the channel.


Some of the best and most detailed information on database internals available on YouTube. Also great at integrating the current state of the commercial database industry with trends in academic database research.


They have a "course DJ" starting from lecture #3 ! https://youtu.be/1D81vXw2T_w?t=25


Excellent course!

I have some difficulty following because I'm missing some prerequisites, but I like this type of course anyway.



