I took a database class in college and it mostly taught SQL syntax (which IMO can be learned from StackOverflow and such) and all the different intricacies of database normalization (3rd normal form, etc).
Having worked as a programmer for the past couple of years, I can't help but feel that the stuff I learned at college isn't very relevant, and I wish that I knew more about the actual database internals: query execution strategies, concurrency control and stuff like that. This seems like a good place to start.
Naming them "Database" courses is a bit of a trick; they usually don't actually study databases. They study data, models of data, algorithms where accessing data is very costly and a little bit of concurrency.
This is all useful and it does make sense to think about it at the same time, but in practice:
* Concurrency requires a lot more focus than it gets as a side topic in one course
* Cost of disk accesses isn't necessarily important in database practice, e.g. if using an in-memory database.
I think database courses are fantastic and I enjoy talking about models of data, but anyone who goes into one thinking they are going to learn about actual databases has been misled. Even the SQL syntax is only really being taught as an example of how the relational data model can be instantiated - 'learning SQL' isn't academically interesting.
If it were me, I'd break most intro-to-databases courses up into 3 parts: one course on concurrency, one course on data models, and the algorithms folded into an existing algorithms course.
Yeah, I also agree that databases are a lot more about database internals (or DBMSs).
Not sure if I would agree with splitting databases into separate courses. I think it's a lot more useful to learn about general concepts like concurrency and memory/storage management by understanding how other systems (like OSs) handle it. For concurrency, systems courses should give students a taste of the various concurrency techniques. Then the more theoretical courses like parallel computing can give a more unified and mathematical view.
I think the current state of courses already meshes pretty well:
Real Systems| Theory of Computer Systems
============|
Networks ⇘
--------
Operating Systems ⇒ Concurrency/Parallel/Distributed Computing
--------
Databases ⇗⇘
-------------- Programming Languages
Compilers ⇒⇗
My main gripe about database courses is that they are often way too out-of-date:
* No mention of LSM trees, which are probably a bit more important than ISAM.
* Tons of time spent on 2PL, deadlocks, and strict serializability, without any mention that mainstream database systems generally default to Read Committed and use MVCC.
* ARIES - At least the course I took spent so much time on outdated cost optimization and concurrency control techniques, and then just mentioned a single, very complicated yet important durability technique at the end. This isn't even that young of a technique anymore (1992) and there are tons of variants which are probably a bit more important to understand than all of the 2PL variants.
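For anyone who hasn't seen MVCC, here is a rough sketch of the idea in Python. It is a toy under stated assumptions, not how any real engine implements it: `MVCCStore`, `commit`, `snapshot` and `read` are made-up names, commits are assumed to be applied one at a time, and there is no garbage collection of old versions. The point is only that writers append new versions instead of overwriting, so readers never block on them.

```python
import itertools


class MVCCStore:
    """Hypothetical toy multi-version store (illustration only)."""

    def __init__(self):
        self.versions = {}               # key -> [(commit_ts, value), ...] oldest first
        self.clock = itertools.count(1)  # monotonically increasing commit timestamps
        self.last_commit = 0             # timestamp of the most recent commit

    def commit(self, writes):
        """Apply a batch of writes as one new version and return its timestamp."""
        ts = next(self.clock)
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((ts, value))
        self.last_commit = ts
        return ts

    def snapshot(self):
        """A snapshot is just the timestamp of the latest commit seen so far."""
        return self.last_commit

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.commit({"x": 1})
old_snapshot = store.snapshot()
store.commit({"x": 2})

# A reader holding the old snapshot still sees the old value.
print(store.read("x", old_snapshot))
# A reader that takes a fresh snapshot sees the latest committed value.
print(store.read("x", store.snapshot()))
```

Roughly speaking, taking a fresh snapshot for every statement gives Read Committed behaviour, while holding one snapshot for the whole transaction gives snapshot isolation.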
This looks awesome! This covers all of my complaints and even has some core material that I don't have but probably should have 'cached' in the back of my mind.
Props to Pavlo and the course TAs for the great work! A lot of DB courses don't really highlight how dynamic the storage & database systems area is right now but this course does.
This is why I am grateful I spent my first five years as a DBA: SQL was just a tool, and what I really learned was why things worked or didn’t work, why queries were fast or slow, and how to best define a schema for a given use-case (that’s key: schema design should fit the business use-case, not some arbitrary “beautiful” theoretical model).
The database course I took covered that a bit. We used https://www.amazon.com/Database-Management-Systems-Raghu-Ram.... It's not the most up-to-date text (although it was when I was in school), but it's probably still the best source for the basics of relational database theory and implementation.
they are not that complicated. you just need a format for storing records where you store the size of the value and the value itself in byte form so you can read it back, and then it is ALL about indices. and that is really it. you can try to build a db by yourself from scratch or use some key-value db as a base. the magic is usually not in storage or indexing but in the query processing and parsing and optimization. that is essentially what makes dbs different. and then, of course, how they handle transactions. in my experience a simple write lock (mutex) is the standard, though you can get a bit more advanced and use more granular locking so your transactions, if they do not overlap, are way faster and don't stop the world.
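a minimal sketch of what that could look like in Python, to make the idea concrete. everything here is an assumption for illustration: `TinyKV` is a made-up name, the record layout (two 4-byte little-endian lengths, then key bytes, then value bytes) is just one possible choice, the index is a plain dict held in memory, and a single mutex guards the file. it leaves out crash recovery, deletes and compaction.

```python
import io
import struct
import threading


class TinyKV:
    """Hypothetical toy store: append-only length-prefixed records plus an index."""

    HEADER = struct.Struct("<II")  # key length, value length (4 bytes each)

    def __init__(self, fileobj=None):
        # Any seekable binary file works; BytesIO keeps the sketch self-contained.
        self.f = fileobj if fileobj is not None else io.BytesIO()
        self.index = {}               # key -> offset of that key's latest record
        self.lock = threading.Lock()  # one global lock, the "simple mutex" approach

    def put(self, key, value):
        with self.lock:
            self.f.seek(0, io.SEEK_END)
            offset = self.f.tell()
            # Sizes go first so a reader knows how many bytes to read back.
            self.f.write(self.HEADER.pack(len(key), len(value)))
            self.f.write(key)
            self.f.write(value)
            self.index[key] = offset

    def get(self, key):
        with self.lock:  # reads share the file cursor, so they take the lock too
            offset = self.index.get(key)
            if offset is None:
                return None
            self.f.seek(offset)
            klen, vlen = self.HEADER.unpack(self.f.read(self.HEADER.size))
            self.f.seek(klen, io.SEEK_CUR)  # skip over the key bytes
            return self.f.read(vlen)


db = TinyKV()
db.put(b"user:1", b"alice")
db.put(b"user:1", b"alice v2")  # newer record shadows the old one via the index
print(db.get(b"user:1"))
```

more granular locking would replace the single `lock` with per-key or per-range locks, which is where most of the extra complexity comes from.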
Yes, when you're only interested in creating a toy/proof-of-concept that doesn't consider: feature-completeness, optimization, concurrency, security, performance, transaction management, stability, etc.
I suppose you could say that about _any_ complex product. "Google Search is just a simple web search box, connected to a text search engine with a little bit of linear algebra on the back end. It's really not that complicated."
He is a great teacher. And this looks like a great course. The project, which is to create a database in C, looks most interesting to me.
I really don't have a strong background in C so I just wish I could create the project in Python since that's what I mostly use. Does anyone know a database course which uses Python to create a database?
Andy Pavlo's video lectures are very good. I watched both the introductory and advanced lectures a year or so ago. Recommended. Previous videos can be found on the channel.
Some of the best and most detailed information on database internals available on YouTube. Also great at integrating the current state of the commercial database industry with trends in academic database research.