Ask HN: Resources for Learning about Databases
100 points by ud0 on Sept 21, 2019 | 27 comments
I recently got a Frontend Engineering offer from Facebook. I dabbled in backend work earlier in my career but never did anything beyond simple APIs on a single server. I set a goal for myself that every year I pick a computer science topic & study it thoroughly. I don't have a CS background but I am beginning to love CS.

I started with Data Structures & Algorithms last year. I studied & did over 250 LeetCode problems, which is why I was able to land an offer with FAANG.

Next on my list is databases. I want to know how they work internally, build a simple RDBMS from scratch, and learn SQL (I know simple CRUD operations), including advanced concepts like procedures & whatever is being used today.

I have googled, yes, but I haven't found any resource that meets my needs. I also plan to switch to backend soon.

Thanks in advance

Edit: I know I will not be building databases at Facebook, & I also know they probably have internal tools or an ORM to access databases. My goal is not to become a database developer but to have a good knowledge of how they work, just to satisfy my curiosity.

It gets recommended all the time in these kinds of threads, but it's so good I don't care: Bill Karwin's SQL Antipatterns. You need a decent understanding of the basics to get the most from it, but there's some excellent information and examples of what to do (and what not to do).


Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems


Fantastic book. A really good overview of distributed consensus, leader-follower architectures, data query languages and data encoding.

And IMO the best part: it explains very clearly the different isolation levels and the difference between serializability and linearizability.

The book is much more database-relevant than the title might make you think.

How dry is this? I've had hit-or-miss success with technical books, and I've found that the writing style makes a dramatic difference to whether a book feels engaging rather than dry or boring.


Not very dry in my opinion. Good mix of explanations, information, and context.

I’m about 2/3rds through it.

I think it's fine for this kind of book. I read a couple chapters a night ATM. Not many jokes, but nice to read.

If The Dragon Book is 0 out of 10 on the dry scale, this is a 4.

Much better than many other technical books.

Great intro and overview: http://coding-geek.com/how-databases-work/

Great book: https://dataintensive.net/

Also great read and overview: http://www.redbook.io/

Great paper giving an overview of the architecture of a DB: https://perspectives.mvdirona.com/content/binary/Architectur...

If you're looking into building your own database, there are some great open source projects you can reference here: https://github.com/danistefanovic/build-your-own-x#build-you...

If you want to actually dive into source code - SQLite is amazing. It has very clean and readable code, so I'd suggest using it as a reference as well: https://github.com/mackyle/sqlite

I have three things for you

1. Designing Data-Intensive Applications

2. Database internals https://www.amazon.com/Database-Internals-deep-dive-distribu...

3. Andy Pavlo's database course videos at CMU and guest lecture series https://www.youtube.com/channel/UCHnBsf2rH-K7pn09rb3qvkA

Anything by Joe Celko: SQL for Smarties, Joe Celko's Trees and Hierarchies in SQL for Smarties

Also, the internals of Django ORM (https://github.com/django/django/tree/2.2.5/django/db/models) and SQLAlchemy Core (https://github.com/sqlalchemy/sqlalchemy/tree/rel_1_3_8/lib/...) and its dialects (https://github.com/sqlalchemy/sqlalchemy/tree/rel_1_3_8/lib/...) + ORM (https://github.com/sqlalchemy/sqlalchemy/tree/rel_1_3_8/lib/...)

I really suggest against building a database from scratch. It's just too annoying, and there's so much code to write (parser, storage, indexing, query planner, connections). If you're interested in internals, I'd say look at the sqlite codebase instead: https://sqlite.org/src/doc/trunk/README.md . If anything, reading code that works is probably more useful than writing code that almost certainly won't without months and possibly years of effort.

A lot of the more complex database things are only really learned by working with a large database system. Performance, distributed databases, and complex schemas come to mind here. Most of the time with simple examples, you'll do something wrong performance-wise (such as forgetting an index, or doing a bad join), but you'll never know because the scale is too small for it to show.

Many times, you don't need to know that much about databases other than some basic SQL.
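To make the "forgetting an index" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the index name are made up for illustration. On 1,000 rows both versions of the query feel instant, so the only visible difference is in what EXPLAIN QUERY PLAN reports; the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

def plan(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in rows)

query = "SELECT total FROM orders WHERE customer_id = 42"

# Without an index on customer_id, SQLite has to scan the whole table.
plan_before = plan(query)

# After adding the index, the planner can search it instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)

print("before:", plan_before)
print("after: ", plan_after)
```

At this size you would never notice the scan by timing alone, which is the point: reading the query plan is how you catch it before the table grows.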

These are more academic than practical (i.e. build a DB from scratch) but still interesting I think.


I found this an enjoyable resource for learning about one of the fundamentals of RDBMS, indices: https://use-the-index-luke.com/
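As a rough picture of what an index buys you, here is a toy sketch in Python. It is not how any real RDBMS stores an index (those typically use B-trees on disk); a sorted list plus binary search stands in for the same idea. The rows, the email column, and the function names are all hypothetical.

```python
import bisect

# Hypothetical table: (id, email) rows in insertion order.
rows = [(i, f"user{(i * 7919) % 10000}@example.com") for i in range(10000)]

def scan_lookup(email):
    """Full scan: check every row until one matches (O(n))."""
    for row in rows:
        if row[1] == email:
            return row
    return None

# The "index": (email, row position) pairs kept sorted by email.
index = sorted((email, pos) for pos, (_, email) in enumerate(rows))
index_keys = [key for key, _ in index]

def index_lookup(email):
    """Binary search the sorted keys (O(log n)), then fetch the row by position."""
    i = bisect.bisect_left(index_keys, email)
    if i < len(index_keys) and index_keys[i] == email:
        return rows[index[i][1]]
    return None

print(index_lookup("user42@example.com") == scan_lookup("user42@example.com"))
```

The trade-off is the same one a real index has: lookups get cheaper, but the sorted structure costs extra space and has to be kept up to date on every write.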

Anything on this channel, https://m.youtube.com/channel/UCHnBsf2rH-K7pn09rb3qvkA (CMU Database Group). All thanks to Andy Pavlo (https://mobile.twitter.com/andy_pavlo), who has a quote, something like "I only love two things, my wife and the databases". Follow his lectures and read his suggested papers.

I would pick one RDBMS and try to dissect it. There are a lot to choose from nowadays; you can check out db-engines to get a general sense of what's out there:


From what I have seen, most enterprises today will be using Oracle or Microsoft; however, PostgreSQL seems to have gained popularity with the web developer and small business crowd (as well as with the HN community). I have been an Oracle database developer since 2015 and would definitely recommend going that route if it interests you. At the very least it might be a good starting point because of the fantastic documentation. Here's a great guide I recommend to get you started with all the basic concepts:


Readings in Database Systems http://www.redbook.io/

Alex’s book: https://www.databass.dev/

I just read a few chapters and it's good so far.

AWS re:invent 2018 talks:


He has a sequence of 2-3 great talks on DynamoDB, the history of relational databases and the rise of access-pattern oriented db design.

I second the earlier recommendations of Designing Data-Intensive Applications and Database Internals. I would also suggest looking at the Jepsen tests, https://aphyr.com/tags/jepsen, and Adrian Colyer's blog, https://blog.acolyer.org/

You could port a database from one language to another as a learning exercise.


Well, what the OP asked was not easy. :)

Anything with exercises and answers?

Andy Pavlo courses on youtube.


Do you want to build a database or use one?

Also, at Facebook they probably use some API to access the database.
