Also keep in mind that learning to "organize your data for fetching" is not necessarily something you can do before you start your project. Many (most?) times you can't predict which data access patterns will be most common and benefit from using Redis, etc.
Starting with a "slower, but flexible" datastore like a traditional relational database, monitoring which access patterns need a boost, and then optimizing or introducing a new datastore is almost always a solid plan of attack.
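To make "monitoring which access patterns need a boost" concrete, here is a rough sketch of one way to do it on PostgreSQL, assuming the pg_stat_statements extension is enabled and psycopg2 is installed; the connection string is a placeholder, and column names differ slightly across Postgres versions.

```python
# Rough sketch: find the queries worth optimizing (or moving to a faster
# store) before reaching for Redis. Assumes PostgreSQL with the
# pg_stat_statements extension enabled and psycopg2 installed.
import psycopg2

conn = psycopg2.connect("dbname=myapp user=myapp")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT query, calls, total_exec_time, mean_exec_time
        FROM pg_stat_statements
        ORDER BY total_exec_time DESC
        LIMIT 10
    """)  -- on Postgres 12 and older the columns are total_time / mean_time
    for query, calls, total_ms, mean_ms in cur.fetchall():
        print(f"{calls:>8} calls  {mean_ms:8.2f} ms avg  {query[:60]}")
```

Only once a handful of queries show up here as both hot and slow is it worth asking whether a cache or a different datastore would actually pay for its complexity.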
I still feel the canonical answer to "what should my default data management policy be in write-some-read-a-ton situations?" is, at write time:
1) store an appropriate write-whole/data-mining-friendly format
2) ASAP, for each major view, write out a Redis-style O(1)-to-read data structure
3) think carefully about backup and replay strategies
You trade a slightly stale read of the very hottest data for much improved performance on everything else, and more importantly, much simplified view code.
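A minimal sketch of steps 1 and 2, assuming redis-py and a local Redis, with SQLite standing in for the durable write-whole store; the table, key names, and "latest comments" view are made up purely for illustration.

```python
# Minimal sketch of the write-time fan-out described above: (1) persist the
# full event in a durable, data-mining-friendly store, then (2) ASAP update a
# Redis structure that each hot view can read in O(1)-ish time.
# Assumes redis-py and a local Redis; table/key names are illustrative only.
import json
import sqlite3

import redis

db = sqlite3.connect("events.db")
db.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, body TEXT)")
r = redis.Redis()

def record_comment(article_id, author, text):
    event = {"type": "comment", "article_id": article_id,
             "author": author, "text": text}

    # 1) Write-whole, data-mining-friendly record of what actually happened.
    db.execute("INSERT INTO events (body) VALUES (?)", (json.dumps(event),))
    db.commit()

    # 2) Update the per-view read structure: a capped list of the newest
    #    comments for this article, so the view is a single LRANGE.
    key = f"article:{article_id}:latest_comments"
    pipe = r.pipeline()
    pipe.lpush(key, json.dumps({"author": author, "text": text}))
    pipe.ltrim(key, 0, 49)          # keep only the 50 newest
    pipe.execute()

def latest_comments(article_id):
    # The view code collapses to one cheap read of a precomputed structure.
    raw = r.lrange(f"article:{article_id}:latest_comments", 0, -1)
    return [json.loads(item) for item in raw]
```

The LTRIM cap is what keeps the read cheap: the view never grows past a fixed size, so reading it is constant work no matter how much history sits in the durable store.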
The best reference I have found for this pattern, and it isn't great (too big-SQL-centric), is "command query responsibility segregation" (CQRS):
If the CEO/owner/founder of your company is non-technical, he/she will request the data in ways you wouldn't have thought about in advance. That's just reality. That makes Redis not appropriate for most companies. It's also too expensive for side projects. So that leaves technically-led startups, which is a good chunk of companies (and probably the funnest to work for).
Lots of people use Redis as a cache, not a primary data store. You can keep full querying in your SQL database and get fast access to common requests through Redis.
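A bare-bones cache-aside sketch of that split, assuming redis-py and a local Redis, with SQLite standing in for the SQL database; the table, key format, and TTL are placeholders.

```python
# Bare-bones cache-aside sketch: Redis sits in front of the SQL database,
# which stays the source of truth and keeps its full querying ability.
# Assumes redis-py and a local Redis; table and TTL are illustrative only.
import json
import sqlite3

import redis

r = redis.Redis()
db = sqlite3.connect("app.db")

def get_user(user_id):
    cache_key = f"user:{user_id}"
    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)          # fast path: common request

    row = db.execute("SELECT id, name, email FROM users WHERE id = ?",
                     (user_id,)).fetchone()
    if row is None:
        return None

    user = {"id": row[0], "name": row[1], "email": row[2]}
    r.set(cache_key, json.dumps(user), ex=300)   # cache for 5 minutes
    return user
```

On writes you either delete the cached key or let the TTL age it out; which one makes sense depends on how stale a read you can tolerate.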
In what way is it too expensive for side projects? It's the easiest data store to compile and run that I've used.
How are 3rd party hosting options part of a cost comparison?
Hosting any DB offsite comes at a cost, and no single database platform seems to have an advantage over the others when it comes to hosted service providers.
My current little side project uses about 200-300 MB of database storage. Using Heroku, that would cost (per month) $9 on their shared postgres database, $10-15 using mongodb, $50 using heroku's dedicated production postgres service, or $125 using redis.
I stand corrected. I was using the prices from redis-to-go, which was the only available option last time I checked, and the service with the big prominent link at the top of the heroku pricing page. Good to see that there is some competition in the area and that prices have come down a bit. Still not cheap enough to move my hobby project from a 1GB VPS, but reasonable enough if I ever turn my project into something that might make money.
First, running redis remotely, where WAN latency is involved, is typically not very performant. Second, if cost is the concern, run it on a VPS where you'll get more memory for the money (hopefully the same VPS provider you use for other things).
Data should not be organised around retrieval or insert/update patterns, but according to the model that best captures the essence of what the data is. That may sound fluffy, but most of the time data is captured because something real happened that caused it to be generated. Your data model needs to make sense in the context of the thing that happened in the real world, not in the context of how it is inserted or how it is read.
The issue being that your methods of collection and retrieval will change over time, and your data model needs to support that and still make sense for the existing data.
While what you describe is certainly the ideal, and likely applicable in a variety of situations, there are a lot of real-world situations where it doesn't cut it. Antirez says it best:
remember all those stories about DB denormalisation, and hordes of memcached or Redis farms to cache stuff, and things like that? The reality is that fancy queries are an awesome SQL capability (so incredible that it was hard for all us to escape this warm and comfortable paradigm), but not at scale.
You'll be hard pressed to find any medium-sized project in the wild that doesn't require a layer of denormalization or caching to be reasonably responsive (I mean, some frameworks come with that built-in - their users might not even be aware this is happening). You might have some beautifully crafted model underlying that layer, but don't fool yourself into thinking that's all there is.