If your working set is small, say 1TB -- and so fits in RAM -- for what kind of problems would using SQL be so much of an overhead that you need a different approach? And what would that approach be?
I suppose you could have a massive set of linear equations that you might be able to fit into 1TB of RAM, but would be difficult to work with as tables in Postgres?
I appreciate you taking the time to answer -- and I get that there's a reason for why we have graph databases. But I really meant something more concrete, as in here's a real-world example that isn't feasible to do on machine X with postgresql, but easy(ish) with a proper graph structure/db -- rather than "not all data structures are easy to map to database tables in a space-efficient manner".
It's very much dependent on how frequently you update the data and whether or not (re)loading the data or updating your structure in memory can be done efficient or not to determine whether or not such an approach is useful or not but going from 'too long to wait for' to 'near instant' for the result of a query is a nice gain.
In the end 'programmer efficiency' versus 'program efficiency' is one trade-off and cost of the hardware to operate the solution on is another. Making those trade-offs and determining the optimum can be hard.
But a rule of thumb is that a solution built up out of generic building blocks will usually be slower, easier to set up, will use more power and will be more expensive to operate but cheaper to build initially than a custom solution that is more optimal over the longer term.
So for a one-off analysis such a custom solution would never fly, but if you need to run your queries many 100's of times per second and the power bill is something that worries you then a more optimal solution might be worth investing in.
Right. But you've still not made a compelling argument that the "smart encoding of the data" and "clever search strategy" couldn't be done with/in SQL. I'm quite sure you could probably get a magnitude or two of improvement by diving down outside of SQL -- but there's a big difference between using the schema you have, and setting up a dedicated system.
I was perhaps not clear, but I was wondering if it's not often the case with real-world data, that you could create a new, tailored schema in SQL, and do the analysis on one system -- especially if you allow for something like advanced stored procedures (perhaps in C) and alternate storage engines.
I suppose one could argue when something stops being SQL, and start becomming SQL-like. But the initial statement was that "For some problems SQL would already be way too much overhead.". The implication being (as I understood it, at least) -- that you not only have to change how the date is stored and queried, but that SQL databases are unsuitable to that that task -- and that by going away from your dbs, you can suddenly do something that you couldn't do on a system with X resources.
I'm not saying that's wrong -- I'm just asking if a) that's what you mean, and b) if you can quantify this overhead a bit more. Are using bitfields or whatnot with postgres going to be a 20% overhead, or a 20000% overhead in your scenario? Because if it's 20%, that sounds like it can be "upgraded away".