
Ask HN: Tell me about modern databases - thom
Day after day, I sit here crafting data for some vague analytics with Postgres, close to despair. All I want is for the query planner to take a view, go away for five or ten minutes, and then run it optimally, or at least suggest some indexes, but this isn't the way it seems to want to allocate resources.

I suppose I've always been slightly suspicious of modern database technology beyond stodgy stuff like Oracle, SQL Server, MySQL and PostgreSQL, but I'm at the point when I would like to know what my options are.

I don't think my workload is particularly outlandish - 35m events with around 100 attributes, timestamped, some GIS data. Some typical workloads include things like sessionization, grouping events together, and asking questions like "how long after events of type X did an event of type Y happen?", "what was the last event in session Z?", "how long is the path from event A to B to C?" On top of that, all sorts of aggregates get computed over events. Training data gets extracted and predictions fed back in via bits of Clojure or R code.

I am reasonably au fait with modern SQL, I write a lot of window functions, I avoid some common pitfalls, I can generally index my way out of most performance bottlenecks, but my word it feels like a slog some days. So, I suppose my question is this: what am I missing? What fancy new technology exists that can take complex queries, optimise them and give me answers? I don't care about scaling to millions of requests, I don't care about latency, I care about how much time I'm losing looking through EXPLAIN ANALYZE outputs instead of doing real work.
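For concreteness, here's the shape of the "how long after X did Y happen" question, sketched against a hypothetical events(session_id, event_type, occurred_at) table (not my real schema):

```sql
-- For each X event, the time until the next Y event in the same session
SELECT x.session_id,
       x.occurred_at,
       y.occurred_at - x.occurred_at AS time_to_y
FROM events x
CROSS JOIN LATERAL (
    SELECT occurred_at
    FROM events y
    WHERE y.session_id = x.session_id
      AND y.event_type = 'Y'
      AND y.occurred_at > x.occurred_at
    ORDER BY y.occurred_at
    LIMIT 1
) y
WHERE x.event_type = 'X';
```

Multiply that by dozens of variations and you get a feel for my day.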
======
BrentOzar
Dell Software's Toad Query Optimizer (for Oracle and SQL Server) has an engine
that does exactly that. It takes your SQL, tries rewriting it a bunch of
different ways, throws in different compiler hints, and measures the response
time/reads/CPU/etc for each execution, then gives you a set of graphs and
recommendations.

I wanna say it was around $1500/seat last time I looked at it. It's worth
every penny if you have to write a lot of complex queries. I've bought it for
just one senior member of a dev team, and everybody brings their worst queries
to that person for tuning.

Here's the trick, though: after it optimizes a query for you, keep your
original query around. You're not going to want to try to edit the query that
Toad produces - it can be horrifically complex. Instead, edit your original
query, then run it through Toad again.

~~~
_RPM
Sounds like a mistake to have only one member of the team have it. Why does
this member get special treatment?

~~~
bkovacev
Why does it have to be a special treatment?

~~~
_RPM
By definition, this person gets special treatment. I don't know why it has to
be.

------
eschutte2
It seems like the first question to ask is, where is the query planner falling
down? What are you finding in those EXPLAIN ANALYZEs that is wrong? When you
say "that isn't the way it seems to want to allocate resources," what does
that mean - do you mean you want it to auto-create indexes? SQL Server used to
have a tool to suggest indexes based on a query workload, is that what you
mean? Have you used [https://explain.depesz.com/](https://explain.depesz.com/)
?

~~~
thom
Yes, I spend a large part of my day looking at explain output, often on that
site. The SQL Server Database Engine Tuning Advisor is reasonably good, and
something like that for Postgres would be great, but ultimately it doesn't
seem to look at a query and come up with a semantically identical but
structurally better way of phrasing it - it will never fix unnecessary big-O
performance issues (in the way that, for example, posting a single query on
Stackoverflow might). But even if such a tool exists for Postgres, I'd be
running it _all the time_. I'm just naively wondering if technology has moved
on. I have basically no writes outside an ETL process, much of the time this
is effectively single-user ad-hoc stuff. I realise I might be being hopelessly
naive, but it feels like I can clearly specify a query, and I have complete
control over my schema. Something must exist that requires me to pay more
money but ultimately care less about endless performance tweaks to create a
view that is fast?
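To illustrate the kind of structural rewrite I mean (hypothetical events(session_id, occurred_at) table), "last event in each session" can be phrased two semantically identical ways with very different plans:

```sql
-- Correlated subquery: correct, but the inner query is
-- re-evaluated for every candidate row
SELECT *
FROM events e
WHERE occurred_at = (SELECT max(occurred_at)
                     FROM events
                     WHERE session_id = e.session_id);

-- Postgres's DISTINCT ON: a single ordered pass, especially
-- cheap with an index on (session_id, occurred_at DESC)
SELECT DISTINCT ON (session_id) *
FROM events
ORDER BY session_id, occurred_at DESC;
```

No tool I've found will make that jump for me; I have to spot it myself.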

~~~
al2o3cr
If you're doing sufficiently focused sorts of queries - the question mentioned
"find the path from A to B to C" - then you might get value out of a DB
specifically focused on that question (a graph DB, or a time-series one). The
Postgres FDW stuff may help you make integrating one of those less painful.

Before you do that, though: have you thrown RAM at the thing? 35m events
sounds like a lot to humans, but not when you can rent machines with 1.2TB of
_RAM_ by the hour...

~~~
thom
I was more thinking of spatial stuff when I talked about paths, and PostGIS is
a pretty productive platform for me so far. Example calculations would be
taking the geodesic distance moved over a series of events over the total
length of the path to calculate a score for 'directness'. PostGIS has
aggregates to create lines from points, measure the total length etc.
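A sketch of that directness score, assuming a hypothetical events(session_id, geom, occurred_at) table with point geometries:

```sql
-- Straight-line distance between endpoints over total path length;
-- 1.0 means perfectly direct movement. Cast to geography if you
-- want geodesic rather than planar measurements.
WITH paths AS (
    SELECT session_id,
           ST_MakeLine(geom ORDER BY occurred_at) AS path
    FROM events
    GROUP BY session_id
)
SELECT session_id,
       ST_Distance(ST_StartPoint(path), ST_EndPoint(path))
           / NULLIF(ST_Length(path), 0) AS directness
FROM paths;
```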

------
brudgers
I'd suggest the little red book (Stonebraker's not Mao's):

[http://www.redbook.io/](http://www.redbook.io/)

------
rer
You're missing something like this:
[https://www.monetdb.org/](https://www.monetdb.org/)

~~~
thom
I seem to be the exact target market for this sort of thing, thank you! The
SQL implementation seems reasonably modern, and while the GIS features are a
little rudimentary there's at least something to work with.

My worry about column stores is that I can't guarantee my workload is
optimised for that paradigm either - imagine a window function with multiple
partition criteria etc etc, but I suppose it's worth testing.
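For example, something in this shape (hypothetical schema again), where the partition is a composite key and the engine has to sort within every group:

```sql
-- Numbering events within each (session, event type) group -
-- not an obviously columnar-friendly access pattern
SELECT session_id,
       event_type,
       row_number() OVER (PARTITION BY session_id, event_type
                          ORDER BY occurred_at) AS n_within_group
FROM events;
```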

------
bachback
SQL is horrible for many use-cases. Have a look at
[http://datomic.com](http://datomic.com)

~~~
thom
Is Datalog naturally fast for certain types of queries, or do performance
gains for Datomic over RDBMSs come from being able to scale up a cluster? I
don't have a feel for what Datomic's sweet-spot is, despite having much love
for Clojure and its ecosystem.

