
How I Reduced My DB Server Load by 80% - itsderek23
https://www.schneems.com/2017/07/18/how-i-reduced-my-db-server-load-by-80/
======
brasetvik
Note that Postgres supports functional indexes, and the case in the post (a
`lower(column)=`-clause not being able to utilise the column's index) is the
example used in the documentation:
[https://www.postgresql.org/docs/current/static/indexes-
expre...](https://www.postgresql.org/docs/current/static/indexes-
expressional.html)

~~~
brightball
Was coming here to post that.

Being able to create indexes based on the results of a function is the
solution to so many problems. You can even index the result of an XPath
function on a huge, compressed XML document.

~~~
jaggederest
Partial functional indexes are amazing. You can create an index that not only
elides execution of the function, but automatically only includes the subset
where that function would be relevant. Makes the index smaller, to boot.

    
    
        create index repo_name_lower on repos(LOWER(name)) where name IS NOT NULL;

~~~
combatentropy
Partial unique-constraint indexes are useful too

------
combatentropy
This story supports my growing theory that you should put as much of your
app's rules in the database as you can.

There were three problems with having the rule in Rails:

1\. The need for an index was easily overlooked.

2\. The rule would be bypassed if a different app used the same database.

3\. The rule wasn't even foolproof. Only a database constraint would guarantee
uniqueness when two users are saving at the same instant.

The problem is, SQL is hard. We should not forget how much a programmer must
learn. For example: Ruby, Rails, Linux command line, HTML, CSS, JavaScript,
vi, how to exit vi, etc. Each of these takes years to master.

SQL is especially SQuirreLy. However, it can't possibly be worse than learning
the myriad JavaScript frameworks and complicated server-build tools that are
completely optional for 99% of us. My advice: Don't do a SPA. Spend your time
on SQL instead :D

~~~
BurningFrog
Well, there are problems with that approach too.

At the far end, any programming logic you put in the DB will be impossible to
scale if needed. You can only have one DB, but any number of Rails servers.

Database constraints are a matter of preference and skill set. They make some
things much harder when you have to delete things is certain convoluted
orders, and I can get the correctness through tests. Your decisions may
differ. That's fine.

You should always put indexes on things you query by. Be vigilant. This is
_the_ one case where "only optimize when you have a performance problem" rule
doesn't apply.

~~~
jaggederest
Indexes are useful for speeding up select statements but they run a
significant cost on insert and update.

Sometimes it's better not to index, if you have a table that is updated often
and queried in varying, complex, rarely repeated ways.

~~~
BurningFrog
Yeah, I'm aware of that, and I'm sure these cases exist, but after 20 years
around these things I have yet to see one in real life.

~~~
jaggederest
FWIW I've worked for multiple companies where tables couldn't reasonably be
indexed.

Special cases though, usually write-only analytic tables that are queried once
in a blue moon and have incredibly high traffic (~millions of rows per hour).

------
cletus
I've seen a litany of these kinds of posts and I'm always amazed by two things
when I see them posted on HN:

1\. A cadre of diehards can't wait to post how amazing Postgress is or would
be for whatever it is the OP is doing (as an aside, why isn't Postgres more
popular if it's so amazing?); and

2\. How averse people are to actual SQL.

Years ago I dealt with this crap in the Java world back when Hibernate and the
like were all the rage. I was always amazed at how much confirmation bias
there seemed to be. People decided these ORMs were amazing and then completely
ignored all the bugs introduced by this layer and effort spent trying to
figure out what the ORM was doing and how to make it do the right thing.

Back in the day I always liked a Java data mapper framework called iBatis (now
dead, replaced by Mybatis it seems), which was pretty simple. Write some SQL
in an XML file and call that SQL from your Java code. It was parameterized (so
no SQL injection issues) and you could still do some funky things with
discriminated types and the like. Plus, analytics were super easy because you
knew how often each query was called and how long it took. Also, you could
easily EXPLAIN PLAN those queries if you even had to (usually needed indexes
were obvious).

Compare this to the auto-generated SQL from the likes of Hibernate. ugh.

I've come to the conclusion that people have this tendency to decide X is bad
and then go completely out of their way to avoid X. You see it with SQL and
ORMs. It largely explains (IMHO) thing slike Javascript and GWT.

At least half the time "X is bad" really means "I don't understand X and I
don't want to learn it".

Joel Spolsky's "leaky abstractions" is good and time-honoured advice.

Take the Hibernate example. Once you bought into that framework you had to do
all your data access that way or you broke the caching. That's mostly bad.

People also overestimate their needs. They rush to create Hadoop clusters and
distributed NoSQL solutions because, you know, relational DBs can't keep up
with their "Big Data" (which means, millions of rows) when in fact you can
dump billions of rows into a single MySQL instance.

~~~
chrisan
> 1\. A cadre of diehards can't wait to post how amazing Postgress is or would
> be for whatever it is the OP is doing (as an aside, why isn't Postgres more
> popular if it's so amazing?); and

I'm a happy MySQL user, all of my side projects and on the job work is done in
MySQL. That said, how else would Postgres become more popular if there isn't
some level of evangelism to spread the word? I like reading about Postgres
features and maybe someday I will switch.

For now, with my/our needs, MySQL is fine

------
xfour
Exactly why ORMs are a bad idea. I've always wondered whether ORMs help or
harm. I feel like the one reason to use it is if you have a development team
that isn't capable of writing SQL which in itself is bad.

~~~
esaym
Well the problem is, if you don't use an ORM, you'll invent one
yourself...only poorly. And new hires will end up taking a ton of time to
learn this "custom" orm framework of yours.

And there are many bad, or simply, "too simple" ORMs out there that really
don't help you much. I haven't really found one better than Perl's DBIx::Class
(though outside of Java, Python, Ruby, I have haven't really looked).

Case in point, I recently found this little gem[0] allowing easy correlated
subqueries. I basically took a webpage that was loading in over 2 minutes, and
reduced it to about 100ms and the outputted SQL was about 2 pages long (from
about half a page of custom resultset orm code). I used several correlated
subqueries to sort of pivot part of a table (well several tables actually).
The original author of the code I was working on was fetching entire tables of
data, with each row fetching (joining) to another entire table (it did this in
several levels really) just to sum some values. This was all valid ORM code
(even with handy 'if' statements checking every row id to do in ORM join (yes
he didn't even use a dang where clause!)) but it showed a true lack of
knowledge of the ORM at hand and even SQL in general. Nonetheless, I saved the
day :)

[0] [https://blog.afoolishmanifesto.com/posts/introducing-dbix-
cl...](https://blog.afoolishmanifesto.com/posts/introducing-dbix-class-helper-
resultset-correlaterelationship/)

~~~
toast0
You can absolutely build a product on top of an SQL database, by just writing
the SQL queries you need.

As a bonus, if you're thinking ahead, you can avoid writing queries that will
perform poorly. If you're not thinking ahead, at least you can more easily fix
the ones that do.

I've never really understood why people want to interact with their database
outside of its native query language. There's just so much mismatch between an
ORM and the underlying interface, it feels like fighting all the time. And I
don't feel like it's really saving any time either. The one thing that's sort
of nice is it's easier to write a general filtering function, but SQL
filtering lets you write so many things that won't perform well, making that
easier isn't really helping anybody.

~~~
ykler
I don't want a full-fledged ORM, and I sort of like SQL, but just normalizing
nested objects to put them in the database and then unnormalizing them on the
way out is really annoying to do in raw SQL in my experience. Every time you
want to store a new object in the database, it feels like so much effort to
write a bunch of boilerplate functions. Or do you have a solution to this?

~~~
stuartaxelowen
If you're using python, you can use named tuples for everything. It's really
nice! Check out the psycopg2 namedtuple connection.

~~~
ykler
I just glanced at this, and maybe I don't understand, but it sounds like
psycopg2 namedtuples only work on one table at a time; is this right?

~~~
stuartaxelowen
The namedtuple is just a datastructure interface with psycopg2. For instance,
you could write an arbitrary query "SELECT left.x AS cats, right.created_at
FROM left JOIN right ON etc" and you'd get namedtuples with attrs `cats` and
`created_at`. Similarly, you can insert namedtuples just like rows, since they
are iterables, or as namedparameters with the nt._asdict() method. It's great!

------
sidlls
> Rarely do we consider how one query or a series of queries could interact to
> slow down the whole site.

That doesn't seem right to me. It's almost always something to consider when
designing the data model. Maybe I'm being uncharitable, but this seems to me
to be equivalent to claiming we rarely consider the use of an algorithm or
interactions between algorithms and data structures when writing some code. I
mean, for toys that's fine, but I wouldn't defer this discussion for something
I intended for public use.

------
petergeoghegan
> It turns out it was coming from this line in my model. > This innocuous
> little line was responsible for 80% of > my total database load. This
> validates call is Rails > attempting to ensure that no two Repo records get
> > created with the same username and name. Instead of > enforcing the
> consistency in the database, it put a > before commit hook onto the object
> and it’s querying > the database before we create a new repo to make sure >
> there aren’t any duplicates.

I still can't believe that Rails even attempts this. It's _simply not
possible_ to do this kind of enforcement in a race-free manner using SELECT
statements with Postgres.

------
DigitalJack
I really like all these debug and analysis reports coming into HN lately.

~~~
djaychela
Yeah - I read some of them, and while I don't necessarily 'get' all of it,
there's a lot to be learned in all these things, and it's so good that people
spend the time to cover in depth what the issues were and how to fix them.

------
ozim
For all those ORM's are bad people:

`I’ve been seeing that stray 30s+ spike in request time daily for months,
maybe years. I never bothered to dig in because I thought it would be too much
trouble to track down. It also only happened once a day, so the impact to
users was pretty minimal.`

------
lo_fye
TL;DR - spend time configuring it, write efficient code, and understand what
your 3rd party code is actually doing.

------
tyingq
Tldr: Your ORM might generate heavyweight queries.

Edit: in this example, case insensitive uniqueness validation in Activerecord

------
throw2016
Rails is a pretty popular framework and widely used. How come something like
this was not caught earlier, presumably its affects all Rails applications.

Why would anyone code this kind of inefficiency instead of using inbuilt
constraints. Has the code been reviewed, tested? Too many questions.

It's surprising given how popular Rails was and still is that something which
should have been caught in the early days of Rails is discovered now years
later. Aren't all the production apps seeing this? Didn't Twitter see this?

The real concern is a lot of highly promoted technologies in HN do not get the
proper technical scrutiny that one should take for granted in a technical
forum and increasingly hype is conflated to quality.

------
Entangled
Those who ignore CODD will learn it the hard way, forcefully.

~~~
joshribakoff
That just defines what relational is, it doesn't prescribe it as a best
practice. There are use cases where its ill advised, like EAV.

------
alkz
TL;DR guy has a scheduled job which puts load the db, removes a query which
normally runs for 1.9ms instead of implementing rate limiting

------
tkyjonathan
I do this everyday for the past 10 years.. This is my bread and butter.
[http://www.jonathanlevin.co.uk](http://www.jonathanlevin.co.uk)

