
Making an Endpoint 50x Faster - BillFranklin
https://bilbof.com/2019/09/18/improving-api-endpoint-performance.html
======
anton_gogolev
The whole "...we don’t want duplicated subscriber lists" is a very arbitrary
requirement.

What's more, all this could have been avoided if a proper relational database
structure (not even denormalized, GordonS) had been there in the first place.

~~~
cogman10
Agreed. This is a poor schema made worse.

Querying the json is why this is slow. Pull that garbage out of the json and
into a couple of tables and you'd be much faster.

~~~
Izkata
> Pull that garbage out of the json and into a couple of tables and you'd be
> much faster.

It looks to me like a user-generated search query, not normalized data. A blob
column makes perfect sense if this is the case, and pulling it out into other
tables/columns would significantly overcomplicate everything else.

~~~
cogman10
More complicated than a custom hashing scheme and required ordering?

I don't buy that. Tables aren't complex.

~~~
Izkata
Tables aren't complex, but this particular use can get there. Off the top of
my head:

1) Complexity inherent in search criteria: Multiple values per field and
whether it's AND or OR. (This can go much much further, as ours does, but I'm
limiting it here to what's in the blog post)

2) Centralization of criteria: Multiple tables pretty much means a table per
field, which are only related by foreign keys back to the central table.

3) Following on from 2, since there's no central location to identify all the
relevant tables, it's much easier to miss one in a JOIN.

4) Following on from 1: Serialization/deserialization, for creating new
entries and populating the submission form for an "edit" mode. (At minimum
you'll have to reconstruct the multiple-values-per-field situation from the
tables)

5) Following on from 3 and 4, alternate uses such as adding a search
description derived from the criteria.

There are easy ways around some of this, such as in (4) forcing all the app code
to go through the same serialization/deserialization (which sidesteps most of
the others), but that's no guarantee that there won't be stray code accessing
the tables directly for another reason (hitting on (3)), and almost completely
erases the benefit of fitting the criteria into a table relation. It's
essentially the same as just using a blob field, except a lot more
complicated.

Sometimes blob fields really are the right choice, even in a relational
database.
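A minimal sketch of what (1) looks like in practice, assuming a hypothetical blob shape with multiple values per field and an AND/OR flag (the field names and structure here are made up for illustration, not taken from the article):

```python
import json

# Hypothetical criteria blob: multiple values per field, plus a
# match flag controlling AND vs OR across fields.
blob = json.dumps({
    "match": "all",                    # "all" = AND, "any" = OR
    "criteria": {
        "topic": ["tax", "vat"],       # multiple values per field
        "region": ["scotland"],
    },
})

def matches(blob, document):
    """Check a document's fields against the stored criteria blob."""
    query = json.loads(blob)
    # Within a field, any listed value counts as a hit (OR semantics).
    hits = [
        any(v in document.get(field, []) for v in values)
        for field, values in query["criteria"].items()
    ]
    # Across fields, the match flag decides AND vs OR.
    return all(hits) if query["match"] == "all" else any(hits)

print(matches(blob, {"topic": ["tax"], "region": ["scotland"]}))  # True
print(matches(blob, {"topic": ["tax"]}))                          # False
```

Normalising this into tables would mean one table per field plus join logic to reconstruct the AND/OR semantics, which is the complexity being described above.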

------
shrubble
Why not use a materialized view?

https://www.postgresql.org/docs/9.3/sql-creatematerializedview.html

------
mobilemidget
"Because order should not matter, both arrays are sorted before they are
compared."

I do not understand this line very well. If order does not matter, why sort
anything?

~~~
ignoramous
Consider character sequences: _[n, e, t]_ and _[t, e, n]_. If you'd want to
know if the lists are equivalent (have the same characters, same number of
times), then sorting them would make the comparison straightforward since
order doesn't matter.

See: https://leetcode.com/problems/valid-anagram/solution

~~~
mobilemidget
ah yes but of course, probably did not have enough coffee yet when I read it
:) thank you

But then perhaps it is worth storing it sorted (or storing a sorted copy as
well, if the original order matters) and making it even faster?

------
lowmagnet
I applaud the effort put in for performance, but why not just enqueue the
change notification and do the sends on the back end? This sort of fiddly
process is exactly why deferred action is appropriate.
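A rough sketch of that deferred approach, using Python's stdlib queue as a stand-in for a real job queue (`send_emails_for` is a hypothetical placeholder for the slow notification fan-out):

```python
import queue
import threading

notifications = queue.Queue()

def send_emails_for(change):
    pass  # stand-in for the real, slow notification fan-out

def worker():
    # Back-end worker drains the queue and does the slow sends,
    # keeping the API endpoint itself fast.
    while True:
        change = notifications.get()
        send_emails_for(change)
        notifications.task_done()

threading.Thread(target=worker, daemon=True).start()

# The endpoint just enqueues the change notification and returns:
notifications.put({"document_id": 123, "change": "updated"})
notifications.join()  # in a real system the endpoint would not wait
```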

~~~
hawkice
You're not wrong, but I think starting with structuring data to be reasonably
performant is always the best first step. You can always make it deferred
later if you want or need. This way you don't hammer your DB and leave weird
structural issues in your model.

------
GordonS
TL;DR: a short article describing a normalised schema, with big performance
gains when denormalising data.

Article is very application-specific, but is perhaps a reminder that
denormalisation can be useful when working with relational databases.

~~~
mhd
I wouldn't describe anything with a JSON kitchen sink field as "normalized".

