
A query had to go - andrewfromx
http://activityclubapp.tumblr.com/post/161511428077/a-query-had-to-go
======
matt4077
It should be count(*), not count(activity_type_id), because
count(expression) evaluates the expression for every row returned, whereas
what you actually want is the total number of rows (which also happens to be
what you get, because activity_type_id is never NULL).
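
To illustrate the difference, a quick sqlite3 sketch (MySQL's COUNT behaves the same way; the table and column names are taken from the post's query):

```python
import sqlite3

# Toy table mirroring the activities table from the post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (user_id INTEGER, activity_type_id INTEGER)")
conn.executemany("INSERT INTO activities VALUES (?, ?)",
                 [(1, 10), (1, 20), (1, None)])

# COUNT(*) counts every row; COUNT(col) only counts rows where col IS NOT NULL.
count_star = conn.execute("SELECT COUNT(*) FROM activities").fetchone()[0]
count_col = conn.execute("SELECT COUNT(activity_type_id) FROM activities").fetchone()[0]
print(count_star, count_col)  # 3 2
```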

Then, I'd try an INDEX(user_id, activity_type_id) on activities. InnoDB only
uses a single index per table IIRC, and that would mean reading the entire
table every time.
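
For what it's worth, here is a sqlite3 sketch of the suggested composite index (the index name is made up); sqlite's planner reports it as a covering index for the user_id lookup, and InnoDB would use it in much the same way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (user_id INTEGER, activity_type_id INTEGER)")
# The suggested composite index (name is hypothetical).
conn.execute("CREATE INDEX idx_user_type ON activities (user_id, activity_type_id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT activity_type_id, COUNT(*) FROM activities "
    "WHERE user_id = 1 GROUP BY activity_type_id"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)  # mentions the covering index idx_user_type
```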

Lastly, and I'm not sure if this makes a difference, but it feels wrong to
specify the activities.user_id = #{id} in the ON clause. It really should be
in WHERE... That may not change anything, or the current placement could be
producing some really bad cross products as intermediate results.
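
One way to see why placement can matter: for an INNER JOIN the two spots are logically equivalent, but for an OUTER JOIN they are not. A sqlite3 sketch with toy tables (names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER);
CREATE TABLE activities (user_id INTEGER, activity_type_id INTEGER);
INSERT INTO users VALUES (1), (2);
INSERT INTO activities VALUES (1, 10);
""")

# For an INNER JOIN the two placements return the same rows:
on_rows = conn.execute(
    "SELECT u.id FROM users u JOIN activities a "
    "ON a.user_id = u.id AND a.user_id = 1").fetchall()
where_rows = conn.execute(
    "SELECT u.id FROM users u JOIN activities a "
    "ON a.user_id = u.id WHERE a.user_id = 1").fetchall()
print(on_rows == where_rows)  # True

# For a LEFT JOIN they differ: ON filters before NULL-padding,
# WHERE filters the padded result afterwards.
left_on = conn.execute(
    "SELECT u.id FROM users u LEFT JOIN activities a "
    "ON a.user_id = u.id AND a.user_id = 1").fetchall()
left_where = conn.execute(
    "SELECT u.id FROM users u LEFT JOIN activities a "
    "ON a.user_id = u.id WHERE a.user_id = 1").fetchall()
print(len(left_on), len(left_where))  # 2 1
```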

~~~
srett
> Lastly, and I'm not sure if this makes a difference, but it feels wrong to
> specify the activities.user_id = #{id} in the ON clause. It really should be
> in WHERE... That may not change anything, or the current placement could be
> producing some really bad cross products as intermediate results.

Haven't really used MySQL in recent years (6+), but I remember once optimizing
a query with about 8 inner joins, grouping, and a very long WHERE clause.
After setting up all the indexes it still didn't perform as we hoped. Moving
some of the conditions up into the ON parts, just like in that blog post,
drastically improved performance. It seemed like in the naive approach MySQL
really did all the joins first and only at the end started filtering. I was
quite surprised that wasn't handled by the query optimizer, but maybe that has
improved since then.

~~~
dhd415
I've been consistently surprised by how little optimization MySQL does on
joins. As of 5.6, it didn't even eliminate joins that weren't referenced in
either the SELECT or WHERE clause, which should be a pretty basic
optimization, and one whose absence hurts a lot given MySQL's nested-loop
join algorithm.

------
spotman
Disclaimer: I work on this project w/ the author.

The author writes about how he used this system to solve a case of a slow
query. However, this is really sort of the wrong query to illustrate why we
built this. It's more of a "oh hey, look, since we already have this data
super fast in this other system, let's just use it."

What the system does is allow us to summarize arbitrarily, at whatever
granularity we want, from any second to any second.

For example, we can ask it to summarize January 4th at 11:22:02 to March 1st
at 16:22:09, and however many writes happened during that time at any second
are tallied up very quickly, because of the way the data is stored. So even
though, let's say, during this ~3-month period you have 10,000,000 writes for
one user, some that say "add 3" or "remove 4", you will only query at second
granularity around the edges, and in the middle you will query whole-day or
month buckets.
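
A toy in-memory sketch of that bucket scheme (the real system keeps these counters in Redis; the names, and the choice of only second and day rollups, are illustrative):

```python
from datetime import datetime, time, timedelta

# Hypothetical stand-ins for the per-granularity Redis counters.
second_buckets = {}   # datetime truncated to the second -> count
day_buckets = {}      # date -> count

def record(ts, n=1):
    ts = ts.replace(microsecond=0)
    second_buckets[ts] = second_buckets.get(ts, 0) + n
    day_buckets[ts.date()] = day_buckets.get(ts.date(), 0) + n

def summarize(start, stop):
    """Whole-day buckets cover the middle; second buckets cover the ragged edges."""
    total, cur = 0, start.replace(microsecond=0)
    while cur <= stop:
        day_end = datetime.combine(cur.date(), time(23, 59, 59))
        if cur.time() == time(0, 0) and day_end <= stop:
            total += day_buckets.get(cur.date(), 0)   # one read for a whole day
            cur += timedelta(days=1)
        else:
            total += second_buckets.get(cur, 0)       # second-by-second at the edges
            cur += timedelta(seconds=1)
    return total

record(datetime(2017, 1, 1, 23, 59, 59))      # 1 write near the end of Jan 1
record(datetime(2017, 1, 2, 12, 0, 0), 5)     # 5 writes somewhere on Jan 2
record(datetime(2017, 1, 3, 0, 0, 1), 2)      # 2 writes at the start of Jan 3
print(summarize(datetime(2017, 1, 1, 23, 59, 50),
                datetime(2017, 1, 3, 0, 0, 5)))   # 8
```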

Another way of explaining the point of this system: it allows us to really
quickly (well under 1ms) summarize any data in the past N days (N usually
equals 120), from any start to any stop point we want, using any granularity
we want, and we can do it concurrently; generating this for 1000 users at a
time all happens in 30-50ms. Considering that the underlying data is recorded
at second granularity, can be retrieved at that granularity, and is
automatically expired when that granularity is no longer needed, Redis is the
right tool for the job here. To accomplish this in pure SQL you would have to
log all writes you make to the table and query this change-logging table in a
similar way. So not only would this add a lot of overhead to the database as
a whole, but also a lot of churn to remove the finer granularity when it's
not needed.
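
The pure-SQL alternative mentioned would look roughly like this (sqlite3 sketch, hypothetical schema): every write gets a log row, and summarizing a range means scanning every logged write in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical change-log table: every "add 3" / "remove 4" write gets a row.
conn.execute("CREATE TABLE points_log (user_id INTEGER, ts INTEGER, delta INTEGER)")
conn.execute("CREATE INDEX idx_points_log ON points_log (user_id, ts)")
conn.executemany("INSERT INTO points_log VALUES (?, ?, ?)",
                 [(1, 100, 3), (1, 200, -4), (1, 300, 10), (2, 150, 7)])

# Summarizing any start..stop range touches every logged write inside it.
total = conn.execute(
    "SELECT COALESCE(SUM(delta), 0) FROM points_log "
    "WHERE user_id = ? AND ts BETWEEN ? AND ?",
    (1, 100, 250)
).fetchone()[0]
print(total)  # -1
```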

We will be polishing the system more over time, and more blog posts will come
out when we release some of the client code ;)

------
onli
I think they could just have tried optimizing the query? My knowledge could be
out of date, but I read that the problem with LIMIT is that it is applied at
the end. Meaning their original query will happily take all available results,
sort them, and then return the first 9. If normal values for freq are known, a
simple WHERE freq > X could have helped.
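
Since freq is presumably an aggregate (a count per activity type), the cutoff would go in HAVING rather than WHERE; a sqlite3 sketch of the idea (table names from the post, the data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (user_id INTEGER, activity_type_id INTEGER)")
conn.executemany("INSERT INTO activities VALUES (1, ?)",
                 [(10,)] * 50 + [(20,)] * 5 + [(30,)])

# Groups failing the cutoff are dropped before the sort that feeds LIMIT.
top = conn.execute(
    "SELECT activity_type_id, COUNT(*) AS freq FROM activities "
    "WHERE user_id = 1 GROUP BY activity_type_id "
    "HAVING COUNT(*) > 3 ORDER BY freq DESC LIMIT 9"
).fetchall()
print(top)  # [(10, 50), (20, 5)]
```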

That might also make the join less painful, right? Additionally, I think there
might be a logic issue with the query itself, but I can't fully wrap my head
around it.

Also, activities.user_id = #{id} smells like an SQL injection. But maybe
that's just because of the simplification for the blog entry, or the id isn't
user-choosable. Still.
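
For reference, the standard fix is a bound parameter rather than string interpolation; a sqlite3 sketch (the hostile input is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (user_id INTEGER, activity_type_id INTEGER)")
conn.execute("INSERT INTO activities VALUES (1, 10)")

hostile = "1 OR 1=1"  # would widen an interpolated query to every row
# A placeholder binds the value as data, never as SQL text.
rows = conn.execute("SELECT * FROM activities WHERE user_id = ?", (hostile,)).fetchall()
print(rows)   # [] -- the string never matches an integer user_id

safe = conn.execute("SELECT * FROM activities WHERE user_id = ?", (1,)).fetchall()
print(safe)   # [(1, 10)]
```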

~~~
andrewfromx
hey, feel free to check out the golang code here:
[https://github.com/activityclub/pointspaced](https://github.com/activityclub/pointspaced)
it's way more versatile than sql for our needs. BTW, on the #{id} we call
.to_i first, so ruby will make it 0 if it's a string with a single quote or
something.

~~~
matt4077
I've had only a cursory look, but it really feels as if you're reinventing
database indices.

Indices are based on the same logic, i.e. "99% are reads, let's try to do
more work on writes".

Redis is great. But to be honest: so is MySQL. If you give MySQL enough RAM
and create the right indices, you can easily get to 6-digit numbers of queries
per second.

~~~
andrewfromx
[https://news.ycombinator.com/item?id=14500889](https://news.ycombinator.com/item?id=14500889)

------
andrewfromx
hey mods, we had a Tumblr issue, can u edit link to
[http://activityclubapp.tumblr.com/post/161513427912/a-query-...](http://activityclubapp.tumblr.com/post/161513427912/a-query-had-to-go-this-entry-is-our-first-entry)

------
mbertschler
Working link:
[http://activityclubapp.tumblr.com/post/161513427912/a-query-...](http://activityclubapp.tumblr.com/post/161513427912/a-query-had-to-go-this-entry-is-our-first-entry)

------
marzell
Looks like a page had to go as well. (404)

------
default-kramer
I don't have MySQL handy, but this query doesn't even look like it should
work. It groups by "activities.activity_type_id" but then includes
"activity_types.*" in the select list.

Unless MySQL is smart enough to do something like "implicit grouping" where if
you group on the primary key it knows that all other columns of that table
could also be included in the group... in which case I really should give
MySQL another chance!

~~~
Diederich
I believe MySQL does exactly this, but it's been a number of years since I
worked with it in any depth.

~~~
default-kramer
Ah, I found it. It turns out that if you include nonaggregated columns, it
will just pick one value from the group arbitrarily.

[https://dev.mysql.com/doc/refman/5.5/en/group-by-handling.ht...](https://dev.mysql.com/doc/refman/5.5/en/group-by-handling.html)
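
sqlite happens to accept the same thing, so the behavior that page describes can be poked at directly (note that MySQL 5.7+ rejects such queries by default via ONLY_FULL_GROUP_BY):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp INTEGER, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (1, "b"), (2, "c")])

# val is neither grouped nor aggregated; the engine returns a value from
# some arbitrary row of each group, exactly as the MySQL docs warn.
rows = conn.execute(
    "SELECT grp, val, COUNT(*) FROM t GROUP BY grp ORDER BY grp"
).fetchall()
print(rows)  # e.g. [(1, 'a', 2), (2, 'c', 1)] -- the 'a' could just as well be 'b'
```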

------
tky_jonathan
Just add an index or summarise periodically into a summary table.

------
Terr_
404. Link dead?

~~~
andrewfromx
[http://activityclubapp.tumblr.com/post/161513427912/a-query-...](http://activityclubapp.tumblr.com/post/161513427912/a-query-had-to-go-this-entry-is-our-first-entry)

------
dadvocate
Four O Four

------
dredmorbius
404

