select age, sum(weight) as total_weight
from cats
group by age
having sum(weight) > 12;
select age, sum(weight) as total_weight
from cats
group by age
having total_weight > 12;
Also, optimisers aren't stupid; the actual sum() will not be recalculated.
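For anyone curious, the behavioural difference is easy to poke at. A quick sketch using an in-memory SQLite database (table name and data are invented for the example): SQLite, like MySQL, accepts the alias in HAVING, while PostgreSQL only accepts the repeated aggregate.

```python
import sqlite3

# Toy data; the real quiz table presumably differs.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cats (name TEXT, age INT, weight REAL)")
con.executemany("INSERT INTO cats VALUES (?, ?, ?)", [
    ("Any", 2, 5.5), ("Misty", 2, 5.7),
    ("Oscar", 3, 6.1), ("Smokey", 3, 6.1),
])

# Standard form: repeat the aggregate in HAVING (accepted everywhere).
standard = con.execute("""
    SELECT age, sum(weight) AS total_weight
    FROM cats
    GROUP BY age
    HAVING sum(weight) > 12
""").fetchall()

# Alias form: accepted by MySQL and SQLite, rejected by PostgreSQL.
alias = con.execute("""
    SELECT age, sum(weight) AS total_weight
    FROM cats
    GROUP BY age
    HAVING total_weight > 12
""").fetchall()

print(standard)
```

Where both forms are accepted, they return identical rows; the alias is purely a convenience.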
Answer from core PostgreSQL dev, Tom Lane
Anyway, overall very nice!
and Joe Celko https://www.red-gate.com/simple-talk/sql/t-sql-programming/w...
One question / bug report? Q7 rejects this:
select name, weight, weight - min(weight) over (order by weight asc rows between 1 preceding and current row) as weight_to_lose from cats;
is that much worse than lag() or is it just non-idiomatic?
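For what it's worth: when the window's ORDER BY is the same ascending weight, min() over that two-row frame always returns the preceding row's weight, so it computes exactly what lag(weight, 1, weight) does; it's just less idiomatic. A sketch with SQLite window functions and made-up weights (kept unique so peer ordering doesn't muddy the comparison):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cats (name TEXT, weight REAL)")
con.executemany("INSERT INTO cats VALUES (?, ?)",
                [("Any", 5.5), ("Misty", 5.7), ("Oscar", 6.1), ("Smokey", 6.4)])

rows = con.execute("""
    SELECT name,
           weight,
           -- min over (previous row, current row): the previous weight,
           -- because the rows are sorted ascending by weight.
           weight - min(weight) OVER (ORDER BY weight
                ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS via_min,
           -- the idiomatic spelling of the same thing
           weight - lag(weight, 1, weight) OVER (ORDER BY weight) AS via_lag
    FROM cats
""").fetchall()

for name, weight, via_min, via_lag in rows:
    print(name, weight, via_min, via_lag)
```

Both columns come out identical, so the min() version isn't wrong, just roundabout.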
(edit: also, Q10 asks about "window", but that was quite hard to google. If you could add another one of those links to the relevant docs, it would help a bunch :D)
Yep I need to fix up Q10 a bit.
There's also this: https://bugs.mysql.com/bug.php?id=48652 - still open after all these years (and yes I know there are much longer open bugs).
However, depending on your application, you may be enforcing these constraints elsewhere and not need them at the engine level, even if they're a nice-to-have.
(And yes, the workflow does include actions that involve multiple tables in one go.)
However, you really really need to be careful about SQL injection. I can see that you tried to lock it down as much as possible; as far as I can tell, the account the queries run under only has SELECT permission (no update/delete/etc.). However, I was still able to get some data you probably don't want me to have...
This query returns a list of all tables in your database:
SELECT c.relname||'' FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
And this query returns a list of users and their privileges:
SELECT usename||'', usecreatedb||'', usesuper||'', usecatupd||'' FROM pg_user
I guess your name is Andy based on your HN username, but who is Rusty? :) And that's just what I've been able to get in 5 minutes of trying; imagine what a determined attacker could find.
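For readers following along: on a normal application endpoint the standard fix is parameter binding rather than string interpolation. That doesn't directly help a site that executes user-supplied SELECTs by design (there you need a locked-down role and restricted catalog access), but the contrast is worth seeing. A toy sketch with SQLite and an invented table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cats (name TEXT, weight REAL)")
con.executemany("INSERT INTO cats VALUES (?, ?)",
                [("Misty", 5.7), ("Oscar", 6.1)])

user_input = "Misty' OR '1'='1"   # hypothetical attacker-controlled value

# Dangerous: string interpolation lets the input rewrite the query,
# so the OR '1'='1' clause matches every row.
unsafe = con.execute(
    f"SELECT count(*) FROM cats WHERE name = '{user_input}'"
).fetchone()[0]

# Safe: parameter binding treats the whole input as one literal value,
# which matches no cat name.
safe = con.execute(
    "SELECT count(*) FROM cats WHERE name = ?", (user_input,)
).fetchone()[0]

print(unsafe, safe)
```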
Q7 includes output:
Misty 5.7 0.2
Oscar 6.1 0.4
Smokey 6.1 0.0
The site is down right now, so I had to test using my own db instance. Column and table names might not match.
WITH distinct_weights AS (
    SELECT DISTINCT weight
    FROM cats ),
weights AS (
    SELECT weight,
           lag(weight) OVER (ORDER BY weight) AS lag
    FROM distinct_weights )
SELECT cats.name, cats.weight,
       cats.weight - weights.lag AS weight_to_lose
FROM cats, weights
WHERE cats.weight = weights.weight
ORDER BY weight;
Any 5.5 null
Misty 5.7 0.2
Oscar 6.1 0.4
Smokey 6.1 0.4
SELECT c.name, c.weight, c.weight - lw.lower_weight AS weight_to_lose
FROM cats c
JOIN (SELECT weight,
             lag(weight, 1, weight) OVER (ORDER BY weight) AS lower_weight
      FROM cats
      GROUP BY weight
      ORDER BY weight) lw ON c.weight = lw.weight
ORDER BY weight;
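Both variants hinge on computing lag over the distinct weights and then joining back to the cats. Here's the distinct-weights version of the idea, run against SQLite with made-up data so it's easy to check; lag() with no default produces NULL for the lightest cat, matching the second output above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cats (name TEXT, weight REAL)")
con.executemany("INSERT INTO cats VALUES (?, ?)", [
    ("Any", 5.5), ("Misty", 5.7), ("Oscar", 6.1), ("Smokey", 6.1),
])

rows = con.execute("""
    WITH weights AS (
        -- lag over DISTINCT weights, so the two 6.1 cats both see 5.7
        SELECT weight,
               lag(weight) OVER (ORDER BY weight) AS prev_weight
        FROM (SELECT DISTINCT weight FROM cats) AS dw
    )
    SELECT cats.name, cats.weight,
           cats.weight - weights.prev_weight AS weight_to_lose
    FROM cats JOIN weights ON cats.weight = weights.weight
    ORDER BY cats.weight, cats.name
""").fetchall()

for row in rows:
    print(row)
```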
I guess I need to rethink my design a little.
I think it's down to not reading the question properly, plus impatience.
But it also might be due to how programmers sometimes think/work. Often, when I am trying to solve a problem I like writing something (anything) down in broad strokes first so that I can then go back in and 'fix' each smaller part of the larger problem.
Thank you for making this.
Q6 has a flaw: you request "Order by: ranking, name", but the accepted solution sorts by rank only (Alfie should come before Ashes).
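A quick repro of the tie-break, using SQLite and invented weights; with "ORDER BY ranking, name", Alfie must sort before Ashes even though they share a rank:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cats (name TEXT, weight REAL)")
con.executemany("INSERT INTO cats VALUES (?, ?)",
                [("Ashes", 4.5), ("Alfie", 4.5), ("Misty", 5.7)])

rows = con.execute("""
    SELECT name, rank() OVER (ORDER BY weight) AS ranking
    FROM cats
    ORDER BY ranking, name   -- the secondary sort makes ties deterministic
""").fetchall()

print(rows)
```

Without the secondary sort on name, the order of the two rank-1 cats is up to the engine.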
Nice quiz, by the way, and I like the topic. In my experience, too few devs / dbas make good use of window functions.
But I think the design was good. I also noticed the option to look at the schema, if needed.
1. Q0's success message says "Group by and Having are commonly used aggregate functions". This terminology is incorrect: SUM() is an aggregate function; GROUP BY and HAVING are clauses.
2. Without an ORDER BY clause, there's no guarantee of the order of the result, but the quiz appears to require the output rows to be in the same (in the case of Q0, undefined) order as the expected rows. While this is unlikely to spontaneously change, it's like depending on the order of an unordered hash map. As long as you're teaching people things, it's good to instill the habit of always ordering their results.
I could see perhaps having a question that comes right after this one in which the OVER ORDER BY contains duplicates, that can be used as an opportunity to teach the "gotcha".
Alternately, if there's a guarantee of uniqueness, this can't happen... but again, it might be setting people up to apply what they think they learned to the real world and then find out that their sums are wrong...
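That gotcha is exactly the default-frame behaviour: without an explicit frame, OVER (ORDER BY ...) uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so peer rows (duplicate ORDER BY values) all get the sum through the last peer. A sketch with SQLite and made-up data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cats (name TEXT, weight REAL)")
con.executemany("INSERT INTO cats VALUES (?, ?)",
                [("Any", 5.5), ("Oscar", 6.1), ("Smokey", 6.1)])

rows = con.execute("""
    SELECT name, weight,
           -- Default frame is RANGE ... CURRENT ROW: peers (equal weights)
           -- are summed together, so both 6.1 cats get the same total.
           sum(weight) OVER (ORDER BY weight) AS range_total,
           -- A ROWS frame advances one row at a time: a true running total.
           sum(weight) OVER (ORDER BY weight
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total
    FROM cats
    ORDER BY weight, name
""").fetchall()

for row in rows:
    print(row)
```

Note that with the ROWS frame the two tied cats get *different* running totals, and which of them gets which is itself nondeterministic; hence the advice to keep the ORDER BY key unique.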
And here's a small suggestion: you might want to change the name of the output column "unique" in Q3, because:

*expression* as unique    -- works, and is the recommended answer
*expression* unique       -- fails because of the use of a reserved word
*expression* "unique"     -- works because of the double quotes

PS: this also applies to "position" in Q4, unless that's intentional?
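A small illustration of the quoting behaviour (SQLite here; PostgreSQL reserves "unique" the same way, though exactly which bare keywords each parser tolerates varies by engine):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Double-quoting the alias sidesteps the reserved word (standard SQL quoting).
row = con.execute('SELECT 42 AS "unique"').fetchone()
print(row)

# The bare keyword as an implicit alias is rejected by stricter parsers;
# we just report what this engine does rather than assume.
try:
    con.execute("SELECT 42 unique")
    print("bare keyword accepted")
except sqlite3.OperationalError as exc:
    print("bare keyword rejected:", exc)
```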
* add questions about RANGE mode
* have questions which discuss default behavior regarding ordering and partitioning
* have questions which rely on things like RANGE 3 PRECEDING or other constructs
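On the RANGE suggestion: value-based frames are probably the least-known corner of window functions, and they need a reasonably recent engine (SQLite gained RANGE expr PRECEDING in 3.28). A sketch with invented weights contrasting a value-based frame with a row-based one:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cats (name TEXT, weight REAL)")
con.executemany("INSERT INTO cats VALUES (?, ?)",
                [("Any", 5.5), ("Misty", 5.7), ("Oscar", 6.1), ("Felix", 7.0)])

rows = con.execute("""
    SELECT name, weight,
           -- RANGE is value-based: count cats whose weight is within
           -- 0.5 at or below the current cat's weight.
           count(*) OVER (ORDER BY weight
                RANGE BETWEEN 0.5 PRECEDING AND CURRENT ROW) AS within_half,
           -- ROWS is position-based: the previous row plus this one,
           -- regardless of how far apart the weights are.
           count(*) OVER (ORDER BY weight
                ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS last_two_rows
    FROM cats
    ORDER BY weight
""").fetchall()

for row in rows:
    print(row)
```

Felix (7.0) illustrates the difference: no other cat is within 0.5 of him, so his RANGE count is 1 even though the ROWS frame still reaches back one row.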
> select name, sum(weight) OVER (PARTITION BY name) as running_total_weight from cats order by name
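If names are unique, PARTITION BY name puts every cat in its own one-row partition, so that "running_total_weight" is just each cat's own weight; a cumulative total needs ORDER BY inside the OVER clause instead. A quick check with SQLite and made-up rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cats (name TEXT, weight REAL)")
con.executemany("INSERT INTO cats VALUES (?, ?)",
                [("Any", 5.5), ("Misty", 5.7), ("Oscar", 6.1)])

rows = con.execute("""
    SELECT name,
           -- one-row partitions: this is just the cat's own weight
           sum(weight) OVER (PARTITION BY name) AS per_name_total,
           -- ordering (not partitioning) is what makes a running total
           sum(weight) OVER (ORDER BY name)     AS running_total
    FROM cats
    ORDER BY name
""").fetchall()

for row in rows:
    print(row)
```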
Sidenote: either HN killed your backend or I did with a recursive CTE. If so: sorry :)
It's a brand-new project, so I may need to do some hardening.
PostgreSQL 9.3.20 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4, 64-bit
I wonder why such an old version is being used. It will no longer be supported by the community pretty soon.
Plus there is the whole "my report wants to use the same logic as this web form" thing. Oh, it's in Java? Guess we'll just pull the data out, process it, populate a temp table, report on it, clear those values...
If you have business data in SQL, just write the logic in SQL.
That said, the single biggest performance drain I've seen is people not using the database and instead pulling data out to process in code. I don't think I've ever seen a case where it was faster to process data outside of the database, though there have also been plenty of times when the pertinent SQL had to be written "smarter." Think in sets, not iteratively! There's a terrifying amount of loops in a lot of legacy SQL statements.
Even if you write optimized C code, you generally can't beat well-written DB code, because the IO completely overwhelms the processing time. And somewhat counterintuitively, moving processing outside the database tends to increase overall database CPU usage, because you tend to force more IO through the system.
That makes sense, since IO costs more than most CPU processing, and if you calculate in the DB engine you don't need to pull as much raw data out; you only need the results.
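A toy illustration of the shape of the argument; at this scale both approaches are instant, but the set-based version ships one number across the boundary instead of a thousand rows (the real wins come from IO, indexes, and the planner):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cats (name TEXT, weight REAL)")
con.executemany("INSERT INTO cats VALUES (?, ?)",
                [(f"cat{i}", 4.0 + i % 10) for i in range(1000)])

# Iterative style: drag every row out and aggregate in application code.
total_app = 0.0
for (weight,) in con.execute("SELECT weight FROM cats"):
    total_app += weight

# Set-based style: one statement; only the result crosses the boundary.
total_db = con.execute("SELECT sum(weight) FROM cats").fetchone()[0]

assert abs(total_app - total_db) < 1e-6
print(total_db)
```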
Basically I want a modern, cost effective mainframe, that uses a reasonable language for the data layer. The earlier suggestions of Python + Postgres or C# + MSSql are nearer the mark, but never quite feel first class.
We're getting away from it by either moving to new tech or the tech is improving (adaptive query processing, auto tuning indexes, writable columnstore).
Even when I was an app dev I said you live and die by your database, either by its data model or by not putting processing in the right place. It's an architecture thing and something that (generally) has to be addressed early on.
Rewriting it as one SQL query made it 4 orders of magnitude faster.
I think the problem is maintenance rather than performance. When you really want performance (in data reads), you'll be precomputing and storing the values. Then your report can filter and be indexed properly.
I don't think any tech solution will give you both fast reads and fast writes; you need to pick one.
Your server probably has a lot of downtime... use it: make a cron job and a separate reporting table. Reporting on raw data does not scale, and window functions do not scale.
Using window functions is equivalent to using a better CPU. Maybe the environment you work in is faster, but you have not factored the logic to make the best use of the resources.
You can take a crap approach in Java and port it to SQL, but the approach is still crap; you should create a separate reporting process, or at least use Hadoop.
For Postgres, for example, you can write stored procedures using any of these: https://www.postgresql.org/docs/9.1/static/external-pl.html
But when I've been able to use Python functions directly inside my SQL code, it is a thing of absolute beauty.
One good reason to avoid it (and stored procedures generally) is parallelism. But you need a lot of data to make that worthwhile, especially in these days of 3+ TB RAM machines.
Also, the site is blocked for me currently...