I don't wanna take money from HN folks if I can avoid it though - if you wanna get the course listed on the post, use coupon code hnfrontpage to get it free this week. Enjoy.
(And if you're a bootstrapped startup using MSSQL, holler - I have a soft spot in my heart for that, and I'll do what I can to help.)
Cool! Thanks. I'll definitely check out the video. We use MSSQL at our startup. Initially, because that's what the founders were familiar with, but it really is a nice database to work with.
> If you have unlimited memory, network bandwidth isn't a problem and the additional garbage-collection won't kill your performance.
News flash: your database server doesn't have unlimited memory either.
About network performance - if you have such a large result set that you're worried about network performance, than the sorting work in the database is going to be very non-trivial as well. App servers scale out inexpensively - SQL Server does not.
It doesn't take more bandwidth to transfer unsorted than sorted data, and the memory cost will presumably be the same for sorting on the server as on the client.
Filtering should always happen in the database for the sake of bandwidth and performance, but this doesn't apply to sorting the resultset since sorting doesn't change the amount of data.
(Unless the sorting is an intermediate step before further filtering, in which case the advice obviously does not apply.)
No - since the metric used in the article is in terms of licensing-$, not a technical reason.
> "SQL Server Enterprise Edition goes for about $7,000 per CPU core – not per processor, but per core. A two-socket, 6-core-each server rings up at around $84k – and that’s just the licensing costs, not the hardware costs. You can buy a heck of a lot of application servers (even ones with 256GB or more of memory) for $84k.
> Consume all of the query results as fast as possible into memory in your app, and then sort."
BUT - if you are using Postgres, you might be able to use BRIN indexes as a cheap index to speed up this kind of operation, without needing a full B-tree type index.
Licensing costs is just a proxy for CPU time on the database, so in that regard it is still relevant to other RDBM systems. You could scale the database processors (vertical or horizontal), but usually it's easier to scale the app servers.
Anyway, the articles point is being taken a little too absolutely in this comment thread. He wasn't suggesting pulling in an entire multimillion row table and sorting it to extract 50 rows. Just that avoiding sorts in the database where it makes sense and deferring to the app can be a better solution. What those situations are could be an article itself.
The page for ORDER BY on the Postgres website, seems to indicate that it is made efficient by design for the common cases of ORDER BY ASC and DESC at least - but you have to use B-tree indexing for this.
Further reading seems to indicate that if you have WHERE, ORDER BY or GROUP BY statements they all have to be covered by (the same?) B-tree index.
Exhaustive testing from a DBA on StackExchange seems to take a query from over 1200ms down to ~7ms for the case where you can do this.
That's true for single-table queries, but less so for multi-table joins. The RDBMS may need to sort your tables (even multiple times) in order to join them together, or the WHERE/GROUP/ORDER can be on combinations of fields and conditions.
Depends on the sort. A naively implemented mergesort needs a temporary array of the same size as the the array being sorted. But whilst that takes up a chunk of memory for a while, it shouldn't be the sort of thing that makes GC choke.
And anyway, not many people will be using a naively implemented mergesort in this situation!
> The avoid ORDER BY is the main reason not to fall into per-cpu licensing (most commercial data engines). That's the licensing trap they make.
I used to agree with you, but in the cloud, all databases are essentially per-CPU licensing. It's just that instead of paying once up front, you pay by the hour.
That's also why I love database administration as a long term career bet, too. Now, I can go performance tune someone's database queries, and they see an instant reduction in their monthly bill, thereby justifying the DBA cost. If I tune someone's on-premises licensed server, they don't get an instant rebate - the licensing and hardware is a sunk cost.
Sorry but it's not the same. When you use your own hardware you have to scale it to handle the peak usage times. This is often a small fraction of the day.
I don't like putting databases in the cloud, but there you pay closer to actual usage instead of 24x7 at max capacity.
BTW I was thinking in open-source options. Or commercial ones with a less draconian licensing (if there are any left).
> I don't like putting databases in the cloud, but there you pay closer to actual usage instead of 24x7 at max capacity.
Bad news again - with Amazon RDS and Azure SQL DB, you pay a fee based on instance size, and you can't scale up/down instantly with demand. (You can scale automatically, but it takes quite a bit of work - you don't "pay actual usage."
Yes, the article still gives good advice for SQL Server.
I was meaning in the more general sense - Oracle hadn't released 12c yet, Postgres was at 9.2, MariaDB just released 10.0. Lots of changes in the other DBs to make them more competitive.
> One thing that has seemingly changed for SQL server 2016 is the licensing cost... $14,256 per core now??? Or is that only in a 2-pack?
That's for a 2-core pack with Software Assurance (maintenance), so it's roughly half that. It's the same cost it's been since SQL Server 2012 when Microsoft switched to per-core licensing.
Prior to that, it was per-socket, but still roughly the same cost for quad-core servers.
If you are doing an order by on a single table on an indexed column, is that expensive? If you are doing an order by on a clustered index, wouldn't the order by be essentially free?
But I never thought about it, but it does make perfect sense to sort on the app server and filter on the database. When hosted locally, it's practically nothing in terms of money, time to spin up another API server and add it to the load balancer compared to another SQL Server instance.
> If you are doing an order by on a single table on an indexed column, is that expensive?
No, but it sure is unusual. I don't know a lot of folks who only reference a single table in their queries. (Folks who do are generally using key/value stores, not RDBMS's.)
> If you are doing an order by on a clustered index, wouldn't the order by be essentially free?
Assuming you're using an identity (sequential) field, that's also incredibly unusual. The clustering key rarely has value to the select query.
Nice article, and well done to the OP, but this post has actually reinforced why Microsoft's SQL Server is the LEAST favourite SQL database I have ever worked with in over 30 years.
Routines tasks like setting granular permissions for users is a major exercise in confusion and "Lets see if this actually works like we expect it to..."
That we have to use a completely separate database collections in MS-SQL for temporary tables, rather than have built in memory or non volatile tables as part of the core system is mystifying indeed.
That's not what was implied. SQL functions are snippets of SQL you wrap in a function to use in a larger query. The issue being suggested is that SQL Server will go into single-threaded execution when SQL functions are used.
However, you can use stored procedures for the same purpose as functions without incurring the same performance penalty, so the advice was a warning about a specific SQL Server feature rather than the platform as a whole.
My mistake. I realize now the author is talking about user defined SQL functions and not app layer SQL functions for composing SQL. Thanks for the clarification.
I don't wanna take money from HN folks if I can avoid it though - if you wanna get the course listed on the post, use coupon code hnfrontpage to get it free this week. Enjoy.
(And if you're a bootstrapped startup using MSSQL, holler - I have a soft spot in my heart for that, and I'll do what I can to help.)