1. As they're called in Postgres, at least.
If you start to need any kind of flexibility over refreshing the table, you'll need to move to using a plain old table anyway.
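For what it's worth, Postgres does offer a middle ground before you abandon matviews entirely. A sketch, assuming a hypothetical matview named `daily_sales`:

```sql
-- Rebuild in place, blocking concurrent reads of the view:
REFRESH MATERIALIZED VIEW daily_sales;

-- Rebuild without blocking readers (requires a unique index on the view):
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;
```

Once you need per-row or partial refresh logic, though, the point stands: a plain table you maintain yourself is the escape hatch.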
It's my understanding that there is a proposed feature, with a proof-of-concept floating around, called "Incremental View Maintenance" (IVM): automatic updating of materialized views when their base tables change.
A materialized view with the IVM option is created by the CREATE INCREMENTAL MATERIALIZED VIEW command. Note that this syntax is tentative, so it may change.
When a materialized view is created, AFTER triggers are internally created on all of its base tables.
When a base table is modified (INSERT, DELETE, UPDATE), the view is updated incrementally in the trigger function.
It's the one thing I wish Postgres had that it doesn't. You have to use triggers to do this currently.
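A minimal sketch of the trigger approach in current Postgres, assuming hypothetical `orders` and `order_counts` tables (deletes and updates would need their own triggers):

```sql
-- Summary table maintained incrementally instead of a materialized view.
CREATE TABLE order_counts (user_id int PRIMARY KEY, n bigint NOT NULL DEFAULT 0);

CREATE FUNCTION bump_order_count() RETURNS trigger AS $$
BEGIN
  -- Upsert: first order for a user inserts a row, later ones increment it.
  INSERT INTO order_counts (user_id, n) VALUES (NEW.user_id, 1)
  ON CONFLICT (user_id) DO UPDATE SET n = order_counts.n + 1;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_count_trg
AFTER INSERT ON orders
FOR EACH ROW EXECUTE FUNCTION bump_order_count();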
In particular, an event-sourced world becomes much easier to implement and maintain with the DB handling all the incremental magic.
But for simple things they’re magical.
The best method I've personally found for window functions is cross/outer apply + narrow indexes with included columns. A lot of times you can get away without an indexed view at all.
> you need to hint the server to actually use the indexed view or it will just use it as a regular view which I don’t understand the reasoning behind
SQL Server Enterprise will use indexed views automatically. But you gotta shell out the big bucks for that improved query planner.
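On non-Enterprise editions, the hint in question is NOEXPAND. A sketch, assuming a hypothetical indexed view `dbo.vSalesSummary`:

```sql
-- Standard edition: without the hint, the view is expanded and the
-- view's clustered index is ignored; NOEXPAND forces it to be used.
SELECT ProductId, TotalSales
FROM dbo.vSalesSummary WITH (NOEXPAND)
WHERE ProductId = 42;
```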
LATERAL JOINs (PGSQL). In SQL Server & Oracle they are CROSS APPLY & OUTER APPLY.
This will change your SQL habits forever. Lateral joins give you the power of iteration in set-based operations that are mind-numbingly easy to implement and understand.
They solve some classes of performance-related issues, especially if you're joining to certain kinds of group-by sub-queries or need to join on the output of a function. If your surrounding query defines the constraints, you can push them into otherwise unconstrained sub-queries.
There was a minor learning curve; it took me a couple of hours to work it into my default mindset. In the end, it's a wonderful technique that can save you from dropping into procedural code in any number of cases. The biggest danger is that it is a kind of iteration: where performance can be greatly enhanced in, say, OLTP-style queries, it may not be a net gain in larger unconstrained queries that cause that sub-query or function to get hit many times. I think getting that into your head is the larger part, and the declarative nature of SQL will keep it a little outside of your vision no matter your background.
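The classic use case is top-N-per-group. A sketch in Postgres syntax with hypothetical `users` and `orders` tables (in SQL Server, swap `CROSS JOIN LATERAL` for `CROSS APPLY`):

```sql
-- For each user, fetch their three most recent orders.
-- The subquery may reference u.id only because of LATERAL.
SELECT u.id, o.id AS order_id, o.created_at
FROM users u
CROSS JOIN LATERAL (
  SELECT o.id, o.created_at
  FROM orders o
  WHERE o.user_id = u.id
  ORDER BY o.created_at DESC
  LIMIT 3
) o;
```

This is the "iteration" flavor: conceptually the subquery runs once per user row, which is great when the outer set is small and indexed, and a trap when it isn't.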
I like to have an intermediary table where I can verify what's going to be deleted (or updated) before it runs. Even better, it lets you save the mappings of old IDs, so when you find out some downstream system was using them, you can fix that too.
I know it's just an example but soft deletes solve a huge range of problems.
The eight stages of DELETE:
- SELECT what you're deleting;
- fix the mistake in the query (there's always one);
- SELECT again;
- BEGIN TRANSACTION;
- DELETE;
- SELECT to confirm the right deletion -- twice for good measure, and maybe ROLLBACK once or twice if you're feeling twitchy;
- Hover finger anxiously over the un-entered COMMIT command for several seconds while resisting Dunning–Kruger effect;
- COMMIT. :)
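The ritual above, condensed into one transcript (hypothetical `orders` table, Postgres syntax):

```sql
SELECT * FROM orders WHERE status = 'stale';   -- eyeball what will go

BEGIN;
DELETE FROM orders WHERE status = 'stale';
SELECT * FROM orders WHERE status = 'stale';   -- should return zero rows
COMMIT;                                        -- or ROLLBACK if anything looked wrong
```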
Give the table a DeletedTimestamp and a DeletedReason column.
Mark the records as deleted.
Run around like a lunatic and fix queries so they don't show "deleted" rows to the rest of the app.
Make a page in the application where people can "restore" deleted items. Because they do stupid things too once in a while.
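A sketch of the pattern in Postgres syntax, with a hypothetical `items` table:

```sql
ALTER TABLE items ADD COLUMN DeletedTimestamp timestamptz;
ALTER TABLE items ADD COLUMN DeletedReason text;

-- "Delete": mark instead of removing.
UPDATE items
SET DeletedTimestamp = NOW(), DeletedReason = 'user request'
WHERE id = 42;

-- "Restore": for when they did a stupid thing.
UPDATE items
SET DeletedTimestamp = NULL, DeletedReason = NULL
WHERE id = 42;
```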
At least you need something like DELETE FROM foo WHERE deleted < NOW() - INTERVAL '30 days' (in Postgres syntax).
However, it won't save you if you realize that there's an issue after you commit. Maybe I'm just paranoid, but I like to have both a snapshot before a major change and a spare (via delayed replication or log shipping).
Ideally, all SELECT queries are forced through a view layer (which already has an IsDeleted = 0 restriction), and you can just schedule a nightly job which deletes or archives all the rows with IsDeleted = 1, then refresh the table statistics automatically.
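A sketch of that layering, mixing the thread's column naming with Postgres syntax and a hypothetical `items` table:

```sql
-- All application reads go through the view; the base table keeps the flag.
CREATE VIEW active_items AS
SELECT id, name, price
FROM items
WHERE IsDeleted = 0;

-- Nightly job:
DELETE FROM items WHERE IsDeleted = 1;
ANALYZE items;  -- refresh the table statistics after the purge
```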
1. Create covering indexes sorted by each column to be searched.
2. Enable sqlite_stat4
3. Use WAL mode.
4. Set a busy_timeout
5. Vacuum and analyze daily.
6. Nobody has heard of Jango.
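Items 2 through 5 as concrete statements (note that sqlite_stat4 is a compile-time option, SQLITE_ENABLE_STAT4, not a pragma; ANALYZE then populates the stat tables):

```sql
PRAGMA journal_mode = WAL;   -- readers no longer block the writer
PRAGMA busy_timeout = 5000;  -- wait up to 5s on a locked database
ANALYZE;                     -- refresh planner statistics (run daily)
VACUUM;                      -- reclaim space and defragment (run daily)
```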
Vacuum and analyze are two fun ones, and I think Postgres in particular would do well to invest a bit more in the default configuration of autovacuum, since the out-of-the-box settings are pretty conservative compared to the conditions most out-of-the-box DBs are used in. But I think DB configuration is very much an unsolved problem in general, and it can be a quite surprising source of major performance gains (or losses!).
I mentioned mat views elsewhere, but if you are really hitting quite predictable usage patterns (where covering indexes are stable enough), then some well-defined mat views might go a long way toward tuning performance that little bit more. Just be aware that rematerialization isn't instantaneous, and if you have a high rate of change in the data, you'll probably want time-based rematerialization instead of trigger-based, which means you'll need to account for some inconsistent data (as is the case with caching in general).
SQLite doesn't have materialized views.
It's the best kept secret on the internet. I wish you guys blogged about your engineering, I bet there's lots to say.
So are you actually a paying customer with a Radio Airplay account or are you a free Jango listener? I never had and never needed an account.
I don't know what kind of database Jango uses or what kind of work Jango people do but I sure hope they keep doing it.
We have a table of denormalized data that's used as one of the inputs to an ETL system, and it has to be sorted by one of the columns in the table. This is the table's only purpose. Unfortunately, postgres refuses to use the index because random-access across the entire table is slower than loading and sorting it all, when retrieving all data in the table. This means the ETL system has to wait for about 15 minutes before any data is returned.
Some preliminary tests on a copy of the database indicate that, using CLUSTER to order the data, it will do an index scan instead of the table scan+sort, and start returning data immediately. As long as the correlation remains high enough with regular updates, I'm thinking maybe this can be a scheduled once-a-week operation...
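A sketch of that schedule, with hypothetical table and index names:

```sql
-- One-time physical reorder of the table to match the index:
CLUSTER etl_input USING etl_input_sort_key_idx;

-- CLUSTER doesn't maintain the order on later writes, so repeat it weekly
-- and check in between how far the physical order has drifted:
SELECT correlation
FROM pg_stats
WHERE tablename = 'etl_input' AND attname = 'sort_key';
```

A correlation near 1.0 means the rows are still mostly in index order and the planner should keep choosing the index scan.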
My goal for now is minimal effort to set up, so it takes minimal maintenance later on, but I'll keep pg_repack in mind in case something weird does happen.
1) A reminder that when you have a foreign key, for example an Order table whose UserId column has a FK to User.Id, there is an implicit lookup on the Order table whenever you attempt to delete a User, to validate that the deletion won't violate the Order.UserId FK.
DELETE FROM User WHERE Id = 1
This will require the database to first check that won't violate a FK on Order.UserId:
-- Implicitly (something like this) is done to validate UserId = 1 is not in use on Order.UserId that has a FK to User.
SELECT Count(*) FROM Order WHERE UserId = 1
This can lead to poor update/delete performance on the User table (whenever a row is deleted, or User.Id is ever changed -- a convoluted example), unless you remember to place an index on the Order.UserId column to help with FK enforcement.
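The fix is one line, sketched here with the example's table names:

```sql
-- Without this, every DELETE on "User" scans "Order" to enforce the FK.
CREATE INDEX idx_order_userid ON "Order" (UserId);
```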
2) If you use SQL Server and have a large table, and a few times a month queries that were working fine yesterday suddenly time out but then start to run quickly again for a while, it's probably SQL Server suddenly deciding to recalculate statistics for its optimizer, and the query that is timing out is the lucky one chosen to wait for the recalculation.
The recalculation usually involves SQL Server scanning the table before proceeding with your query, and if the query times out, the re-computation of stats is rolled back (so it keeps failing until it eventually gets to complete).
A workaround that has worked for us is to enable asynchronous statistics, so that the re-computation won't block the query that happened to trigger the re-computation in the first place:
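Presumably something along these lines (the database name is a placeholder):

```sql
-- Queries keep using the old statistics while the rebuild runs in the background.
ALTER DATABASE YourDb SET AUTO_UPDATE_STATISTICS_ASYNC ON;
```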
Regarding the "Always Load Sorted Data" section, it says to insert the data into the table so that the column you want to select on is already sorted, thus having a correlation of 1.
What if I'm adding a single record to the database, how do you re-sort them all upon insert?
In ETL work, for example, it's not uncommon to truncate and reload source and staging tables on each batch run. So the tip would be helpful in those scenarios, more than in an OLTP case (insert this record, update that one).
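A sketch of the truncate-and-reload pattern with hypothetical staging tables:

```sql
TRUNCATE staging_orders;

-- Load pre-sorted so physical row order matches the sort column,
-- giving a correlation of (near) 1 for that column after ANALYZE.
INSERT INTO staging_orders (id, user_id, created_at)
SELECT id, user_id, created_at
FROM source_orders
ORDER BY created_at;
```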
UPDATE users SET email = lower(email)
UPDATE users all rows SET email = lower(email)
UPDATE users SET email = lower(email) LIMIT COUNT(email)
> Not everyone realizes this, but the COUNT function will only count the records where the expression is NOT NULL in COUNT(expression). When the expression is a NULL value, it is not included in the COUNT calculations.
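A quick illustration of the difference, assuming a hypothetical `users` table where some rows have a NULL email:

```sql
-- If 10 rows exist and 3 have email IS NULL:
SELECT COUNT(*)     AS all_rows,        -- counts every row: 10
       COUNT(email) AS rows_with_email  -- skips NULL emails: 7
FROM users;
```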
SET enable_indexscan TO off;
SET enable_indexonlyscan TO off;
I know it's a new post, but it's funny to see what a large share of this discussion is about your ORM comment. You've touched a nerve, I think! I also wonder if there should be a corollary of Godwin's law -- "as an online discussion about relational databases grows longer, the probability of diverging into a discussion about ORMs approaches 1." :)
You only need to refactor to use raw SQL in the rare edge cases where you need more performance. At least working with RoR I find these cases extremely rare (less than 1% of the cases by my estimate).
Use just a query builder that returns a raw array for the times when the data is just being shipped down as JSON to a client. The overhead of a full ORM, converting to full objects internally, often for hundreds or thousands of records, is not worth whatever small conveniences a full ORM might afford.
I sometimes attribute the modern rise of NoSQL and other databases on the somewhat bad reputation that infrastructure-only DBA's can give to the profession. If the only major thing you do for work is check the backups ran overnight and deny change requests for "stability" reasons, it can really poison the well for others who want to make the database sing.
So what do developers do? They cut out the DBA role by inventing complicated but relatively low-maintenance DBs that don't need any DBA's at all - if you have performance issues just spin up another node. Whereas all they often needed was a good Development/Application oriented DBA to sort out their traditional RDBMS properly.
I remember one story in particular where the app devs discovered one of the stored procedures could be manipulated to run arbitrary SQL, so they started using that instead of talking to the DBA...
I've worked in at least 2 different shops with the same restriction: "developers can never touch the production database".
Actually, it's been that way in most places large enough to have enough staff to separate things out. However, in most of those places, a dev or two would still have access, whether for emergencies or debugging or whatnot.
In 2 places, there was an enforced rule where the DBA (in both cases, one and only one person) had the sole password/credentials for production. "Devs can't ever be on production, they could write random SQL - everything has to be tested." The person who said this would routinely write his own sprocs on the DB, on production only, without tests. He acted as a support person for the owners: he'd just write sprocs and custom queries on production for any reports or data updates the owners and upper management wanted. But those sprocs were never documented, tested, or pushed back to dev, so we never knew what was running there. If we asked for a new table, or a new index, or whatever might cause trouble for his hidden/untested code, we'd get blanket denials: "that causes a performance bottleneck - won't do".
The "complain about the DBAs" issue is real, but it's just symptomatic of bad culture all around. It's just as bad to have a cowboy running rampant without any visibility into what they're doing. But sometimes (often?) DBA folks seem to escape this scrutiny (perhaps just at smaller shops?).
I'd not heard this title before, and it's misleading to me, since I strongly associate "DBA" with "has nothing to do with application code", so it's a bit of a contradiction to me personally.