Sidenote: Literally any button press you could reasonably expect to move to another slide works. PgDown, Right, Down, Space. Stop shitting up a decent submission to whine about it.
In Postgres, UUIDs are stored in 128 bits, so it's hardly any overhead in almost all use cases, so if you can save yourself a day or two, or three, or a week of work just by switching your mental default from an integer to a UUID, that's a pretty big win.
1. Offline / sync - you can have out-of-sync or offline clients (e.g mobile apps) inserting new data without worrying about pk mismatch or collisions.
2. security - it's obviously a bit of security through obscurity, but not being able to guess the key of an object can be very useful some time. It shouldn't be relied on as a first line of defence, but it gives you a naturally impossible-to-guess reference. If you want to avoid someone crawling through your image-base or any object that is generally accessible, but you want to avoid having it accessible in-bulk, UUID is your friend.
3. Silly mistakes - If you pass an object id, you really want to know it can only be for one object. Maybe the front-end is sending an id of one object, but the backend is working on another (shit happens). When you use sequential ids, you increase the chance of weird stuff going on. With UUIDs you simply can't use the wrong id, even by mistake.
Of course you can also use a UUID that's less random, e.g. something with an incrementing first part and a small random last part.
If you want incrementing UUIDs, use v1 UUIDs, which are based MAC address + a timestamp. These change in a predictable way and are more indexable.
This is what happens when I try to sound cleverer than I actually am. :(
So, your system would not be memory bandwidtht constrained, not because it has many threads using roughly the same memory blocks, but because it has one thread at a time working on those memory blocks.
What they ended up with was pretty much UUID but for a more limited deployment: the real issue is just that UUID is extreme overkill, designed to allow every individual user to generate unique identifiers (something that sounds cool, but in practice is worthless as you can't trust the clients).
Assuming UUID v1 right?
The way to think about the problem is that at the granularity of your timestamp you lose the ability to uniquely generate identifiers across multiple nodes (on the single node, this is handled with the sequence number), so you need some kind of identifier for the instance of the generator itself: with UUIDs, they spend a relatively immense number of bits storing the computer's MAC, but if you know you know you only have a smaller-scale deployment and are able/willing to centrally plan identifiers for the deployed nodes, you can get buy with something akin to this "shard ID".
(BTW, note that a v1 UUID need not be generated with a true MAC: the specification both notes that you can just by node addresses from them for a fairly small cost, or to use fake addresses that are marked as such by setting the multicast bit. The result is that if you want to use UUID v1 off-the-shelf with "shard IDs" you are more than welcome to do so, not just in the "analogously-equivalent" sense but in the "to the letter of the spec" sense. You can find more information in Section 4.5 of the UUID specification.)
One less moving part at the expense of wedding yourself to a prepartitioned logical shard scheme, I guess. (I wonder how painful it would be to rebucket data into a different logical shard should the need arise...)
We use something very similar to what Instagram does with Postgres : http://instagram-engineering.tumblr.com/post/10853187575/sha...
This allows far greater flexibility with regard to sharding and scale. Bigint with a function generated nextval will sustain up to 9223372036854775807. If you need more than that, your next option is Numeric, but I don't think we'll exhaust bigint any time soon.
If you use UUIDs, this is easy. If you used SERIAL, you now need a scheme to intelligently merge records. Not hard when everything is in a single table. A bit tougher when you have a wholly or partly normalised schema, because now you need to come up with correct renumbering schemes for every single table which has another table holding a foreign key into it.
I can tell you from personal experience that this is not fun.
v1 UUIDs are "guaranteed" to be unique, modulo people fiddling with your MAC addresses. v4 UUIDs are random keys and the odds of collision are so infinitesimal that it's worthless to fuss about it. v4s also have the nice property that they can double as nonces, but the sucky problem that they are not that great for indexes.
Then there's stuff like replaying logs reliably on multiple machines, blah blah blah. Every instance of 186360 is indistinguishable from every other instance of 186360. So you will need additional logic to keep them straight.
But take this uuid:
The odds that this has been created anywhere else, at any point, is as good as zero. You're more likely to have an integer key collision due to cosmic radiation flipping some bits.
So no special merging / field-source tracking logic is required.
We aren't using serial, but instead using functions to generate our own nextvals with the server IDs, epoch etc..., kinda the same way as Instagram (though, obviously our databases are no where near the same size). It came down to how much logic we were comfortable implementing in the schema from the start to avoid continuing complexity in the applications themselves.
Don't do it, you only make it worse for yourself if you pick a natural key for a PK!
Starting in 1986, the team published a number of papers describing the basis of the system, and by 1988 had a prototype version. The team released version 1 to a small number of users in June 1989
This is a pretty big caveat and one of the rare areas in which postgres does worse than other database systems. In SQL Server a non-recursive common table expression is treated by the optimizer similar to a macro - You can break up a complicated query with unions and group bys into easier to read cte:s and be confident that the optimizer will piece them all together into a query whose execution plan is equal to the original one.
But with postgres you can't because of the optimization boundaries. Its optimizer will attempt to make each cte as individually efficient as possible which can lead to a much worse execution plan overall. You can use views instead which, in contrast to cte:s, postgres optimizer can see through.
But you are right, when using CTEs for readability it tends to hurt more than help.
Interesting thread about being able to control this:
> A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary sub-query. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)
In other words, something like:
with x as (select * from users) select * from x limit 1
select * from users limit 1
with x as (
select * from users
join images on images.id = users.image_id
select * from x limit 1
select * from users
join images on images.id = users.image_id
In other words, the planner pretty much considers the "WITH" subquery separately from the outer query, and plans them separately.
edit: couldn't get the keyboard to pop up on Android, but putting it into landscape mode worked swiping slides (on the stock browser with Android 4.2 [not Chrome]).
The biggest killers (vs. postgres): no simple helpers (generate_series, for example, is achievable with simple, efficient recursion, but you have to write the whole statement); a truly antiquated client program (sqlplus); built-in json, but you do have associative arrays/hash maps and nested tables, so you can achieve exactly the same thing, and it's indexable*; and range types (meh imo).
Basically WITH-AS lets you declared temporary named subqueries. That is, a pretty stock standard programming technique -- give a name to a fiddly expression to simplify subsequent code. Very useful.
Oracle has lots of annoyances though. No SERIAL or IDENTITY type. Having to write the same boilerplate sequence/insert trigger/update trigger over and over and over is very tedious and a recipe for subtle errors when you forget one of them or mess it up.
No boolean type in SQL. I don't even know what to say about that.
Tom ties himself into unbelievable knots trying to defend this. It's the purest form of technical Stockholm Syndrome I've ever seen from a putatively intelligent person.
Time intervals are also in the SQL standard as I recall (although not in the format listed here - postgres supports standard syntax too though). Not sure how fully supported it is cross-database, though.
Failed to load resource: the server responded with a status of 500 (Internal Server Error) http://postgres-bits.herokuapp.com/js/sh_lang/sh_bash.min.js
Uncaught HTTP error: status 500 sh_main.min.js:4
CREATE TABLE t (
uuid uuid PRIMARY KEY
Given a table T, it will:
1. Create a new empty table T'.
2. Add a trigger on T that will clone any operations to T'.
3. Copy all rows from T to T' in clustering order.
4. Atomically switch the tables so that T' becomes T.
Unlike CLUSTER, the only lock required is in step 4, when it needs an exclusive lock on the table; it cannot do the switch until all current transactions are done with T (you can specify a timeout). The only other drawback is that that it does not support TRUNCATE, CREATE INDEX, or some ALTER TABLE operations being run while it's busy in step 3.
Also, unlike PostgreSQL's own CLUSTER command, it does not need an index to order by.
We run pg_repack regularly at night to keep our databases clustered. It works really well, though I really wish Postgres could get native online clustering. SQL Server has had automatic clustering for many years, as has DB2, and Oracle has index-ordered tables.
Tell the community - pg_reorg could be cleaned up, improved, and merged into core... and should be!
pg_repack is a fork of pg_reorg (which has not been maintained since 2011), by the way, which has become the de facto replacement. No idea if the code is in need of cleanup or not, though.
That having been said, many data sets work with UUID just fine and the fact that such sets are easier to catenate and compare is very convenient.
Unless you actually need some sequential semantics to your records or are redlining a system where the locality of the index really helps, I prefer the non-coordinated assignment of IDs overall, as pvh points out in his slides.
Also, as other posters mention, uuid has sequential forms that can obviate some of this and reduce it to a 128-bit number...but the common form of a good-quality random one is really quite good enough until the indexes really start to get quite big, and this path is most useful if one remains disinterested in partitioning, which may be ill advised. Having to renumber (aka 'renaming', a form of one of the "two hard problems" in computer science...) everything is a big barrier to partitioning, so if one can afford to avoid renumbering, I think avoiding it is a good idea.
Do NOT use any slide deck framework that doesn't have a solid export function - jpgs, pdfs, whatever.
The videographer who will be editing your video will not be able to deal with your esoteric format - and screengrabbing every slide is horrible busy work.
18 years, and still, nobodys implemented this yet?
Its not a total deal breaker, but jeebus christopher columbus christ. It certainly would be nice.
Its just nice though, if you often find yourself working on raw tables using one of the many Admin tools. Sometimes you just need it to work like Excel to be productive. MySQL makes all of that trivial. PostgreSQL obviously keeps track of the order you created the columns, so being able to edit this ordering seems trivial in my mind.
It just seems odd that there certainly is some demand there for it, yet nobodys picked up the slack in 18 years. Obviously there are workarounds. Why dont they include the work arounds as scripts with the distrobution, etc?
Maybe I will try to implement something and submit a patch. No experience in database dev though :(
Like I said though, its not a deal breaker. I'm just used to Open Source projects over-implementing features. Not under-implementing them. Especially after 18 years.
In relational algebra the ordering of the rows and the ordering of the columns are undefined. It's up to the database to arrange it in the best way possible. The ordering of rows and columns is specified when queried.
> Its just nice though, if you often find yourself working on raw tables using one of the many Admin tools. Sometimes you just need it to work like Excel to be productive.
Anyone have any good suggestions for web frontend to PostgreSQL that supports reordering of columns as easily as MySQL? phpPgAdmin doesnt seem to compare to phpMyAdmin.
Admittedly I probably should be using a more robust OS native PostgreSQL admin tool, and not a web fronted. But alas, all of this is solved with MySQL quite nicely. I would like to see PostgreSQL catch up.
Most databases are, by default, row stores. Moving columns around on disk kinda sucks in such situations, and there's no need to when you can specify results in any order by naming fields.
You name fields in your queries, right? Right?
I'll certainly try, if I thought they'd ever accept a patch from a total newb.
PostgreSQL obviously keeps track of the column order, as far as the order you created them in goes. So being able to edit this order doesnt seem like that tall of an order. Would be a pretty convenient feature that many other DB's do infact have.
Its just a nicety that PostgreSQL should have, if its hoping to win over people who are familiar with MySQL, which does sorta seem like its goal these days.
This always means more complexity and a performance overhead and frankly provides next to zero value because columns are named in queries anyways (or the order becomes meaningless when joining)
There is, however, a lot to be said for close coupling between query planning, on-disk format, indexing and so forth.
edit: as for patching, the PostgreSQL source is some of the best I've ever read. Carefully organised, fastidiously documented, immaculately consistent to coding standards. It's a delight.
Does any major RDBMS support this?
Do the math, that's, cumulatively, probably an hour or two of life wasted perhaps more.
I expect webpages to have a mouse based interface, not a keyboard based interface and I'm not alone. It's not churlish of us to complain about our wasted time.
Interesting! And, as a postgresql lover, thanks for posting.
I absolutely learned a ton from it ... I hope other Postgres fans didn't just walk away.
Reality is more complex than any model.
That's why we build models. They are, by their very nature, simpler than reality.
Everything is in flux, and you can choose to model that or not model it.
At the first level, you just model the current state of the system.
Then you begin to model the events that change the current state.
Then you begin to model changes in the shape of the problem domain.
Models all the way down ...