Hacker News | kroolik's comments

High level devs still do low expertise things every now and then. Or rather, designs need to be implemented, eventually.

Re whys: this can be even simpler. I sometimes catch myself rapidly clicking the mouse button a second time, right after the initial click. This is not intended and may be related to low resistance on the button itself.


Same, but for me it's because as I've gotten older, sometimes my finger gets unsteady enough to double tap something, mostly on touchpads rather than a mouse.


Link requires premium account to read the actual data


You can peer-connect vpcs cross-account


Up to 125 peers before you have to set up a transit VPC, which is a lot more complex


A transit gateway can have 5000 VPCs connected in a region, and you can have multiple TGWs. And rather than have a VPC per account you can use Shared VPCs instead.


Having an index over the uuid is equivalent to it being a PK, so why would you bother having both?


Because it's much better for range queries and joins. When you inevitably need to take a snapshot of the table or migrate the schema somehow, you'll be wishing you had something other than a UUID as the PK.


This. Highly recommend using a numeric primary key + UUID. Using UUID relations internally can have some strategic advantages, but when UUIDv4 is used as the only primary key, you completely lose the ability to reliably iterate all records across multiple independent queries.

Also, the external thing isn't just for exposing it out to your own apps via APIs, but way more importantly for providing an unmistakable ID to store within external related systems. For example, in your Stripe metadata.

Doing this ensures that ID either exists in your own database or does not, regardless of database rollbacks, database inconsistencies etc. In those situations a numeric ID is a big question mark: Does this record correspond with the external system or was there a reuse of that ID?

I've been burnt taking over poorly managed systems that saved numeric IDs externally, and in trying to heal and migrate that data, ran into tons of problems because of ill-considered rollbacks of the database. At least after I leave, the systems I build won't be subtly broken by such bad practices in the future.
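A minimal sketch of the "numeric primary key + UUID" layout described above, using SQLite from Python's standard library as a stand-in for whatever database is actually in use. The table name `customers`, the column name `public_id`, and both helper functions are hypothetical, chosen only for illustration.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        id        INTEGER PRIMARY KEY,   -- internal key for joins and range scans
        public_id TEXT NOT NULL UNIQUE,  -- UUID handed to external systems
        name      TEXT NOT NULL
    )
    """
)


def create_customer(name: str) -> str:
    """Insert a row and return the UUID that is safe to store externally."""
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO customers (public_id, name) VALUES (?, ?)", (public_id, name)
    )
    return public_id


def find_by_public_id(public_id: str):
    """Resolve an external UUID to the internal row, or None if it is unknown."""
    return conn.execute(
        "SELECT id, name FROM customers WHERE public_id = ?", (public_id,)
    ).fetchone()


alice = create_customer("alice")
print(find_by_public_id(alice))               # internal id plus name
print(find_by_public_id(str(uuid.uuid4())))   # a UUID we never issued
```

The point of the split is that a UUID stored in an external system either resolves to a row or does not, while the integer key stays private and cheap to join on.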


Ha? Please elaborate.


When running a batched migration, it is important to batch using a strictly monotonic field so that new rows won't get inserted into an already processed range
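As a rough illustration of batching on an increasing key, here is a keyset-style loop against an in-memory SQLite table. The table `items`, the `migrated` flag, and the function name are made up for the example, and the sketch assumes no concurrent writer can make a lower id visible after the cursor has passed it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, migrated INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 26)])


def backfill_in_batches(batch_size: int) -> int:
    """Walk the table in ascending id order, one batch per query."""
    last_id = 0
    batches = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return batches
        ids = [row[0] for row in rows]
        conn.executemany(
            "UPDATE items SET migrated = 1 WHERE id = ?", [(i,) for i in ids]
        )
        last_id = ids[-1]  # the cursor only ever moves forward
        batches += 1


batches = backfill_in_batches(batch_size=10)
remaining = conn.execute(
    "SELECT COUNT(*) FROM items WHERE migrated = 0"
).fetchone()[0]
print(batches, remaining)
```

Each query is independent and resumes from `last_id`, so the job can be stopped and restarted without rescanning processed rows.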


It's not even necessarily about it being strictly monotonic. That part does help though, as you don't need to skip rows.

For me the bigger thing is the randomness. A uid being random for a given row means the opposite is true; any given index entry points to a completely random heap entry.

When backfilling, this leads to massive write amplification. Consider a table with rows taking up 40 bytes, so roughly 200 entries per 8 KiB page. If I backfill 1k rows sorted by the id, then under normal circumstances I'd expect to update 6-7 pages, which is ~50 KiB of heap writes.

Whereas if I do that sort of backfill with a uid, then I'd expect to find each row on a separate page. That means 1k rows backfilled is going to be around 8MB of writes to the heap.
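The arithmetic above can be checked with a small simulation. It assumes 8 KiB heap pages and a 10-million-row table (both assumptions, not stated in the thread) and ignores page and tuple headers, so the sequential case comes out slightly under the 6-7 pages quoted.

```python
import random

PAGE_BYTES = 8 * 1024                    # assumed heap page size
ROW_BYTES = 40                           # row size from the comment above
ROWS_PER_PAGE = PAGE_BYTES // ROW_BYTES  # "roughly 200 entries per page"

TABLE_ROWS = 10_000_000                  # assumed table size
BATCH = 1_000


def pages_touched(row_positions) -> int:
    """Count distinct heap pages for rows given by their physical position."""
    return len({pos // ROWS_PER_PAGE for pos in row_positions})


rng = random.Random(0)

# Backfill ordered by a sequential id: the rows sit next to each other.
start = rng.randrange(TABLE_ROWS - BATCH)
sequential = pages_touched(range(start, start + BATCH))

# Backfill ordered by a random uid: the rows are spread over the whole heap.
scattered = pages_touched(rng.sample(range(TABLE_ROWS), BATCH))

print(f"sequential: {sequential} pages, {sequential * PAGE_BYTES / 1024:.0f} KiB")
print(f"scattered:  {scattered} pages, {scattered * PAGE_BYTES / 1024 / 1024:.1f} MiB")
```

With these numbers nearly every randomly chosen row lands on its own page, which is where the two-orders-of-magnitude gap in heap writes comes from.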


Isn't that solved because UUIDv7 can be ordered by time?


Yeah pretty much, although ids can still be a little better. The big problem for us is that we need the security of UUIDs not leaking information and so v7 isn't appropriate.

We do use a custom uuid generator that uses the timestamp as a prefix that rotates on a medium term scale. That ensures we get some degree of clustering for records based on insertion time, but you can't go backwards to figure out the actual time. It's still a problem when backfilling and is more about helping with live reads.
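The comment does not say how that generator works, so the following is only one hypothetical way to get a similar effect: the leading bytes are a keyed hash of a coarse time bucket, so rows inserted in the same period share a prefix, while nobody without the secret can map the prefix back to a time. The secret, the weekly bucket size, the 4-byte prefix length, and the function name are all assumptions.

```python
import hashlib
import hmac
import os
import uuid

SECRET = b"hypothetical-server-side-secret"  # assumption: never leaves the server
BUCKET_SECONDS = 7 * 24 * 3600               # assumption: prefix rotates weekly


def clustered_uuid(now: float) -> uuid.UUID:
    """UUID whose first 4 bytes depend only on the keyed time bucket."""
    bucket = int(now // BUCKET_SECONDS)
    prefix = hmac.new(SECRET, bucket.to_bytes(8, "big"), hashlib.sha256).digest()[:4]
    raw = bytearray(prefix + os.urandom(12))
    raw[6] = (raw[6] & 0x0F) | 0x80  # version 8: custom layout (RFC 9562)
    raw[8] = (raw[8] & 0x3F) | 0x80  # RFC variant bits
    return uuid.UUID(bytes=bytes(raw))


a = clustered_uuid(now=1_700_000_000)
b = clustered_uuid(now=1_700_000_100)                   # same bucket as a
c = clustered_uuid(now=1_700_000_000 + BUCKET_SECONDS)  # the following bucket
print(a, b, c, sep="\n")
```

Note that this only clusters rows within a bucket; prefixes of consecutive buckets are unrelated, so it gives locality rather than ordering.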


Are page misses still a thing in the age of SSDs?


Strictly monotonic fields are quite expensive and the bigserial PK alone won't give you that.


PG bigserial is already strictly monotonic


No they're not, even with a `cache` value of 1. Sequence values are issued at insert rather than commit. A transaction that commits later (which makes all updates visible) can have an earlier value than a previous transaction.

This is problematic if you try to depend on the ordering. Nothing is stopping some batch process that started an hour ago from committing a value 100k lower than where you thought the sequence was at. That's an extreme example but the consideration is the same when dealing with millisecond timeframes.
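A toy model of that failure mode, with no database involved: `itertools.count` stands in for the sequence, and the `Txn` class and `poll` helper are invented for the example. Ids are handed out at insert time but only become visible at commit time.

```python
import itertools

sequence = itertools.count(1)  # stand-in for a database sequence
committed = []                 # ids visible to readers, in commit order


class Txn:
    def __init__(self):
        self.id = next(sequence)   # value is taken when the row is inserted

    def commit(self):
        committed.append(self.id)  # the row becomes visible only now


def poll(cursor: int):
    """Return visible ids above the cursor, as an id-ordered reader would."""
    return sorted(i for i in committed if i > cursor)


slow = Txn()   # long-running batch job takes the lower id
fast = Txn()   # short request takes the higher id
fast.commit()

seen = poll(0)
cursor = max(seen)     # the reader advances past the higher id

slow.commit()          # the lower id shows up behind the cursor
seen += poll(cursor)   # and is never picked up

print("commit order:", committed)
print("reader saw:  ", seen)
```

The reader never observes the slow transaction's row, even though every id it was handed was unique and increasing.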


Okay, but in a live DB, typically you won't have only inserts while migrating, will you?


Yes, but updates are covered by updated app code


Would creation/lastmod timestamps cover this requirement?


Yes, although timestamps may have collisions depending on resolution and traffic, no? Bigserials (at least in PG), are strictly monotonic (with holes).


The locking part is often forgotten when discussing zero downtime migrations in postgres. As more and more operations become fast, more engineers will get bitten by this.

I've been telling this to engineers in my org for such a long time, now I also have a nice article to share :)

BTW, you don't need the pg_sleep or to hurry with the execution. A simple BEGIN and COMMIT/ROLLBACK will give you unlimited wait time :)


I believe the fact that V8 vulnerabilities are not "classic" memory corruption can be attributed to their developers' experience and review processes.

This doesn't imply, though, that another project in C++ will share these traits.


It's more like "I'm interested in subscribing and paying more"


or, "I'm interested in paying less, in exchange for relieving you of some of the labour that you'd have to do to support my purchase"


As I've come to think for some time now: you can have a challenge or a solution.

As engineers, we are often tempted to challenge ourselves, straying away from the latter. There is less perceived pride from following simple solutions.


We use IF NOT EXISTS to bring non-prod environments in sync with prod. The size of prod requires some migrations to be done separately over the course of days, in separate transactions. The IF NOT EXISTS clause then brings dev and non-prod envs in sync.
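A small sketch of that pattern, again using SQLite as a stand-in: the same migration script runs against an environment where part of the work was already done by hand and against a fresh one, and both end up with the same schema. The `events` table, its index, and the helper names are hypothetical.

```python
import sqlite3

MIGRATION = [
    "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, kind TEXT NOT NULL)",
    "CREATE INDEX IF NOT EXISTS events_kind_idx ON events (kind)",
]


def migrate(conn: sqlite3.Connection) -> None:
    """Apply every statement; steps that were already done are no-ops."""
    for statement in MIGRATION:
        conn.execute(statement)


def schema_objects(conn: sqlite3.Connection):
    """List the names of all tables and indexes in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master ORDER BY name").fetchall()
    return [row[0] for row in rows]


prod = sqlite3.connect(":memory:")
prod.execute(MIGRATION[0])  # this step was run separately, days earlier
migrate(prod)               # re-running the full script does not fail

dev = sqlite3.connect(":memory:")
migrate(dev)                # fresh environment gets everything in one go

print(schema_objects(prod) == schema_objects(dev))
```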

