Hacker News | levkk's comments

As the tool gets better, people trust it more. It's like Tesla's self-driving: "almost" works, and that's good enough for people to take their hands off the wheel, for better or for worse.

The "almost" part of automation is the issue + the marketing attached to it of course, to make it a product people want to buy. This is the expected outcome and is already priced in.


Exactly. Waymo was talking about this a few years back: they found that building it up gradually wouldn't work, because people stop paying attention once it's "almost" there, until it isn't and the car crashes. So they set out to make their automation good enough to operate on its own, without a human driver, before starting to deploy it.

I would say the opposite here. The perpetrator rejected multiple warnings from Claude about bad consequences, and multiple suggestions from Claude to act in safer ways. It reminds me of an impatient boss who demands that an engineer stop all this nonsense talk about safety and just do the damn thing, quick and dirty.

Those guys who blew up the Chernobyl NPP also had to deliberately disable multiple safety systems that would have prevented the catastrophe. Well, you get what you ask for.


I view it more as "I crashed my car, I should have been wearing my seat belt, wear yours!"

Source: had codex delete my entire project folder including .git. Thankfully I had a backup.


All queries run inside transactions, and a slow lane like S3 will cause delays, which will in turn block vacuum and cause more problems than it solves. Most deployments of Postgres (e.g., RDS) won't let you install custom extensions either, although they do have their own S3 extension (which I wouldn't recommend you use).

The right place to manage this is either in the app or in a proxy, before the data touches Postgres.
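To illustrate the point, here is a minimal sketch of app-side tiering, with the storage backends stubbed out as dicts. The class and method names are my own invention; a real app would use a Postgres client and an S3 client instead.

```python
# Hedged sketch: storage tiering in the application layer instead of a
# Postgres extension. Hot rows stay in Postgres; cold rows are archived
# to S3, so the slow lane never sits inside a database transaction.

class TieredStore:
    def __init__(self, postgres, s3):
        self.postgres = postgres  # hot tier (fast), stubbed as a dict
        self.s3 = s3              # cold tier (slow, cheap), stubbed as a dict

    def get(self, key):
        # Serve from the hot tier first; Postgres never waits on S3.
        row = self.postgres.get(key)
        if row is not None:
            return row
        return self.s3.get(key)  # slow path, outside any DB transaction

    def archive(self, key):
        # Move a cold row out of Postgres so vacuum and queries stay fast.
        row = self.postgres.pop(key, None)
        if row is not None:
            self.s3[key] = row

store = TieredStore({"a": 1}, {})
store.archive("a")
print(store.get("a"))  # → 1 (served from the cold tier)
```

The key design choice is that the fallback to S3 happens after the Postgres lookup has already returned, so a slow cold tier only slows down requests that actually need it.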


See prepared statements.

Models don't learn; they're retrained periodically. Junior engineers, meanwhile, learn much faster and constantly improve. If you stop learning, you will only be as good as the model.

I've been coding (software engineering, I guess) for close to 15 years. The models' skill set is a comfortable L1 (intern), pushing L2 (junior). They are getting better, but at a snail's pace compared to a human learning the same things.


This was my biggest frustration with LLM-based coding, but Agent Skills have largely solved it.

While there's a lot of room to improve them, they're a huge game changer for effective coding harnesses.


You can, I believe. We only support BIGINT, VARCHAR, and UUID for sharding, but all other data types are completely fine for passthrough, i.e., they can be included and used in your queries.

General statement about adoption: the last time we did a Show HN (9 months ago), it was a POC running on my local machine. Now we're used in production by some pretty big companies, which is exciting!

Technically yes. We only support BIGINT (and all other integers), VARCHAR and UUID for sharding keys, but we'll happily pass through any other data. If we need to process it, we'll need to parse it. To be clear: you can include PostGIS data in all queries, as long as we don't need it for sharding.

It wouldn't be too difficult to add sharding for it if we wanted to. For example, we added support for pgvector a while back (L2/IVFFlat-based sharding), so we can add any other data type, e.g., POLYGON for sharding on ST_Intersects, or for aggregates.
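To make the routing idea concrete, here is a hedged sketch of hash-based shard routing for a BIGINT sharding key. The hash function is purely illustrative; PgDog's actual routing uses Postgres-compatible hashing and will produce different shard assignments.

```python
# Illustrative sketch: map a BIGINT sharding key to a shard number.
# Queries filtered on the sharding key can then be routed
# "direct-to-shard" instead of being fanned out to every shard.
import hashlib

def shard_for(key: int, num_shards: int) -> int:
    # Hash the key and take it modulo the shard count. Deterministic:
    # the same key always lands on the same shard.
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

print(shard_for(42, 4))
```

Supporting a new type for sharding then mostly means knowing how to parse it out of the query and hash it consistently, which is why passthrough of unsupported types is easy but sharding on them takes extra work.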


A couple options come to mind:

1. Replicate shards into one beefy database and use that. Replication is cheaper than individual statements, so this can work for a while. The sink can be Postgres or another database like Clickhouse. At Instacart, we used Snowflake, with an in-house CDC pipeline. It worked well, but Snowflake was only usable for offline analytics, like BI / batch ML, and quite expensive. We'll add support for this eventually; we're getting pretty good at managing logical replication, including DDL changes.

2. Use the shards themselves and build a decent query engine on top. This is the Citus way and we know it's possible. Some queries could be expensive, but that's expected and can be solved with more compute.

In our architecture, a shard going down for maintenance is an incident-level event, so we expect shards to be up at all times and to fail over to a standby if there is an issue. These days, most maintenance tasks can be done online in-place, or with blue/green, which we'll support as well. Zero downtime is the name of the game.


I would say once you're over 100 Postgres connections, consider getting a connection pooler. Requests per second is highly variable; Postgres can serve a lot of them, as long as you keep the number of server connections low - that's what the pooler is for.

You can use pgbench to benchmark this locally pretty easily. The TPS curve will be interesting: at first, the connection pooler will cause a decrease, but as you add more and more clients (the -c parameter), you should see increasing benefits.

Ultimately, you add connection poolers when you don't have any other option: you have hundreds of app containers with dozens of connections each and Postgres can't handle it anymore, so it's a necessity really.
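The mechanics of a pooler can be sketched in a few lines. The connections here are stubbed with strings; a real pooler (PgBouncer, PgDog, etc.) holds actual server connections and speaks the Postgres wire protocol.

```python
# Minimal sketch of what a connection pooler does: many clients share a
# small, fixed set of server connections.
import queue

class Pool:
    def __init__(self, size: int):
        self._conns = queue.Queue()
        for i in range(size):
            self._conns.put(f"server-conn-{i}")  # stub connection

    def checkout(self):
        # Blocks when all server connections are busy. This queueing is
        # exactly how the pool keeps Postgres's connection count low no
        # matter how many clients show up.
        return self._conns.get()

    def checkin(self, conn):
        self._conns.put(conn)

pool = Pool(size=2)  # hundreds of app clients, two server connections
conn = pool.checkout()
# ... run a query on conn ...
pool.checkin(conn)
```

The decrease-then-increase TPS curve falls out of this design: at low client counts the extra hop costs you, but once clients outnumber what Postgres can handle directly, queueing at the pool beats thrashing the server.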

Load balancing becomes useful when you start adding read replicas. Sharding is necessary when you're approaching the vertical limit of your cloud provider (on the biggest instance or close).


Okay. On my side, I have a server for my API using Drizzle. I guess it already does some kind of pooling (or at least it asks me to instantiate a pg.Pool; not sure if that's a lightweight connection pooler on the server side), and I only have a couple of workers with a Drizzle pool each, so I guess I'm far enough from that limit.

Do connections increase mostly as you add microservices / workers, or is it more related to how many end users your service has (e.g., connections on your web server)?


That's exactly right, it's both of those. More containers / services means more connections to the DB, which themselves need to be pooled. More requests to the app require more connections as well.


The current behavior unfortunately is to just let it through and return an incorrect result. We are adding more checks here and rely heavily on early adopters to have a decent test suite before launching their apps to prod.

That being said, we do have this [1]:

    [general]
    expanded_explain = true

This will modify the output of EXPLAIN queries to return routing decisions made by PgDog. If you see that your query is "direct-to-shard", i.e. goes to only one shard, you can be certain that it'll work as expected. These queries will talk to only one database and don't require us to manipulate the result or assemble results from multiple shards.

For cross-shard queries, you'll need your own integration tests, for now. We'll add checks here shortly. We have a decent CI suite as well, but it doesn't cover everything. Every time we look at that part of the code, we just end up adding more features, like the recent support for LIMIT x OFFSET y (PgDog rewrites it to LIMIT x + y and applies the offset calculation in memory).
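The LIMIT/OFFSET rewrite described above can be sketched as follows. Shard results are stubbed with sorted lists (assume the query has an ORDER BY so the merge is well-defined); function and variable names are my own, not PgDog's.

```python
# Hedged sketch of the cross-shard LIMIT x OFFSET y rewrite: each shard
# is queried with LIMIT x + y (no OFFSET), then the proxy merges the
# sorted per-shard results and applies the offset in memory.
import heapq

def cross_shard_limit_offset(shard_results, limit, offset):
    # Each shard already returned at most limit + offset sorted rows.
    merged = list(heapq.merge(*shard_results))
    return merged[offset:offset + limit]

# Two shards, each truncated to LIMIT 2 + 1 = 3 rows by the rewrite.
shard_a = [1, 4, 7]
shard_b = [2, 3, 9]
print(cross_shard_limit_offset([shard_a, shard_b], limit=2, offset=1))
# → [2, 3]
```

The rewrite is needed because the row at a given global offset can come from any shard, so each shard must return enough rows to cover the worst case where all of them do.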

We'll get there.

[1]: https://docs.pgdog.dev/features/sharding/explain/

