
Looks like that still has downtime for a Postgres migration: you're suggesting going into maintenance mode and just doing a dump/restore. I've seen that take hours once you hit terabyte scale, depending on hardware.

I've had pretty good luck setting up logical replication from Heroku to the new provider and having a 10-15 minute maintenance window to catch up once it's in sync. Might be worth considering.
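
For reference, the generic Postgres version of that setup looks roughly like this (a minimal sketch, assuming the source allows logical replication with wal_level = logical; names and the connection string are placeholders):

    -- on the source database
    CREATE PUBLICATION migrate_pub FOR ALL TABLES;

    -- on the target, after copying the schema over (e.g. pg_dump --schema-only)
    CREATE SUBSCRIPTION migrate_sub
        CONNECTION 'host=source.example.com dbname=app user=replicator password=...'
        PUBLICATION migrate_pub;

    -- watch catch-up progress before scheduling the cutover window
    SELECT * FROM pg_stat_subscription;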

You might also want to add a warning about Postgres versions. There are some old bugs around hash functions used for primary keys that can cause corruption during a migration. I've seen it twice when migrating from Heroku to other vendors.
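
Not something the parent spelled out, but one way to sanity-check indexes (including primary keys) on the target after cutover, assuming the amcheck extension is available there, is something like:

    CREATE EXTENSION IF NOT EXISTS amcheck;

    -- raises an error if structural corruption is found in any btree index in public
    SELECT bt_index_check(c.oid)
    FROM pg_class c
    JOIN pg_am am ON am.oid = c.relam
    WHERE am.amname = 'btree'
      AND c.relkind = 'i'
      AND c.relnamespace = 'public'::regnamespace;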



It sounds like you're doing something similar to how Databricks works now that they've acquired Neon, or Snowflake now that they've got Crunchy. I'm guessing the local SSD is a big advantage, but what else is different with your approach?

Thanks for posting this question! Compared to Snowflake and Databricks, a few key differences in our approach are:

(a) An initial focus on real-time, customer-facing applications rather than trying to boil the ocean. This also aligns with where the Postgres + ClickHouse combination has really shined for our users. Both Postgres and ClickHouse are designed primarily with developers building system-of-record applications in mind.

(b) Every component in the stack is open source—Postgres, ClickHouse, PeerDB for native CDC, pg_clickhouse, and Ubicloud Postgres (our data plane component). We plan to keep it that way as much as possible, as this strongly aligns with our ethos.

(c) As you noted, Postgres is NVMe-backed and the focus is on performance and scalability, while maintaining top-notch reliability. We think this is more meaningful to fast-growing (AI-driven) workloads than instant provisioning and forking. I talk about this a bit more here - https://clickhouse.com/blog/postgres-managed-by-clickhouse#p...


Thanks! Out of curiosity, does the NVMe have a big effect on replication throughput? I've been wondering how much of the trouble I've had with other solutions is due to parsing WAL and how much is just slow cloud disks.

Very interesting question. It depends on the use case; we've seen quite a few workloads where logical replication gets throttled on I/O (the reorder buffer), and NVMe-based disk access should help a lot there. This happens specifically when there are larger or interleaved transactions. We plan to test this at production scale soon. Stay tuned for more learnings!
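
Not from the parent comment, but if you want to see whether your own workload hits this, Postgres 14+ exposes per-slot stats on how much of the reorder buffer spills to disk:

    -- large spill_bytes relative to total_bytes means logical decoding is hitting disk
    SELECT slot_name, spill_txns, spill_count, spill_bytes, total_txns, total_bytes
    FROM pg_stat_replication_slots;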

Is there a cost disadvantage to being NVMe-backed?

Great question! It really depends on the workload. We already support NVMe instances as small as 4 GB RAM / 2 vCPUs. For HA setups, you could go with one standby (with configurable synchronous replication) or two standbys (cross-AZ, with quorum-based replication). So yes, there is some additional cost from a hardware perspective due to the standbys, but depending on the workload, NVMe performance could offset those costs. On top of this, there’s a separate topic around the reliability/availability promises of separating storage and compute for an OLTP Postgres database.
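
For anyone unfamiliar with the quorum setup being described, the vanilla Postgres equivalent is roughly this (a sketch; the standby names are placeholders and a managed service will expose different knobs):

    -- wait for any one of two cross-AZ standbys to confirm each commit
    ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (standby_az1, standby_az2)';
    SELECT pg_reload_conf();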

I've linked this elsewhere in the thread, but here are testing results from a US Navy pilot project for carbon fiber unmanned subs. It looks like they found it pretty viable.

https://apps.dtic.mil/sti/pdfs/ADA270438.pdf


This isn't the first carbon fiber submarine, although it is the first manned one. The US Navy tried out an unmanned model in the 80s and got much better results: they were expecting at least 1,000 successful dives before stress fatigue became an issue.

Here's a detailed report on it. Pages 32-33 have their take on the material analysis, probably the part most relevant to this failure.

https://apps.dtic.mil/sti/pdfs/ADA270438.pdf

I'm personally more suspicious of OceanGate's manufacturing process than the material, but I'm far from an expert here.


It’s the unpredictable nature of failure that’s at issue here. For unmanned subs it doesn’t matter if 10% of failures occur well below the expected lifespan but that’s a huge issue for manned subs.


I'm not even sure it's the first manned carbon fiber submersible.

Deepflight Challenger [...] is the first deep-diving sub to be constructed with a pressure hull (central tube portion) of carbon fibre composite, built by Spencer Composites for HOT. Its carbon fiber design would later influence the tube for the sub Titan,[12] which imploded...

https://en.wikipedia.org/wiki/DeepFlight_Challenger


Notable from that page is this paragraph though:

""" Based on testing at high pressure, the DeepFlight Challenger was determined to be suitable only for a single dive, not the repeated uses that had been planned as part of Virgin Oceanic service. As such, in 2014, Virgin Oceanic scrapped plans for the five dives project using the DeepFlight Challenger, as originally conceived, putting plans on hold until more suitable technologies are developed. """


Being manned is a major difference. Humans need a lot of space. Pressure grows with volume, which is cubic, but the hull grows with area. You can also submerge components in oil, which is much better at resisting pressure than air.


Pressure doesn't grow with volume. The exterior design pressure is constant. The stress on the wall scales linearly with the diameter. Making submarines bigger actually makes it easier because the buoyancy scales cubically with the volume while the weight scales linearly with the perimeter, so the larger the submarine the thicker the walls can be.


There are several kinds of scaling involved; once the radius increases enough, thicker walls are less efficient than internal bracing.

It’s impractical to build something like an Ohio class submarine that can reach the bottom of the Mariana Trench when you also want multiple internal compartments in case of damage.


Internal bracing is to resist buckling. You need it as your cylinder gets longer (as military subs tend to be), but has nothing to do with the diameter. It's also good for torsional strength, which is not really a concern for a pressure vessel but is important for a ship that's going to be in the actual ocean. But for just resisting pressure, once your diameter exceeds 20 times the wall thickness, the relationship is linear.

You can get better efficiency with multiple spherical pressure vessels joined together over a cylindrical vessel, as spherical pressure vessels better distribute the loads than cylinders. This is done in some particularly deep diving military subs, which are then surrounded by an unpressurized cylinder for hydrodynamics.


Even with a spherical sub the diameter impacts a lot of things. For example a large sphere sees significantly lower pressure across the side facing the surface than the side facing the sea floor.


At the depth of the Titanic, to see a 1% variation in pressure between the topmost and bottommost points of a sphere, the sphere would need to be 40 meters in diameter. For context, the pressure vessels of the largest submarine in the world have a diameter of 10.9 meters. Note that pressure at a given depth varies due to things like temperature fluctuations, ocean currents, and even variation in Earth's gravity.

Further, the walls of pressure vessels distribute the load - any variations in the pressure get averaged out. It's the same principle as a dome - every element of the sphere is pushing against the adjacent elements and resisting being pushed by those adjacent elements. At the size scale where this is no longer the case, you're not building a pressure vessel. If you're making a dam or a hollow column going down into water, or perhaps a massive dome on the ocean floor, you would need different equations.

Even for a submarine you may be concerned with things besides pressure resistance, like collision or sea keeping, as previously stated. But from a pressure-resistance standpoint, the diameter-to-wall-thickness requirement holds equally true for small exploratory subs and the largest military subs.
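
(Rough arithmetic behind the 40 meter figure, assuming seawater density of about 1025 kg/m^3 and a depth of roughly 3800 m:

    P = \rho g h \approx 1025 \times 9.81 \times 3800 \approx 38.2\,\mathrm{MPa}
    \Delta h = \frac{0.01\,P}{\rho g} \approx \frac{0.38 \times 10^6}{1025 \times 9.81} \approx 38\,\mathrm{m}

so a sphere roughly 40 m across sees about a 1% pressure difference top to bottom.)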


A sphere is a great shape for dealing with such forces but it’s just a more complicated system. Rotation can cause metal fatigue, openings get more complicated, etc.

> For context, the pressure vessels of the largest submarine in the world have a diameter of 10.9 meters.

Few subs can reach the Titanic at 12,500 feet; at more common crush depths, and especially with non-spherical geometries, it’s very much worth considering. Subs often dive and surface at a significant angle.


Good find! I've seen similar behavior before and was wondering why it wasn't easy to stop.

This isn't the only place Postgres can act like this, though. I've seen similar behavior when a foreign data wrapper times out or loses connection, and had to resort to either using kill -9 or attaching to the process using a debugger and closing the socket, which oddly enough also worked.

Might be worth generalizing this approach to also handle that kind of failure
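
For reference (not from the article, and in the FDW-hang case above it may not be enough), the SQL-level first steps before reaching for kill -9 are usually something like:

    -- find backends stuck in the same query for a long time
    SELECT pid, state, wait_event_type, wait_event,
           now() - query_start AS running_for, query
    FROM pg_stat_activity
    WHERE state <> 'idle'
      AND now() - query_start > interval '10 minutes';

    -- ask politely first, then escalate (12345 is a placeholder pid from above)
    SELECT pg_cancel_backend(12345);
    SELECT pg_terminate_backend(12345);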


I'd argue that horizontally sharded databases can work well, but they do tend to have significant non-obvious tradeoffs that can be pretty painful.

There's a handful of companies that have scaled Citus past 1PB for production usage, but the examples I'm aware of all required more engineering to work around capability or architecture limitations than one might like. I'd love to see someone come back with a fresh approach that covered more use cases effectively.

Disclaimer: former Citus employee


I can imagine it for some constrained use case, but taking your typical RDBMS that's powering a variety of business logic with complex queries, I dunno.

One interesting tradeoff Postgres and MySQL made for efficiency's sake was making xacts not fully ACID by default; instead they guarantee something that's good enough as long as you keep it in mind. Cockroach and Spanner are fully ACID, but that means even if you used those as a single-node DB, it ought to be slower.


It can be great, depending on your schema and planned growth. Questions I'd be asking in your shoes:

1. Does the schema have an obvious column to use for distribution? You'll probably want to fit one of the following two cases, though they aren't mutually exclusive:

    1a. A use case where most traffic is scoped to a subset of data (e.g. a multitenant system). This is the easiest use case: just make sure most of your queries contain the column (most likely tenant ID or equivalent), and partially denormalize to have it in tables where it's only implicit, to make your life easier (see the sketch after this list). Do not use a timestamp.

    1b. A rollup/analytics-based use case that needs heavy parallelism (e.g. a large IoT system where you want to do analytics across a fleet). For this, you're looking for a column that has high cardinality without too many major hot spots; in the IoT use case mentioned, this would probably be a device ID or similar.
2. Are you sure you're going to grow to the scale where you need Citus? Depending on workload, it's not too hard to have a 20TB single-server PG database, and that's more than enough for a lot of companies these days.

3. When do you want to migrate? Logical replication in should work these days (I haven't tested it myself), but the higher the update rate and the larger the database, the more painful this gets. There aren't a lot of tools that are very useful for the more difficult scenarios here, but the landscape has changed since I last had to do this.

4. Do you want to run this yourself? Azure does offer a managed service, and Crunchy offers Citus on any cloud, so you have options.

5. If you're running this yourself, how are you managing HA? pg_auto_failover has some Citus support, but can be a bit tricky to get started with.
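
For the multitenant case in 1a, the Citus side of that is roughly the following (a sketch; the table and column names are made up):

    -- distribute the big tables by tenant, co-located so tenant-scoped joins stay on one shard
    SELECT create_distributed_table('accounts', 'tenant_id');
    SELECT create_distributed_table('events', 'tenant_id', colocate_with => 'accounts');

    -- small lookup tables that every shard joins against can be reference tables
    SELECT create_reference_table('plans');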

I did get my Citus cluster over 1 PB at my previous job, and that's not the biggest out there, so there's definitely room to scale, but the migration can be tricky.

Disclaimer: former Citus employee


Depends on your schema, really. The hard part is choosing a distribution key to use for sharding- if you've got something like tenant ID that's in most of your queries and big tables, it's pretty easy, but can be a pain otherwise.


Same pain as with good old (native) partitioning, right? :)

As with partitioning, in my experience something like a common key (identifying data sets), tenant id and/or partial date (yyyy-mm) work pretty great


For a multi-tenant use case, yeah, pretty close to thinking about partitioning.

For other use cases, there can be big gains from cross-shard queries that you can't really match with partitioning, but that's super use case dependent and not a guaranteed result.


Seems like this is a similar philosophy, but is missing a bunch of things the Citus coordinator provides. From the article, I'm guessing Citus is better at cross-shard queries, SQL support, central management of workers, keeping schemas in sync, and keeping small join tables in sync across the fleet, and provides a single point of ingestion.

That being said, this does seem to handle replicas better than Citus ever really did, and most of the features it's lacking aren't relevant for the sort of multitenant use case this blog is describing, so it's not a bad tradeoff. This also avoids the coordinator as a central point of failure for both outages and connection count limitations, but we never really saw those be a problem often in practice.


We certainly have a way to go to support all cross-shard use cases, especially complex aggregates (like percentiles). In OLTP, which is where PgDog will focus first, it's good to have a sharding key and a single shard in mind 99% of the time. The 1% will be divided between easy things we already support, like sorting, and slightly more complex things like aggregates (avg, count, max, min, etc.), which are on the roadmap.

For everything else, and until we cover what's left, postgres_fdw can be a fallback. It actually works pretty well.


Any recommendations on telehealth suppliers to contact for that compounded formulation? They're easy to find, but I'm not sure who is trustworthy on this topic.


Mochi / Henry Meds. Mochi is the cheapest.


I just went through the quiz at Mochi and it said I was eligible for their nutrition program but not medication. The FAQ says your BMI has to be over 30 or 27 if you have some other health condition.


Take my advice at your own risk, but nobody is checking your math.

I was 10 pounds or so from qualifying, so I fudged my numbers a bit. Didn't make sense to force myself to gain weight so I could lose weight.

Places like OrderlyMeds don't even require a telehealth visit, just the questionnaire and a photo.

