AlloyDB Omni – run AlloyDB anywhere (cloud.google.com)
144 points by forrestbrazeal on March 29, 2023 | 78 comments



Google Cloud gets a lot of heat because of Google's legacy of killing things off (the exact opposite of what enterprise decision makers want when selecting a cloud platform).

But I have to applaud Google for the excellent first party emulators and local tooling that they provide (Alloy Omni doesn't seem like an emulator on first readthrough, but enabling fast local iteration has the same effect). The Firebase emulator suite makes development soooo fast because it cuts out a deployment step to a shared resource that has to be coordinated.

Meanwhile, Microsoft has been dragging their feet on supporting CosmosDB emulation on Arm hardware [0]. I was a big fan of CosmosDB, but kind of gave up on it after switching to an M1 MBP because it was unwieldy to work with without a local emulator.

[0] https://github.com/Azure/azure-cosmos-db-emulator-docker/iss...


Definitely not an emulator. It's fully AlloyDB...in a container. We'll be working moving forward to smooth out the install process (tech preview, focus was on getting it in y'alls hands). In my mind, it's a proper Postgres drop-in replacement (Yes, of course I'm biased because I work at Google) but the tech is legit.


> it's a proper Postgres drop-in replacement

The only downside I see is that going through the deployment process, it seems that the smallest instance I can provision is 2vCPU/16GB which if I'm looking at the pricing table correctly, equates to $230/mo. for a single instance.

The web console only lets me create a cluster with minimum 2 instances so it seems like I'm looking at $460 minimum to deploy Alloy. That feels a bit excessive.


Sounds like you're looking at Cloud AlloyDB? If you deploy AlloyDB to GCE (instead of locally, etc) you can provision a smaller instance than that. A 2 vCPU (shared core)/4GB memory machine with a 20GB boot disk (AlloyDB takes up ~13GB) would run you $26/month before network/storage costs.


Just my $0.02 here: if Cloud AlloyDB can deploy a big cluster, it seems like such a small stretch to take the same pipeline and deploy a small instance instead.


Yah, I know. The Cloud version has a lot of stuff happening behind the scenes that makes it a bit trickier to have a smaller machine configuration. It's being worked on, but I can't speak to timelines/configurations.


I’ve gotta say without unambiguous and long term unencumbered licensing, preferably with open source attached, I can’t foresee recommending any place I’ve got influence at to even look at it yet. Hopefully you can push them in a direction that is plausibly usable by anyone other than hobbyists. Even as a local dev tool it leaves too many holes open in the licensing and distribution model. Letting a team build a process and workflow around a time bomb would be negligence. It’s sad to characterize this as such, but we’ve all been burned too many times by globo mega corp whims.


100% hear you. I realize this may sound hollow coming from a Google employee, but stay tuned on specifics around licensing and costs. Being a tech preview, there's been virtually no discussion on specifics yet. We just wanted to get the tech in folks' hands. We're pushing hard for a free tier of usage (and the product team isn't pushing back currently).


Not hollow! Good luck.


Fair but maybe drop the open source tag on the blog post, since it’s not?


At least there is a Cosmos emulator... There has been a ticket open requesting an Azure Service Bus emulator for FIVE YEARS lolsob https://github.com/Azure/azure-service-bus/issues/223

Which is too bad, because ASB is awesome, IMO the best Azure service.


Especially head scratching given that Google's Pub/Sub has an emulator as does SNS/SQS via LocalStack.

I am also a big fan of ASB and I cannot fathom how a multi-billion dollar company that bills itself on developer productivity can't ship emulators for their cloud suite.

If even for their own internal testing and release validation.


Really, then again the bar is low in Azure.


Right now it looks just like an emulator - not based on the description, but based on it being only available right now as a free developer edition "not suitable for production use". Wondering how pricing will work for the real deal, and also whether it'll be horizontally scalable like their cloud based solutions.


> 3. Use of the Software.

> 3.1. Use. You may authorize employees, agents, and subcontractors to use the Software in accordance with this Section 3, so long as you remain responsible for them. You may make a reasonable number of copies of the Software for back-up and archival purposes.

> You acknowledge that the Software is a preview offering not intended for production environments, and you agree that you will only use the Software in non-production environments.

Anyway, all that stuff in the docs is lovely, but if you just want to have a look:

    pip install gsutil
    gsutil cp -r gs://alloydb-omni-install/$(gsutil cat gs://alloydb-omni-install/latest) .

The install scripts are only 16K, look at `installer/scripts/start_alloydb.sh` for more, but basically it just runs the two docker containers listed in https://cloud.google.com/alloydb/docs/omni/install#install

Honestly, it seems kind of weird to me to have a one-time install script that preps a machine (but only a specific type of machine!) on which you then run a pair of docker containers. Eventually consistent deployment states? eh. whatever...


For those who want to skip the gsutil part, that bucket and its keys are public:

    curl -vJLO https://storage.googleapis.com/alloydb-omni-install/$(curl -fs https://storage.googleapis.com/alloydb-omni-install/latest)alloydb_omni_installer.tar.gz

The `latest)alloy` is not a typo: the contents of the "latest" file end with a slash, and GCS doesn't tolerate `//` (presumably just like S3 wouldn't).
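If you would rather not rely on remembering where the slash lives, here is a minimal sketch of a helper that normalizes the pieces before joining them. The function name `join_gcs_url` and the version string are made up for illustration; only the bucket URL and tarball name come from the command above.

```shell
#!/usr/bin/env bash
# Sketch only: join a base URL, a version prefix (as read from the "latest"
# file), and an object name so that exactly one "/" separates each part.
# join_gcs_url is a hypothetical helper, not part of any Google tooling.
join_gcs_url() {
  local base="${1%/}"      # drop one trailing slash from the base URL
  local version="${2#/}"   # drop a leading slash from the version prefix
  version="${version%/}"   # drop the trailing slash that "latest" ends with
  local object="${3#/}"    # drop a leading slash from the object name
  printf '%s/%s/%s\n' "$base" "$version" "$object"
}

# The version below is a placeholder; the real value comes from the "latest" file.
join_gcs_url "https://storage.googleapis.com/alloydb-omni-install" \
             "v0.0.0-example/" \
             "alloydb_omni_installer.tar.gz"
```

In practice the second argument would be `"$(curl -fs https://storage.googleapis.com/alloydb-omni-install/latest)"`, and the result would be handed to `curl -JLO`.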


Has Google contributed some of these changes back to Postgres? I went looking and even a search for "alloydb" in some postgres mailing lists returns no results. It has some very exciting features, I was a bit surprised to see nobody talking about it.


Not as of yet, no. I've gently nudged the eng/product teams in this direction, but for now it's all being kept in-house.


If I understand correctly, AlloyDB is a Postgres compatible re-implementation, so there's no shared code. So this is a bit like Wine compared to Windows, only with the big corp re-implementing the OSS software instead of the other way round.


OMG no, we didn't reimplement Postgres. Definitely started from the PG kernel and made modifications from there.


I often reference Cunningham’s Law, and I’m glad it’s in full effect :)

Thank you for the clarification!


You can't really be 100% postgres compatible without borrowing a lot of implementation


Query and storage layer are nicely separated in the Postgres codebase, you can absolutely rip out the complete backend if you really want it.

There is a Postgres interface to Spanner which probably did that.


It's worrying that a lot of companies aren't pushing these types of changes back. Particularly with Google, they'll have software patents that prevent others from implementing the same improvements but also, based on their track record, will abandon this project within a few years. When that happens, their improvements will simply be inaccessible.


I realize this may sound hollow coming from a Google employee...but I am fighting to be sure something like this stops happening (in particular the deprecating causing loss of functionality). Google is (a little) better than it has been in the past. E.g. the opening of the protocols on the Stadia controllers when Stadia was shut down so their controllers could work on other platforms.

I am pushing at the OSS angle, but new product, so uphill battle. :)


That’s incredible. I wasn’t expecting them to allow you to run AlloyDB on premise but this is a potential game changer for on-prem Postgres instances, especially if you are doing analytical queries on the same data set.


That's how I've been thinking/talking about it. It should be a drop-in replacement for Postgres. Obviously the install process isn't as smooth yet, but, technical preview...


Eh, there are other options for this that are more mature and more likely to be supported in the long run. The Citus columnar store extension has been around for almost a decade.


I looked at Citus columnar; however, the lack of support for even occasional updates and deletes kept me away.


Redrock Postgres - Running Cloud Native Database Anywhere

Network Attached Tablespace

https://doc.rockdata.net/admin/network-tablespace/

PostgreSQL Multitenant

https://doc.rockdata.net/features/multitenant/


A quick demo:

AlloyDB Omni Columnar Engine Fast Analytics Demo - YouTube https://www.youtube.com/watch?v=f_dvdKMq6og


I see: not open source, ambiguous long term licensing, and not even a commitment to a forever-free tier.

I did notice the release reads just like an internal AWS PR with Andy’s preferred structure… guess all those aws folks they recruited are making an impact.


I wonder how they make transactional workloads 2x faster vs normal Postgres

The analytics workloads improvements seem pretty straightforward (or at least there is prior art like timescale)


> I wonder how they make transactional workloads 2x faster vs normal Postgres

There can be improvements like what OrioleDB is trying to do: https://github.com/orioledb/orioledb/


My best guess is that transactional workloads are mostly improved by automatically adding the right indexes and parameter tuning. For example, OtterTune also advertises "2x on price/performance over unoptimized DBs".


> AlloyDB Omni provides full compatibility with PostgreSQL extensions


Makes sense since it's not in their managed infra


Tried running it on the cloud dev environments I usually use (as I am on Windows myself).

Gitpod:

  2023-03-29 18:13:54.044 UTC: [alloydb_util.sh:90] FATAL: Docker service must be active to run AlloyDB Omni

GitHub Codespaces (after increasing memory):

  2023-03-29 18:37:55.236 UTC: [alloydb_util.sh:76] FATAL: AlloyDB Omni requires cgroups V2 to run.

GitHub Actions:

  2023-03-29 18:53:52.766 UTC: [alloydb_util.sh:44] AlloyDB requires at least 16GB of RAM to run. Only 7 GB available. Please increase available RAM and retry

Not today it seems. Shame, sounded super interesting.
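For anyone who wants to know up front whether a host will pass, the three messages above suggest three checks: Docker service active, cgroups v2, and at least 16GB of RAM. Below is a rough sketch of what such a preflight might look like. This is not the installer's `alloydb_util.sh`; the function names, the rounding rule, and the use of `systemctl` and `stat` are my assumptions.

```shell
#!/usr/bin/env bash
# Sketch of the three checks implied by the error messages. NOT the actual
# installer logic; all function names here are hypothetical.

MIN_RAM_GB=16   # taken from the "at least 16GB of RAM" message

# Convert a MemTotal value in kB (as in /proc/meminfo) to GiB, rounded to the
# nearest integer, since the kernel reports slightly less than nominal RAM.
ram_gb_from_kb() {
  echo $(( ($1 + 524288) / 1048576 ))
}

# Succeeds when the given kB value meets the minimum.
has_enough_ram() {
  [ "$(ram_gb_from_kb "$1")" -ge "$MIN_RAM_GB" ]
}

# Succeeds when the filesystem type of /sys/fs/cgroup is cgroup2fs, which is
# how a unified cgroups v2 hierarchy shows up.
is_cgroup_v2() {
  [ "$1" = "cgroup2fs" ]
}

# Run all three checks against the current host and report each failure.
preflight() {
  local mem_kb fs_type status=0
  mem_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
  fs_type="$(stat -fc %T /sys/fs/cgroup 2>/dev/null)"
  systemctl is-active --quiet docker 2>/dev/null \
    || { echo "docker service not active"; status=1; }
  is_cgroup_v2 "$fs_type" \
    || { echo "cgroups v2 not detected"; status=1; }
  has_enough_ram "${mem_kb:-0}" \
    || { echo "less than ${MIN_RAM_GB} GB RAM"; status=1; }
  return "$status"
}
```

Calling `preflight` on a Linux host prints one line per failed check and returns non-zero if any failed. The small helpers take their input as arguments so they can be tried with made-up values without touching the host.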


Yeah, I know. Apologies. Stay tuned though! For the tech preview launch we really focused on a golden path (narrow). We'll be expanding compatibility as we march towards GA.


> The free developer edition of AlloyDB Omni is currently available as a technology preview. Check back for full product pricing information.

Thanks, but no thanks!


This looks really sharp. I wonder if Alloy gives some of the data compression benefits of columnar databases with that hybrid approach?


The columnar form is only in-memory and derived from the row-oriented form on disk. Like a denormalized cache of sorts.


Hugely disappointed by AlloyDB. Price and performance for a write-heavy workload were off-the-charts horrible. It didn’t even support disabling an instance, and it only did storage autoscaling. A much worse overall experience than using Aurora, so we moved back to Cloud SQL.

Does someone have good experience with Alloydb?


Just 2c about disabling an instance - you can delete all the instances of AlloyDB on a cluster for the time being and recreate them afterwards. The data will be safe, since it is stored at the cluster level.


I have many thoughts here...but let's start with: Can you share the process you used to test performance? My guess is the eng team would love to look at what you tested.


As general feedback, I have written it down here before: https://news.ycombinator.com/item?id=34304376

Sorry for being a bit negative but I was very excited for AlloyDB as a potential serverless offering for Postgres. Especially after having good experience with Aurora. And we ended up wasting a decent amount of resources migrating to and then away from it, thus my frustration.

In our use-case we use Postgres essentially as a cache. We have many cloud runs (approx 100) writing a lot of data in parallel, which we then query on a schedule in certain ways and put to GCP. It's quite a bit of data, about 1 TB a day.

It was a very CPU bound effort and to get acceptable performance we had to rely on the 8 core configuration, and we couldn't reduce the memory (from 64 GB). We were getting similar performance with a 4 core, 8 GB Cloud SQL instance.

Probably not the most representative workload, but this had exactly the opposite effect to what we wanted (i.e. much more wasted resources and less serverless).

Also the fact you couldn't disable it was an absolute joke (we have prod, dev and staging environments, and if we have to have a big always-running DB for each of them, you can see how that is unacceptable).

We are happy with CloudSql though.


No worries on the negativity! I totally understand. No one likes to have time wasted. :( I'm getting the more detailed info on the cache use-case to the eng team just so they're aware. Also, the can't disable it...so funny story. You can delete the cluster...save the $$$ on the AlloyDB side, and recreate it later and the data's still safely there. So the disable instance is 100% valid feedback (and has been raised before, I'll add your voice to it as well). And there's a workaround....sorta.


Just curious, have you tried Alloy with the PostGIS extension?

I've been trying to get my work to let us use BigQuery for our geospatial vector data queries, because we've managed to break all the equivalent AWS products, Snowflake, and Databricks on our dataset. Regular Postgres + PostGIS takes about 30 minutes to run our query and BigQuery does the same in 4 seconds. Unfortunately, BigQuery is a bit of a pipe dream right now for us, so it would be interesting to understand if PostGIS benefits from some of the changes in Alloy.


The improvements we've made on the read side of things likely won't affect the geo data directly, BUT, depending on what other aggregate data you're combining with the geo data you might see some improvement from the columnar engine. Hard to know without digging in deeper on the queries themselves and schemas you're working with. I don't want to steer you down a path of moving a ton of data and infra just to look. Best guess is that it'll be a bit better, but likely closer to your PG experience vs. the BQ experience.


Really appreciate the reply! Everything you said makes sense. It does feel like it would be really hard to beat BigQuery at this kind of task. The main query we run is just comparing the geometry of two tables, where one is a polygon/multipolygon column and the other is a column of points. We ask, are any of these points in any of these polygons? So it's N x M comparisons, pretty much worst case scenario and there's no aggregation. I've had a hell of a time trying to optimize PG for just running this one query. On the bright side, I've gotten to learn about and try a bunch of different databases now though :)


My PM also reminded me, depending on the data set size too, if it's bigger than the buffer cache, we might see some improvements over PG as well from that (we've made memory improvements around that which might help potentially).


thank you for the measured response rather than immediately becoming defensive :)


WHAT DO YOU MEAN?! RAAAAAGE!! ;)

I mean, if someone legitimately finds workloads that aren't performing well, our engineering team SHOULD want to know about it, right?


Do they?


Okay, I can't speak for ALL engineering teams, but at the very least I know THIS engineering team absolutely does.


I haven't tried AlloyDB yet but I've found most of GCP is this way. I've had to perform several Cloud SQL db migrations this year because storage will auto increase (due to ephemerally high tmp_data) and cannot decrease.

Lots of other rough edges with their other services. I have to believe that Google doesn't dog-food their own services.


All clouds do that for databases


Maybe I'm old.. but every time I read Alloy I think of the old modeling language out of MIT to prove a program's correctness. Gives me PTSD thinking of grad school.

https://alloytools.org/


Why would anyone use this as opposed to using Postgres? The value prop of run-anywhere applies to Postgres as well. I see column store and index advisor as the two features but if I don't need these, is there any reason?


It's twice as fast as out of the box PG for most things, and up to 100x faster for reads, depending on what you're doing. So there's that.

Also, from a manageability standpoint, on top of the index advisor, there's also vacuum management, so it will figure out the best time to do the garbage cleanup while minimizing impact on performance.


The 100x analytics seems due to columnar storage like with TimescaleDB, but how do you get improvements on the other things? Is it really faster on a database with good indexes?


It really is, yes. I can't go into a lot of detail on the "why" because it's not open source and the product team would murder me...but I highly encourage folks to try it for themselves. Nothing else convinced me until I did it for myself. :)


We use Postgres as a Datawarehouse, once AlloyDB Omni is stable there is virtually no reason for us not to use it.

Obviously, we use Postgres not for its performance but for its ergonomics, first class dbt support and unbeatable extension ecosystem. Now if you're telling me I get all that with no compromises whatsoever and with a 10-100x analytical query performance increase, I'd be crazy not to use it.

From my perspective, Postgres just keeps on giving.


100% I'm a huge PG fan as well. This should absolutely be "PG + performance and QoL improvements". Where it's not, we want to hear about it.


also how does it compare to PGHero for index advising?


We encourage folks to draw their own conclusions, but we have done comparisons and ours is more efficient and produces faster results. I only say test yourself because I can't give specific details of how/why, so don't take my word for it. :)


This looks great! For the 2x general improvement, how much of that is due to setting/modifying postgres settings and adding good indexes, versus improvements to the code itself?


Code improvements. Indices aren't taken into account at all since that's so workload dependent. Having said that, the index advisor can make it easier to find the right indices to improve performance there as well. Vacuum management handles figuring out the right time to do garbage collection for you, etc. None of that is part of that 2x improvement.


That's impressive, surprised there's that much juice that could be squeezed without changing architecture. I ask because AWS Aurora is flattered in comparison to RDS due to e.g. some dumb oob pg settings that a lot of people don't change.

From seeing the cloud alloydb I had imagined most of the improvements were due to the wal-shipping and cloud native aspects.


Since we run Vanilla PG on-prem, this is interesting.

Will try once we get clarity on licensing. (Probably we can't use if the code is not open, so let's see)


As a heads up, we're unlikely to make the codebase for it open. I might be able to convince the product team to open source some components of it (our GM has talked about this before in a couple articles), but the whole thing won't be open sourced.


What's the plan for the license for this? Especially path to production?


I'm assuming you mean "I start using this under a free dev license, and want to shift up to the paid production version"? And what that looks like? If that's the case, I don't know yet. That's literally being discussed/hammered as we speak, and we likely won't have a good answer until we're ready to go public preview (not sure on timeline right now, depends heavily on how the tech preview goes). This stage of the game is literally a "We built an awesome thing, please poke at it and tell us where it does/doesn't work!".


Is there any HA built into this? Performance claims look great.


Not for Omni no. End-users are responsible for any HA/DR/Replication needs. Cloud AlloyDB has all of it built in, but since it was so tightly tied to the Cloud infrastructure it was simplified for Omni.


What does it mean in terms of license?


We don't have specific details yet about licensing or pricing. Stay tuned as we get closer to GA. Having said that, we're pushing hard for a free usage tier and the product team isn't pushing back. :)


wow



