Cloud Traffic (tbray.org)
156 points by MindGods 5 months ago | hide | past | favorite | 60 comments

I do wonder what the equivalent review of AWS Lambda would look like from an equivalent writer but one versed in GCP and K8S, encountering Lambda and IAM for the first time.

I think one can easily make convincing arguments about all the things Lambda abstracts away and how concerning that is.

HAVING SAID THAT, the fundamental difference between something like Lambda and something like K8S is the value proposition. Lambda works backwards from your business logic: how to go from 0 to "working production code" in the minimal number of steps, while still scaling essentially infinitely (so long as you are willing to tolerate the latency costs).

K8S seems to me like it instead works backwards from your infrastructure. How to go from one set of working infrastructure to another set of working infrastructure, but one optimized for solving problems that your organization doesn't have yet until you're at the point where you're looking to migrate your whole cloud strategy to another provider because Azure gave you a 2% higher discount than AWS did.

Meanwhile, by the time you've built your first working K8S-based architecture, your competitor with Lambda is already in production serving customers.

I know, I know, it's not Apples and oranges. But K8S and Lambda are of the same "generation", both launching in 2014. Lambda was AWS's big bet of "Here is the amazing value you can get if you go all-in on AWS-native development", K8S was GCP's equivalent "Here's how you can design your cloud infra to be portable if you explicitly DON'T want to go all-in one cloud provider."

So while they offer almost diametrically opposite solutions (philosophically), they are pursuing the same developers, competing for the same mind share.

Me, I'll take the one that actually cares about getting me to production with working business logic code.

It's odd to see this because Google App Engine is (was) not that hard and launched in 2008. I'm no longer doing that kind of work, but it seemed pretty good for people who want to keep it simple?

I wonder if Tim Bray would be better off kicking the tires on Firebase.

> Google App Engine is (was) not that hard

It was incredibly hard to use GAE in the same fashion as LAMP or whatever web framework stack of its time.

GAE is a prime example of overly confident technologists anticipating the next trend:

They anticipate things happening too far in the future.

Of course, GAE represented the eventual next step for application development. At the time, for anyone inside Google, that was obvious.

And what happened was that neither Google itself nor the industry embraced that model in a time frame that would have allowed GAE to stay relevant.

An insightful comment.

k8s and Lambda aim to solve the problem of service management & deployment. k8s focuses on abstracting the server underneath and automating its operations, aka orchestration, "in-the-weeds & full control". Lambda instead focuses on outsourcing the server and its operations to AWS, aka serverless, "don't even think about it".

Infrastructure engineers value the wide & deep featureset of k8s, and will accordingly invest significant manpower into its maintenance. Business developers value the simplicity of Lambda and appreciate its low overhead and low price.

The tension between these two solutions & their audiences is resolved by the value-chains emanating from the top of their organizations. In a startup phase, leaders will likely value Lambda over k8s because managing services is not a core competency, i.e., they don't care enough. "Just get the application on prod and make sure it doesn't go down! How hard can that be?"

As the organization develops over time, situations will arise that cannot be solved easily by Lambda or are too expensive to work on Lambda, which will necessitate expensive solutions like k8s. The leaders don't want to care about uptime, for example, so they'll hire people who do, people who have a different sense of value and cost.

I think it's telling that the presenter of Lambda at AWS is not the CTO, but the CPO himself:


Four years later, an engineer gave a presentation about how Netflix built their own in-house FaaS, seemingly outgrowing Lambda:


This comment (and the original article) confuses me: GCP offers Cloud Functions as a Lambda equivalent. K8s and Lambda don't fill the same niche. The niche k8s fills doesn't have an AWS alternative, as far as I know.

Now whether that niche is useful is a question that I don't have an answer for.

You're right, the niche that Lambda and K8s fill are quite different. However, what is the higher level problem that is being solved for? The smallest unit of execution for code is a (virtual) server, with a kernel, and filesystem, and just think of all the Nagios disk space emails to ignore. For application developers, the problem is that they don't want to manage an entire server, they just want their code to run when it's asked to, and not have to care about running out of inodes or some nonsense.

From that single problem came a vast number of solutions, some competing, others not. As an application developer, I write my code, tack on manifest file appropriate to the platform (K8s pod, Lambda, Dockerfile, whatever.) and I've got a service. Without all the minutia of dealing with a server. There are tons of differences between Lambda and K8s; don't start. Look instead at how similar the architecture of the interface they provide to the application developer is. (No, not the literal DSL that is the interface, either.)

(For K8s, keep in mind that there's a separate operations team managing the cluster and the hypothetical application developer is not on that team. That's one place where the niche each fills is clearly different.)

Despite the rise of cloud providers, the hardware and space that on-prem exists in and on didn't magically disappear, and K8s moves the state of the art forwards for people that are on their own hardware, possibly with cloud augmentation. For those in that situation but looking for more modern ways to manage it, K8s fills a very useful niche. (It also fills other niches, but it's very useful there.)

Right, this makes sense. To summarize, the Kubernetes niche is for people who can't use a public cloud provider, but whose employees want to have familiar interfaces (and to not have to deal with the ops).

Or briefly, it's a tool for organizations to build their own internal lambda for things that can't be hosted externally. (and then, things like GKE and EKS are the cloud providers providing not just a familiar interface, but an effectively identical interface to the internal clouds, which makes moving workloads easier).

We use it to expose not just a common, but mostly a simpler interface to data/engineering people via EKS/GKE to do things they commonly need to with less of a learning curve, but we wouldn't run Kubernetes on bare metal, and we don't commonly move workloads between clouds (though we do run Kubernetes on both AWS and GCP).

Turns out you need a lot of AWS/GCP knowledge even to run relatively simple apps, and even more for processing pipelines, and especially on AWS, every product has its own often steep learning curve, they don't necessarily share much. We find our k8s platforms offer a nice middle ground between siloing AWS/GCP knowledge and access away from engineers/data people and requiring everyone to fully understand the respective clouds, so they can have wide-ranging access and take care of things themselves.

For example, it can be a huge pain to find out how to debug issues in native AWS/GCP products (Cloud Run, looking at you especially); with Kubernetes, common usage patterns share a lot more, e.g. debugging pretty much always involves pod logs, maybe exec'ing into a pod, reading and editing YAML in a familiar structure, be it a scheduled processing pipeline, app deployment, or whatever else, so we can train people how to do this once, and we find they're able to transfer that knowledge pretty well with a little pointing them to the right namespace/pod, and they can have all the access they need (depending on the cluster).

Our own little tailored micro-clouds, if you will, but with a huge lot less effort than doing this from scratch, since a lot of the hard work involved is done by managed Kubernetes, and without much of the overhead of a cloud that has to work with 7823783 products involved. Similar things can be achieved by providing CDK templates, Terraform modules, well thought-out IAM policies, etc. etc., but this works well for us and appears to be a lot less hassle and complexity than those alternatives.

It also helps security-wise that learning Kubernetes security once goes a very long way, and the concepts seem quite a bit more limited than for, say, running arbitrary things directly on AWS. Locking things up in separate clusters and namespaces can also add an additional layer to limit blast radii.

Though we also use k8s because, especially on GCP, where we do the most data-intense work, strict GDPR compliance seems to be incredibly hard to achieve with much of their serverless offerings, it's incredibly easy to accidentally transfer data to the US without realizing, and there seems to be no mechanism to prevent that globally, though that better fits the category of things that cannot run on (serverless) Cloud. We still run many of these kind of workloads in the Cloud, but in GKE. Those that cannot run in cloud at all due to legal requirements don't run on Kubernetes, but directly on their own dedicated hardware on-prem.

Inter-cloud portability is neat as well, but we wouldn't have picked k8s just for that.

I would imagine AWS Elastic Kubernetes Service is the AWS alternative?


That's not really a Kubernetes alternative since it is Kubernetes.

ECS would be the alternative in that case, no?

Exactly, and ECS failed hard enough that they had to introduce EKS as a consolation.

Cloud Run is a more flexible Lambda, with slightly worse cold starts but a much better concurrency story. See https://futurice.com/blog/openresty-a-swiss-army-proxy-for-s... where I shoved a scale-to-zero nginx proxy into a Lambda-like container.

The necessary first step for any in-depth technical discussion is to set the context.

Per the topic in this article, the context probably could be set to 2 extreme opposite ends: hardware at the bottom, application at the top. Then we are clear that the focus is on how to reuse the pieces in the middle in order to optimize for a suite of objectives:

* Application's software engineering quality
* Application performance
* Operational excellence
* Hardware utilization

When hardware is pinned to CPU-based general computing, the application to a typical server-side web/mobile one, and the environment to a public cloud, then this background more or less covers all sorts of trending technologies: containers, microservices, service mesh, serverless, etc.

In the end, generally speaking, whoever produces code for application developers moves further away from the hardware, depending on how narrowly they want to target.

At the same time, the complexity spectrum between hardware and application stays largely constant. That means if applications move in certain directions, hardware eventually catches up and sheds a lot of its legacy. The shift toward AI & ML computation calls for GPUs precisely because CPUs cannot maintain sufficient abstraction and performance, given their widening complexity gap with AI & ML workloads.

Back to CPU based application development.

Then what you observed is precisely what's happening: serverless optimizes the application abstraction for developers directly, while containers optimize for hardware neutrality, and their audience is devops, i.e., the operators.

The intention or results are not what you imagined, though. Google does not particularly think tech evolution should start from the lower substrate; rather, it's the only place they can differentiate from AWS. For AWS, the goal is to offer convenience to customers. Be it serverless or any other tool or tech, it's largely determined by their assessment of how to capture the next high ground of tech competition.

Obviously, it's always safer to start from more abstraction in application development, as it directly interacts with the customer and provides the most direct feedback.

Why in the world is this downvoted? HN is so frustrating oftentimes.

I suggest HN add a quick drop-down list of downvoting reasons, so that some form of thinking is applied before clicking the icon.

What about knative?

I love this quote:

> I worry a little bit about hiding the places where the networking happens, just like I worry about ORM hiding the SQL. Because you can’t ignore either networking or SQL.

The irony (or not irony, who even knows?), though, is that I always worried a little bit about EBS for similar reasons. It's gotten really, really good but it was a terrifying abstraction back in the day.

> it was a terrifying abstraction back in the day.

How come? NAS, SAN, iSCSI, etc. existed and were commonly used way before AWS or EBS. And EBS really is a dynamically routed network share in the end.

You get a lot more visibility and control with iSCSI, ATA over eth, infiniband, etc. The downside is you need to use and understand that visibility and control. EBS is surfaced to you as a simple locally attached block dev. The network performance and challenges are not just managed for you, they’re largely opaque. As the user you can’t observe the (equivalent to) fiber channel topology, SCSI commands, discarded frames, etc. and that’s the power of EBS as well, users don’t have to deal with anything besides a locally attached virtual disk.

So much this.

There's also the part where the EBS SLA and durability might surprise you, like the great "the internet disappeared" outage around 10 years ago.

The last point is a good one around absolute vs subjective experience.

If you’re looking at 1M disks 99.999% availability is great. But if your entire company runs on 10 instances your day will be utterly ruined when any one of those root volumes goes away.

To that sad admin it feels like they’re sold 1/100,000 and got 1/10. And their impact is closer to “business critically failing” than “active automated remediation workflow incremented.”

> NAS, SAN, iSCSI, etc. existed and were commonly used way before AWS or EBS

And were notorious sources of performance and reliability issues, especially iSCSI. Anyone with operational experience with those was not unreasonably questioning the risk of being in the same boat with less control, and EBS had some highly-visible failures which validated that concern.

It took a number of smooth years after the big 2011 outage to get people to be more confident in it.

In my 20+ years of storage and ops experience, iSCSI was way more reliable, tunable, and monitor-able than anything in the FibreChannel world.

“Just use TCP/IP because everything else does” is a strategy that works, and it’s the reason FC is (mostly) dead.

Though nothing beat the braided differential parallel SCSI cables for cable porn supremacy. Shared-disk clusters looked like they were damn serious alien-military-grade stuff back then.

Maybe you were lucky - I had the misfortune of supporting some iSCSI devices which were slower & less stable, and lots of places had to have the “just because it uses the same cable doesn’t mean you should share a port” talk with their network people – but I was mostly comparing it to direct-attached storage, which has fewer parts to fail and, critically, doesn’t have the correlated failures which you get with shared anything (and, of course, the trade-off of forcing you to do redundancy at a higher level). Most of the problems I remember in the last decade fell into that last category: over-commitment and reserved capacity, failures showing that redundancy wasn’t as good as the vendor promised, etc., which are quite reasonable concerns for EBS, too.

I dealt mostly with LeftHand and EqualLogic gear when iSCSI was getting rolling. Both were far more reliable than the various FC gear I dealt with around the same time. You could pull any node and things kept chugging. Rolling software updates were non-events.

Simplicity has its virtues I suppose. Running 4 or 8 gigabit cables to each node on dedicated switch ports likely also helped.

> for that, synchronous APIs quite likely aren’t what you want, event-driven and message-based asynchronous infrastructure would come into play. Which of course what I spent a lot of time working on recently. I wonder how that fits into the K8s/service-fabric landscape?

Unfortunately, unlike Envoy and networking, the story for Kubernetes event-driven architecture is not yet as stable. Generally it’s roll-your-own for reliability. Eventually Knative Eventing will serve that purpose, but it hasn’t hit 1.0[1] or anything more stable than that... the CRDs recently made it to beta, though and an initial go at error handling was added relatively recently (mid-to-late last year)[2]

1. https://github.com/knative/eventing/releases

2. https://github.com/knative/eventing/tree/master/docs/deliver...

There is Keda [1] which seems to fly under the radar but works well and seems to be a great addition to the Kubernetes landscape

1. https://github.com/kedacore/keda

We "use" traffic director at work. We thought we'd be able to use it for service discovery across our mesh, but it does way more than that in problematic ways that leak internal google abstractions :<

I asked one thing of the product team (roughly), "we like the bits where it tells envoy the nodes that are running the service, and what zones they are in. we don't like all the other weighting things you try and do to act like the google load balancer."

More in-depth: we've had Traffic Director direct traffic to instances within a specific zone, by weighting that zone higher, before instances in that zone would even start up, causing service degradation.

We considered writing our own XDS proxy which would run as a sidecar to envoy, connecting to traffic director and stripping all the weights.

After some back and forth with the TD team, we came up with a solution to instead fool it into not weighing things by setting the target CPU utilization of our backends to 1%...

One thing made me gasp then laugh. Kelsey said “for the next step, you just have to put this in your Go imports, you don’t have to use it or anything:

    _ "google.golang.org/xds"
I was all “WTF how can that do anything?” but then a few minutes later he started wiring endpoint URIs into config files that began with xds: and oh, of course. Still, is there a bit of a code smell happening or is that just me?

It's not a code smell; it's idiomatic Go; the search he wants is [import for side effects].

I also think there's a subtle joke here, check the top author of https://www.w3.org/TR/xml-names/ :)

To be fair, you can write a lot of idiomatic code without ever using packages that use `init()` [0].

(The most common usage is importing database drivers).

I've written Go for a while, and it still feels a bit exotic, compared to its other control flow patterns.

Maybe "code smell" was the wrong word. But writing "surprising" logic in an `init()` function would definitely not be considered idiomatic.

[0]: https://golang.org/doc/effective_go.html#init

I guess I'd just say, when a Go programmer sees an "import _" in the README.md for a library in Github, they generally know exactly what's happening; "oh, you import this for side effects".

Yea that's fair.

It's admittedly more exotic than the traditional alternative: importing a module, initializing some struct, and passing it to some other module's struct initialization.

Of course, now that I write it all out, and I imagine that I'd need to write all that just to give my gRPC service a load balancer... ok, I can accept the fancy hand-waves, give me the underscore import! :)

Part of what makes this idiomatic in golang is that the compiler will refuse to compile code with unused imports floating around. It's a bit weird when importing for side effects (those imports are basically marked 'always in use'), but never having random unused imports after refactoring code is a code aroma, and enforced by the compiler. (I am not aware of tooling to handle the corner case of an import-for-side-effect library going unused after refactor.)

Also, correction, the correct import is

    _ "google.golang.org/grpc/xds"

It's quite bad design from golang to have these kinds of imports. They claim they dislike "magic", yet here we are.

> There seemed to be an implicit claim that client-side load balancing is a win, but I couldn’t quite parse the argument.

Having 1 hop for load balancing (usually through a server that most likely isn't on the same rack as your application) is worse than 0 hops with load balancing on your localhost: no single point of failure, and a much lower chance of a domino effect if a configuration goes awry.

> When would you choose this approach to wiring services together, as opposed to consciously building more or less everything as a service with an endpoint, in the AWS style?

I do not understand this. What's "wiring services together" vs "service with an endpoint"? They both are one and the same thing in context of gateway vs service mesh. Maybe you should read this - https://blog.christianposta.com/microservices/api-gateways-a...

> but as with most K8s demos, assumes that you’ve everything up and running and configured

Because that is what the whole "Cloud" thing is about. You don't pull out an RJ45 every time you need to connect your server to your router; you just assume your cloud provider did it for you. You don't compile your own kernel & OS to run your services; you just use a Linux distro and get over it. Kubernetes is supposed to be a cloud offering; you are not supposed to configure and run it.

> One thing made me gasp then laugh. Kelsey said “for the next step, you just have to put this in your Go imports, you don’t have to use it or anything... I was all “WTF how can that do anything?” but then a few minutes later he started wiring endpoint URIs into config files that began with xds: and oh, of course. Still, is there a bit of a code smell happening or is that just me?

Because you weren't paying attention to the speaker. He clearly mentions the drawbacks of the sidecar-as-proxy model: additional latency (and although that latency is much lower than that of a single-gateway architecture, even 1ms could be disastrous for some applications). To cater to this high-performance crowd they offer Envoy as an application-library model, which is of course more difficult to adopt, but worth it when the added latency drops to nanoseconds.

The galaxy levels of architecture bloat implied here for a simple shopping cart app make me deeply uneasy. How much of this complexity is necessary?

It buys you stuff. Whether it is necessary depends on whether you need that stuff and how much you are willing to pay for it.

By stuff I mean things like support for migrating a multi-thousand VM component between multiple cloud providers (and possibly your own infrastructure) while maintaining service levels across geographical regions and not having to make your other services aware of the migration.

Or maybe routing that takes into account the current state of the rollout of a possibly-breaking upgrade of the backend VMs.

Want to go from "treat your VMs like cattle, not pets" into "treat your (micro-)services like cattle, not pets"? Then maybe this stuff is necessary.

I think the example of a shopping cart is just that: an example. It's similar to how getting a whole dev environment set up to print "hello world" to the terminal is quite pointless when you can just type "hello world" and skip all that set up. If getting those eleven characters on the screen is really the limit of what you want to do, installing clang is quite foolish and unnecessary. But normally examples are just a gateway to understanding how to do something bigger.

> It’s impressive, but as with most K8s demos, assumes that you’ve everything up and running and configured because if you didn’t it’d take a galaxy-brain expert like Kelsey a couple of hours (probably?) to pull that together and someone like me who’s mostly a K8s noob, who knows, but days probably.

> I dunno, I’m in a minority here but damn, is that stuff ever complicated. The number of moving parts you have to have configured just right to get “Hello world” happening is really super intimidating.

> But bear in mind it’s perfectly possible that someone coming into AWS for the first time would find the configuration work there equally scary.

I feel like I missed AWS and K8s... I've been on-metal for the last 6 years, and the 3 before that on virtual machines. My use was limited to bucket storage (S3, GCP).

So this sounds like something I could experiment with: Take a simple and existing app and lift and shift a basic equivalent onto AWS, Azure and GCP in whatever is the most idiomatic way. Compare the learning curves and publish the code bases and config on github and see how much I missed (I just know IAM will be an issue due to the number of times I hear people complain about it).

It won't be so straightforward, because each cloud vendor has multiple offerings catering to how much control you want over your compute. The easiest is serverless (Cloud Functions, Lambda); then there are managed Kubernetes offerings (which will standardize your experience across all vendors and on-prem infra); third is an opinionated PaaS (App Engine, Beanstalk, Heroku); and last are empty VMs (Compute Engine, EC2), which are very similar to your current experience.

I loved this. Every time I try to work with Google's open-sourced stuff it always feels so complicated to me.

> Every time I try to work with Google's open-sourced stuff it always feels so complicated to me.

It is, but so is the problem space, as evidenced by AWS's own clusterfuck of services.

It kinda comes with the problem space they're trying to solve: how do you efficiently run a bunch of services on a bunch of computers? Virtualization solves one problem but you still have networking and service discovery to sort out.

If you're just running a personal blog, and toy projects, then k8s is really just more harm than help.


Oh it absolutely is complicated. The way amazon does it seems to click better with my mental model for how all of this is supposed to work though. I'm really not sure if it's right or wrong or it just depends on how you model these problems.

What's your mental model for AWS services?

LEGO bricks basically. They kinda click together in my mind with a sorta super structure of IAM and VPCs

Why does he think GCP regions aren't the same as AWS regions?

There are differences in redundancy and isolation that could impact your resiliency strategy. An AWS zone can include multiple data centers that are isolated from other DCs in the same zone. A region is then a collection of those zones, and each zone in a region could be as much as 60 miles away from another. [0] A cursory reading of the GCP docs indicates that a GCP region is more equivalent to an AWS zone, but that may be a misunderstanding based on the more abstract language used. [1] Azure also seems to treat a region like AWS treats a zone, but they also have a concept of paired regions for more redundancy [2]. I can't seem to find details on the isolation guarantees for their DCs / Zones / Regions / Pairs though.

[0]: https://aws.amazon.com/about-aws/global-infrastructure/regio...

[1]: https://cloud.google.com/compute/docs/regions-zones/

[2]: https://docs.microsoft.com/en-us/azure/best-practices-availa...

I would agree that they are but since VPCs and storage buckets can span multiple regions on GCP I could understand being confused.

It seems he means in terms of networking as in the prior comment: "I was particularly impressed by how you could set up “global” routing, which means load balancing against resources that run in multiple Google regions"

GCP has a global networking plane, so in a single GCP project you can have resources and networking configured together from every region/AZ around the world. In AWS (which I don't use so don't quote me, and please correct me) regions are very separate from each other. You would need to create separate accounts for resources in different regions and then configure the networking between them.

The big difference is AWS doesn't provide a GSLB (global server load balancing) service outside of Cloudfront. GCP on the other hand provides L7 GSLB.

You still need to create VPCs in each region and manage those but broadly speaking building a multi-region architecture with GCP is a lot better. Hell multi-zone is also easier in GCP because there are more regional primitives available.

Would https://aws.amazon.com/global-accelerator/ be what you’re talking about?

I haven't actually used this but this looks very similar to Google GSLB actually, in which case that is great news.

I think the bigger difference is that layer-2 really doesn’t exist in GCP or Azure but it sort-of does in AWS.

This is why virtual nets can span geographic or AZ boundaries in non-AWS clouds; they always basically do layer-3 routing even for “local subnet” traffic.

AWS also lets you DNS load balance on top of regional load balancers (there is some DNS traffic management with health check integration).

Yes, R53 can help you build your own GSLB by using DNS health checks and georouting but it's not the full picture.

I love when cloud experts (Tim, Kelsey) look at other clouds and go "damn, is that stuff ever complicated". Makes regular devs like myself feel a little better.

> Kelsey said “for the next step, you just have to put this in your Go imports, you don’t have to use it or anything:

    _ "google.golang.org/xds"
> I was all “WTF how can that do anything?

very suspect language design from golang
