This is a great overview! I’ve been working on a project where I’ve tried a few of these and it’s definitely a space where this data is super valuable.
I really wanted to like Banana.dev but runpod consistently outperforms them for my use case. I love the innovation in this space.
Here is my wishlist:
1. Faster cold starts. If you’re building a consumer product with a request/response flow on top of one of these services, I’m seeing at least a 20-second delay before the server even begins working.
2. Much cheaper GPUs. This is an unrealistic expectation right now because supply and demand have these services completely crowded with people happy to pay. I just wish I could afford to keep a few of the faster GPUs prewarmed and ready to go, but that would be several thousand dollars a month. Doable eventually, but for my bootstrapped side project that hasn’t found product-market fit, it’s a little rough.
3. Ability to create custom models from the service. I’m on an Intel Mac, so making a custom model requires me to ssh into a machine. If only there was a service that let me rent a high-end GPU server by the second. /sarcasm. I wasn’t able to get access to Docker or install it on RunPod, and support confirmed it isn’t something they support.
For custom model building I found the prices and the flexibility to do whatever you need on lambdalabs.com to be the best. Their prices blow all of these other services away. However, there’s no serverless option. The space is so crowded with consumers that I’m almost afraid to even mention them, because I worry I won’t have GPUs available for myself. I’m seeing this mentality a lot.
Firstly, thank you so much for trying our service, we'll do our best to meet performance expectations and win you back!
Re: #1 and #2, cold boots are the most vital thing for us to solve, because it fixes #1 directly and helps #2 indirectly. As we drive cold starts exponentially toward 0s (obv impossible but there's a near-zero asymptote to what's possible, limited only by disk read throughput), it makes it more viable to actually run serverless and scale from 0->1 for each user call. Our goal RN is to hit 1s, as that's generally the sweet spot for LLM app builders to stop feeling pain from the cold boot. We're getting close (2-5s) for most models with Turboboot (shill warning https://www.banana.dev/blog/turboboot). Glorious future would be more like 100ms, but depending on where model sizes end up, 1s+ cold boots may just be the cost of doing business.
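To put rough numbers on that disk-throughput floor (all sizes and speeds below are illustrative assumptions, not our actual hardware):

    # Back-of-the-envelope cold-boot floor from storage read throughput alone.
    # All numbers are illustrative assumptions, not any provider's real specs.
    model_size_gb = 5.0        # a mid-sized fp16 checkpoint (assumed)
    local_nvme_gbps = 7.0      # fast PCIe 4.0 NVMe sequential read, GB/s (assumed)
    network_store_gbps = 1.25  # 10 Gbit/s network storage path, GB/s (assumed)

    print(f"local NVMe floor:   {model_size_gb / local_nvme_gbps:.1f} s")
    print(f"network pull floor: {model_size_gb / network_store_gbps:.1f} s")

Everything else (scheduling, imports, CUDA init) stacks on top of that floor, which is why shaving the read path matters so much.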
Re: #3, if I understand, you're looking for interactive compute (i.e. you ssh in, mess around, do training runs, attach a Jupyter notebook). For my personal ML training I use and suggest Lambda Labs or Brev.dev. Have heard great things about CoreWeave, and users of Mosaic seem quite satisfied, but I don't believe it's as interactive as you may want. Banana has no plans to support interactive GPU sessions, to conserve focus toward being best at cold boots.
You're definitely not alone in this wishlist, so I validate you. If I were building applications on top of a provider, I'd expect the same things. Big gnarly challenge with all the tools being a few years old at best! Fastest route to dependable tools is intense focus, us on cold boots.
> cold boots are the most vital thing for us to solve ... Our goal RN is to hit 1s
Did a bit of package management and experimenting with optimizing bits over the wire for serverless scaling on NFLX's internal serverless platform.
I'm selfishly interested in learning more about what you're doing to optimize meeting an incoming request with a live instance, but also might be able to help.
Know you're busy but if you, or your engineering team, have time to connect I'd love to chat: hn@blankenship.io
Neat! Bit too busy now, but we'll hopefully put out some technical blog posts over time to explain these things.
We don't do any predictive scaling yet; only when a call hits the queue do we scale replicas. Replicas cold boot (pod scheduling + application loading models into GPU memory) then subscribe to the queue.
Moving away from this replica + queue design very soon; using K8s primitives has gotten us this far, but it's impossible to hit 1s cold boots with the orchestration overhead. We're building our own orchestrator and Python runtime now.
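A toy version of that replica + queue shape, just to illustrate the 0->1 behavior (not our actual orchestration code; the sleep stands in for pod scheduling plus loading the model into GPU memory):

    # Toy 0->1 autoscaling: nothing runs until a call lands in the queue,
    # then a replica "cold boots" and subscribes. Illustration only.
    import queue, threading, time

    calls = queue.Queue()
    replica_running = False

    def replica():
        global replica_running
        time.sleep(2.0)                       # simulated cold boot
        print("replica warm, subscribed to queue")
        while True:
            try:
                call = calls.get(timeout=10)  # scale to zero after 10s idle
            except queue.Empty:
                break
            print(f"handled {call}")
        replica_running = False
        print("idle, scaled to zero")

    def autoscaler():
        global replica_running
        while True:
            if not calls.empty() and not replica_running:
                replica_running = True
                threading.Thread(target=replica, daemon=True).start()
            time.sleep(0.1)

    threading.Thread(target=autoscaler, daemon=True).start()
    calls.put("inference-request-1")          # the first call pays the cold boot
    time.sleep(15)

In the real system the "sleep" is scheduling plus model load, which is the part Turboboot and the custom runtime are meant to attack.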
Are you colocating your storage plane and GPUs? What’s ingress/egress to a node, and are those links near saturation (with comfy room for returning model output, but I’m assuming moving the models dwarfs I/O from customer workloads)? Do you see high reusability across workloads? Have you explored chunking/hashing your workloads IPFS-style (do these models radically change, or is there a high chance that two models that share an ancestor also share 50% of their bits)? If you’re chunking your models and colocating the storage plane with GPUs, can you distribute chunks to increase the hit rate of a chunk being on-node? Is your scheduler aware of the existing distribution of chunks across nodes? Given the workload patterns you see, and the shared bits between models, is it even practical to try and chase a local cache hit rate to reduce bits over the wire? If you have a cache miss, what’s the path to getting those bits to the node with the GPU? How does the cost of that path compare to the cost of the scheduler making a decision?
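To be concrete about what I mean by chunking, a minimal sketch (fixed-size chunks + SHA-256 here; a real system would likely use content-defined chunking so shared regions still line up after insertions):

    # Split model files into chunks, hash each chunk, dedupe on the hashes.
    import hashlib
    from pathlib import Path

    CHUNK = 4 * 1024 * 1024  # 4 MiB, arbitrary

    def chunk_hashes(path):
        hashes = []
        with Path(path).open("rb") as f:
            while block := f.read(CHUNK):
                hashes.append(hashlib.sha256(block).hexdigest())
        return hashes

    def shared_fraction(a, b):
        ha, hb = set(chunk_hashes(a)), set(chunk_hashes(b))
        return len(ha & hb) / max(len(ha | hb), 1)

    # e.g. two fine-tunes of the same base model:
    # print(shared_fraction("model_v1.safetensors", "model_v2.safetensors"))

If two fine-tunes of the same base share most of their chunks, a node that already has one of them only needs to pull the delta.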
Can only publicly answer one of these:
- reusability of workloads: yes, introducing the community templates feature (https://banana.dev/templates) for common models has dramatically cut back on storage requirements and transfers. We're still majority custom code, but it's helped prevent us from exploding storage over people running the same "model of the week"
As for caching / chunking, sounds like you're thinking on our wavelength, perhaps even ahead of us, so maybe I should take you up on the offer to chat! Will reach out.
Some of these are a bit more "host a server for you or others to run." I wish this comparison also compared billing models a bit, and any other value-adds.
The cool thing about replicate.com is that you can use someone else's public model and it's billed to you, the caller. For someone like me who is gluing models together in a hobbyist setting, it's been great.
For some image identification tasks, it's been pretty neat to be able to call https://replicate.com/andreasjansson/blip-2, which is someone else's already-deployed model that's kept warm by some level of activity, and get results back. I've been captioning images and putting the captions into OpenAI prompts.
I've also myself put out https://replicate.com/nelsonjchen/minigpt-4_vicuna-13b to see if maybe it's an improvement in captioning. Unfortunately, it takes like 15 minutes to spin up. That said, it's currently free for me to put up. If someone else wants to run it, they can wait/pay. And they only pay for the runtime and not setup. And if it were to get popular, it'll be naturally warm for everyone. For me, it was 6x the cost of https://replicate.com/andreasjansson/blip-2, and although my experiment did not produce something suitable or usable for me, maybe those caveats are appropriate for someone else's use case, and they can super-easily reuse my deployment on their dime without costing me any money.
Not to mention that replicate also put out some pretty alright APIs or libraries to call their service. It's been consistent.
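For anyone curious, a call with their Python client looks roughly like this (the version hash is a placeholder you'd copy from the model's page, the input names are from memory of the blip-2 API tab and may differ, and REPLICATE_API_TOKEN needs to be set in the environment):

    # Rough shape of calling someone else's public Replicate model.
    import replicate

    output = replicate.run(
        "andreasjansson/blip-2:<version-hash>",   # placeholder version hash
        input={
            "image": open("photo.jpg", "rb"),
            "question": "What is in this picture?",
        },
    )
    print(output)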
All this ease of use does come with some caveats, replicate.com is pretty expensive for the raw calls.
Wondering whether most major inference libraries support storage-direct, or if the listed providers are cheaping out on storage latency. Several seconds to load a 100MB model when PCIe 4.0 is ~256 Gb/s? I'd have expected at least an order of magnitude less, at least from my experience with GPUDirect and real-time processing of up to 200Gb/s network streams with datacenter GPUs (not performing inference, but actual streaming into GPU memory). PCIe 5.0 and the H100 (the only one in the line-up that has PCIe 5...) should also improve on that.
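The arithmetic behind that expectation:

    # 100 MB over a PCIe 4.0 x16 link (~256 Gbit/s, ~32 GB/s theoretical)
    size_bytes = 100 * 1024**2
    link_bytes_per_s = 32e9
    print(f"{size_bytes / link_bytes_per_s * 1e3:.1f} ms")  # ~3 ms on the bus

so multi-second load times are dominated by storage, deserialization, and framework overhead rather than the bus itself.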
Additional note, the L4 inferencing board is also still on PCIe-4 (like the L40, very sadly), so NVIDIA doesn't seem to foresee the need for more bandwidth on inferencing workloads in the next 3 years (until they maybe get a H30 out, with the binned not-fully-functional H100s? Maybe? Hopefully?) and/or will make you pay for a full H100.
I don’t understand how these small companies can compete with the big clouds over time. They aren’t offering anything fundamentally different from each other or from the big clouds themselves.
At some point, Lambda will offer GPUs on their instances, and there are plenty of serverless offerings right now that have GPUs.
Fly.io at least is differentiated with their “run your compute closer to the user” with zero hassle. Though even that doesn’t seem super sustainable in the long term.
Wow, I always thought AWS was crazy complicated just because... stuff is hard. But you're right: if it's complex enough, people define their career identity as an "AWS expert", and at that point you have lock-in for life.
I've found that a lot of the complexity of the cloud is self imposed; the rest is because they're running a huge range of mixed-quality software that needs a lot of scaffolding to even run properly.
For example, if the clouds used IPv6, then 90% of the networking complexity would simply evaporate. But they can't, because despite two decades of warnings, nobody has working IPv6 LAN networks. Similarly, IPv4-only server software is still being written, today.
In comparison IPv4 needs a ton of infrastructure to function. Stateful subnet and IP assignments. Methods for dealing with overlapping private networks. Split DNS, private endpoints, NAT, and on and on...
I could list other examples, but you get the idea. The big clouds are pandering to their customers, and the customers can't modernise their end fast enough, so you end up with complex features to account for all the legacy software and networks.
> I've found that a lot of the complexity of the cloud is self imposed
Indeed. I prefer the terms circumstantial and inherent complexity. Inherent is the easiest to define: it’s the bare minimum to do what needs to be done, analogous to Kolmogorov complexity and in the spirit of “if I had time, I would have written you a shorter letter” or “make it as simple as possible, but not simpler”. In the real world, your software interacts with imperfect and legacy systems with their own issues, but dealing with that is also part of the inherent complexity (because if you removed it, it wouldn’t satisfy the requirements).
Circumstantial complexity otoh comes in many forms: initial poor design, tech debt from scope creep and requirement changes. In adversarial environments it can even be deliberate, such as DRM, inter-team politicking, and “job security through obscurity”, or “if nobody else understands this doc, it’s less likely to be changed and my coworkers will think I’m smarter”.
In the case of AWS, I suspect there’s all kinds of circumstantial complexity, but in particular there are “green field accidents”. As an early mover (all cloud tech is extremely new), you don’t get the simplest possible design on the first try. Instead, you get something that works but is full of redundancy. For one, the best way to layer your systems isn’t clear, so you end up with individual products having to reinvent things like consensus, caching, durability, replication, data integrity, yadda yadda. But at the same time, there’s immense pressure to get stuff out the door, so you make do with what you have and keep cargo-culting until it’s cost-efficient to replace a lot of the garbage with something better. That can take 10 or 20 years.
I've used both AWS and Azure extensively. Everyone seems to think Azure is "weird and difficult", most likely because they had internalised the circumstantial complexities of AWS and can't wrap their heads around a cleaner but unfamiliar model.
In AWS, everything has non-human-readable identifiers shown in flat lists in some random order. This adds a lot of unnecessary complexity. Almost the entire circus around having to create dozens of AWS accounts just evaporates in Azure's model, which has folders called Resource Groups containing resources with names.
Yeah, that's right: folders and object names. The magic anti-complexity technology that harks back to the 1960s UNIX era that AWS still hasn't been able to replicate in the 2020s despite a decade of trying.
A lot of other incidental complexity stemmed from old issues that have been resolved but still linger around due to backwards compatibility. For example, not being able to change the IP address of an EC2 VM resulted in all sorts of craziness. Similarly, both Azure and AWS have unexpected naming restrictions on things like KMS / Key Vault secret names. E.g., Key Vault secrets can't have names that match typical "web.config" parameter names or environment variable names in Linux... "for reasons". Stupid reasons. Hence, you need a back-and-forth encoding or escape/unescape mechanism between two things that should be identical.
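A minimal sketch of such an escape/unescape scheme (one possible convention, not anything Azure ships; ASCII names assumed):

    # Reversible mapping between config/env-var style names and Key Vault's
    # allowed charset (alphanumerics and dashes). ASCII-only for brevity.
    import re

    def to_secret_name(name):
        # every char outside [A-Za-z0-9] becomes '-' + two hex digits
        return "".join(c if c.isalnum() and c.isascii() else f"-{ord(c):02x}"
                       for c in name)

    def from_secret_name(name):
        return re.sub(r"-([0-9a-f]{2})",
                      lambda m: chr(int(m.group(1), 16)), name)

    assert from_secret_name(to_secret_name("ConnectionStrings.Default_DB")) \
        == "ConnectionStrings.Default_DB"

Because every literal dash is itself encoded, the decoding is unambiguous; it's still a silly tax to pay for a naming restriction.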
> Yeah, that's right: folders and object names. The magic anti-complexity technology that harks back to the 1960s UNIX era that AWS still hasn't been able to replicate
Right on. These choices are often frontloaded to the greenfield stage, where you have to make some decision way before you can say which data model makes the most sense in the future. Even the wisest of architects cannot predict everything, so it’s not for a lack of competence. The people I knew at @faang-gig were incredibly bright, but technical design as an early mover is still an incredibly delicate art form.
Eventually the “happy path” will simplify. AWS is spending all its time going upmarket wooing enterprises, but if there is money to be had at the low end, they’ll go after it.
Beyond AI, I don’t see any upcoming paradigm shifts, so what’s out there will just continue to get better rather than being displaced by something fundamentally better.
Is this a sarcastic quip or are you able to expand on this?
I use a lot of serverless daily, handling events (even ML inference), and it seems to work great, but would love to understand the alternatives and your perspective.
The overhead of abstracting away the servers is a luxury in many ways. I believe this extra cost was heavily funded by low interest rates, which flooded the VC world with dough. There’s been a lot less serverless talk since the Fed started cranking the rates.
Sorry, but this feels like a total non sequitur. Serverless or FaaS is pretty mature now. People get the concept, businesses understand the savings, and the services and tooling are stable. We don't talk about it because it's boring.
Maybe the backend is, but the frontend aspect is very bad
Both GCP and AWS have terrible web UIs for their cloud functions offerings, and every deploy is so slow. I'm lucky that next Monday is a holiday so I can rest from the stress of having to use GCP last Friday (on a deadline).
Codesandbox should offer their own serverless functions so I can actually have serverless for the whole development cycle
I've used serverless for the past 3 years in production. Unfortunately my experience with it is that it's several orders of magnitude more expensive than a k3s cluster on a cheap provider like Hetzner, and it's slower.
When I last calculated the cost of serverless, it was ~500-5,000x more expensive for the compute compared to k3s and ~10x more expensive for bandwidth at a minimum. To me, removing the burden of maintaining infra didn't justify that level of cost.
Some examples:
- Upstash latency was ~70ms for Redis. Cost was prohibitive.
- AWS Lambda / Cloudflare Worker / Firebase Function cost becomes prohibitive. At least cold starts aren't as bad as they used to be.
- Firebase Realtime Database performance didn't scale, and wound up getting maxed out because of the way it works with nested key updates. Replaced with a Redis instance in k3s which is now running at <2% max capacity and is ~1,000x cheaper.
- Tried Planetscale. Cost was much higher than PostgreSQL.
- Tried Vercel. Bandwidth costs are very scary ($400 / TB egress, or ~350x the cost of Hetzner if you don't count Hetzner's free 20 TB per node)
That being said, I don't know of any good, reasonably-priced GPU offerings.
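If you want to redo the comparison for your own workload, the arithmetic is easy enough to script; every price below is a placeholder to swap for real quotes, not anyone's current pricing:

    # Toy break-even: always-on small box vs. pay-per-invocation serverless.
    # All prices are placeholders (assumed), not current list prices.
    box_monthly = 5.00            # small VM / share of a k3s node, $/month (assumed)
    per_gb_second = 0.0000167     # serverless compute, $/GB-second (assumed)
    per_million_req = 0.20        # serverless request fee, $ (assumed)

    mem_gb, avg_ms = 0.5, 50      # workload shape (assumed)
    for req in (1e5, 1e6, 1e7, 1e8):
        compute = req * (avg_ms / 1000) * mem_gb * per_gb_second
        requests = req / 1e6 * per_million_req
        print(f"{req:>12,.0f} req/mo: serverless ~${compute + requests:8.2f}  vs box ~${box_monthly:.2f}")

The crossover moves around a lot with memory size, duration, and especially egress, which is where the scary multipliers showed up for me.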
Also hard disagree (as one of the Banana founders). Many users on our platform spend less than $10 a month on A100 GPUs while building whole startups, compared to the alternative of a minimum of $1k monthly for an always-on A100.
As an avid Runpod user, I have come across some information that is not entirely accurate.
- Although the number of models is limited, the platform has a community feature where users can fork models.
I'm not entirely sure of the intended meaning, but it contradicts being able to bring any container.
However, if it pertains to the selection of pre-made template containers, it is true that the options may be limited. Nonetheless, there aren't a significant number of commonly-used open-source models available either...
Or it is referring to the API endpoints. That is indeed correct, but it's confusing what exactly this review is about.
- Post-deployment, it can be confusing to understand how the platform works, which may result in users receiving a bill if they are not careful.
Although the documentation is sometimes lacking, it is very clear what things cost, and there's no such thing as a surprise bill...
- There is no bot or instant support mechanism available.
They have a live chat available, and their response time was good. They are also very active on their Discord channel, providing speedy support to users.
Because the server is entirely abstracted away: these are computing applications that do not require direct access to the underlying hardware and only have to produce a computation result.
Conceptually, «serverless» is a reincarnation, or an evolution, of VAX/VMS-style cluster computing: applications running on VMS clusters see the cluster as a single computer. Adding a new node makes the cluster more powerful, yet the app continues to see the cluster as a single computer. Cluster nodes can be geographically distributed and can appear and disappear without the app noticing. So serverless apps, whilst technically running on a server, actually run somewhere (on a cluster node), but the app does not know it and does not care about it.
So «serverless» in the cloud takes the concept further, slaps on the automation and makes it more accessible to mere mortals, it scales (almost) indefinitely and completely transparently from the application. It requires no setup and is easy to use (setting up VMS clusters requires an experienced and knowledgeable systems engineer, for instance).
We can agree and disagree on the semantic correctness and richness (or the lack thereof) of the term «serverless», but it has already caught on and has evolved to mean more than just cloud serverless functions.
I literally don’t know what it means. Does serverless just mean doing it locally? Like an on premises cluster of GPUs in the IT closet?
If you are doing it on someone else's computer, and that computer is part of a big server farm, I find it odd to use the term “serverless.” And by odd I mean literally the exact opposite of serverless.
"Serverless" is a cloud-native software development paradigm. Of course there is a server - the machine, and there is a server running on the machine (that you don't control). The point is in the code you write - it allows you (the developer) to forget about servers as in software as well as servers as in machines.
A serverless program (often called a function) is invoked by a server on-demand instead of running all the time, and it doesn't listen for anything on any port. The client request is passed to it by the server (e.g. on standard input) and it's supposed to pass the response back to the server (e.g. through standard output) - and the server sends it to the client. Then the program halts.
Compare that to Django, Node.js or ASP.NET: A classic backend app exposes a HTTP server on a port and handles client connections itself - and thus it has to run all the time, it literally is a server (in the software sense).
If you know PHP, that's the original serverless. As opposed to the Python/ASP.NET/Node.js backends, your page.php is invoked by the Apache daemon only when someone opens that page, and there's no "server.listen(3000)" in page.php.
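In Python terms the difference looks roughly like this (framework choice is just for illustration). The classic backend is itself a server:

    from flask import Flask

    app = Flask(__name__)

    @app.route("/hello")
    def hello():
        return {"message": "hi"}

    app.run(port=3000)   # binds a port and runs until killed

whereas the serverless version is just a function the platform's server calls per request (AWS Lambda-style signature, as one concrete example):

    def handler(event, context):
        return {"statusCode": 200, "body": '{"message": "hi"}'}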
Serverless is cool because it allows the cloud provider to fully utilize a machine while the developer pays only for their portion of actual usage. You don't need to reserve a specific amount of compute resources and worry about up/down-scaling or about paying for unused hardware.
Serverless GPUs are about bringing that concept to the GPU as a service space - ideally you'd have a function that uses the GPU. That function could be invoked by sending a request to the platform's server, at which point the server would load and execute it, pass the client request to it and pass the response back to the client once the function is done.
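Concretely, the usual shape is something like this: a generic sketch, not any particular provider's SDK, with the expensive model load at module scope (the cold boot) so warm invocations only pay for the forward pass. The model name and event fields are just for illustration:

    import torch
    from transformers import pipeline

    # Loaded once per instance -- this is the cold boot the thread is about.
    device = 0 if torch.cuda.is_available() else -1
    captioner = pipeline("image-to-text",
                         model="Salesforce/blip-image-captioning-base",
                         device=device)

    def handler(event, context):
        # Warm instances skip straight to the forward pass.
        result = captioner(event["image_url"])
        return {"caption": result[0]["generated_text"]}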
Having a server is like having a car. It's yours, you do what you want with it, but:
1) If you don't drive much then you might be overpaying
2) You have to worry about the car infrastructure; you have to maintain it, and if it breaks down you might be offline (unable to travel) until it's repaired.
3) If you need to bring home a washer and dryer, for example, you have to worry about your car having the needed capacity (scaling)
In today's world you can decide not to own a car and just use Uber or whatever vehicle-sharing service is available to you. You pay only for the rides you need, you don't have to worry about infrastructure (repairs), and you can always pay for a larger vehicle on demand.
That's serverless computing. Instead of paying for a physical computer or a virtual computer (compute service), you deploy the app you want to run and pay for each run of your app. Typically the app is some type of request/response or event-processing app. The cloud company maintains the compute fleet, and when a request comes in they will deploy your app to one of their compute nodes long enough to service the request.
It's called serverless not because there are no servers but because you don't know or care about the server it runs on. The hardware could change out between executions. You don't care about failed hardware, OS patching, etc. The lowest level you typically deal with is selecting the app language (Node, Java, etc.).
There is zero actionable information there, it’s just describing the article. I was hoping to get a recommendation for which provider to use for which kinds of workloads. I guess I have to read it all then :/
Surprised GKE Autopilot isn’t mentioned, as it now has some level of GPU virtualization. I’ve found it to be the most flexible, fully featured way of approaching this problem.
> Pardon my ignorance on this matter but if the processing unit isn’t displaying graphics shouldn’t it just be referred to as a cpu?
No, I mean, it’s an auxiliary processor (there is a CPU, and this isn’t it) doing floating-point math, so I guess you could call it either a math coprocessor or an FPU, but... well, that’s somewhat confusing for historical reasons.
The term “serverless GPU” somehow wrecks my brain. Logically the absence of a server suggests its opposite, and the opposite of a server is a client, and client GPUs are the default. But this means “server GPU that’s available on-demand for very short-lived jobs” I guess.
"Serverless" has been a standard term in industry for at least 7 years. Sometimes words don't map perfectly onto the subcomponents that form them. For example a "mailbox" isn't always a box, and doesn't always contain mail. You just have to learn to use the words and not worry so much about the etymology.
The tortured etymology becomes apparent again when these words are combined in new ways. “Serverless GPU” might be something like “mailbox SSD” in your example. What would that mean? It’s not obvious at first sight. The metaphor loses its power when it’s attached to a physical descriptor which is not a metaphor.
No, it's obvious to everyone working with GPUs who knows what "serverless" means.
It's a very standard construction in English. "Serverless GPU" means "GPU" that is "serverless". If you know what both words mean in the jargon, you know what they mean together. It's ok to not know what they mean, but arguing that it's "tortured" rings to me as misguided obtuseness.
It's a new buzzword coined by the marketing team at Amazon in 2014. Somewhat confusing here as you are renting time on a GPU server described as "serverless".
I don't understand how people haven't gotten over this yet. When someone says serverless, I immediately understand that to mean "we've obfuscated the underlying server hardware from the consumer of this product." It means "you don't think about servers," not "there are no servers."
Because if I'm thinking about the GPU, I'm fundamentally thinking about the hardware, the server.
Serverless responding to http requests. Sure. I write some code. It gets fed data and returns data. I don't have to know how many cores the server has, or what microcode version the CPU is, or how many other things are running on the server, or if I'm writing an interpreted language (probably the case) even what architecture the CPU is.
But... I need to know all of that if I'm writing GPU code today.
Are you really thinking about the hardware when thinking about the GPU? For instance, if you use pytorch to write a NN, don't you kind of expect it to execute on the GPU without needing to get into the gritty details of it?
The specific version of Nvidia CUDA or AMD ROCm that the GPU supports is often very important; some software needs to be compiled from a specific branch or with specific settings to support a given version. Case in point: the official PyTorch website offers four distinct builds for various platforms, two of those being different versions of CUDA.
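e.g. the first sanity check you end up running on an unfamiliar GPU host:

    import torch

    print("torch built against CUDA:", torch.version.cuda)  # None on CPU-only builds
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))
        print("compute capability:", torch.cuda.get_device_capability(0))

If the wheel's CUDA build and the host's driver don't line up, that's where it shows up first.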
I think it'll still be useful for plenty of people to choose which GPU runs their code, even if it's compatible with any GPU offered by the service. You might want to choose an older, cheaper GPU for basic parallel computation, but as supply catches up with demand for a newer and more energy-efficient model you'll then want to switch to that. There are only so many GeForce 4090s to go round :)
Recently, folks have begun to use the word "serverless" to mean something subtly different from its intended meaning in this document. Here are two possible definitions of "serverless":
Classic Serverless: The database engine runs within the same process, thread, and address space as the application. There is no message passing or network activity.
Neo-Serverless: The database engine runs in a separate namespace from the application, probably on a separate machine, but the database is provided as a turn-key service by the hosting provider, requires no management or administration by the application owners, and is so easy to use that the developers can think of the database as being serverless even if it really does use a server under the covers.
Yeah, it should be 'elastic' and 'very reduced runtime, possibly just inferencing'. So: exposing Triton to an API gateway and putting a custom load balancer and task-queue facade in front? Curious too.
Also, GPUs do other things than inferencing, right?