OpenAI's enterprise plan explicitly says that they do not train their models on your data. It's in the contract, and it's also visible at the bottom of every ChatGPT prompt window.
It seems like a damned-if-you-do, damned-if-you-don't situation. How is ChatGPT going to provide relevant answers to company-specific prompts if they don't train on your data?
My personal take is that most companies don't have enough data, nor data of sufficiently high quality, to use LLMs for company-specific tasks.
The model from OpenAI doesn’t need to be directly trained on the company’s data. Instead, they provide a fine-tuning API in a “trusted” environment, which usually means Microsoft’s “Azure OpenAI” product.
But in practice, most applications use the RAG (retrieval-augmented generation) approach; actually doing fine-tuning is less common.
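For the curious: RAG means retrieving relevant documents at query time and putting them into the prompt, rather than changing the model's weights. A minimal sketch, assuming the official openai Python client; the model names and the toy in-memory "index" are just placeholders:

    # Embed company docs once, then for each question retrieve the closest
    # doc and stuff it into the prompt. No training or fine-tuning involved.
    from openai import OpenAI
    import numpy as np

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    docs = [
        "Plan A: 100 Mbit/s, 29 EUR/month, residential only.",
        "Plan B: 500 Mbit/s, 49 EUR/month, includes a static IP.",
    ]

    def embed(texts):
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([d.embedding for d in resp.data])

    doc_vecs = embed(docs)

    def answer(question):
        q = embed([question])[0]
        # cosine similarity against every doc, keep the best match
        sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
        context = docs[int(np.argmax(sims))]
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": f"Answer using this company data:\n{context}"},
                {"role": "user", "content": question},
            ],
        )
        return resp.choices[0].message.content

    print(answer("Which subscription includes a static IP?"))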
> The model from OpenAI doesn’t need to be directly trained on the company’s data
Wouldn't that depend on what you expect it to do? If you just want, say, Copilot, text summarization, or help writing emails, then you're probably fine. If you want ChatGPT to help solve customer issues or debug problems specific to your company, wouldn't you need to feed it your own data? I'm thinking of prompts like "Help me find the correct subscription for a customer with these parameters"; then you'd need ChatGPT to know your pricing structure.
One idea I've had, from an experience with an ISP, would be to have the LLM tell customer service: "Hey, this issue is similar to what five of your colleagues just dealt with, in the same area, within the last 30 minutes. You should consider escalating this to a technician." That would require more or less live feedback to the model, or am I misunderstanding how current AIs would handle that information?
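Or could plain retrieval over the ticket database cover that, without any retraining? Roughly what I have in mind; the table and column names are made up:

    # Query your own ticket store for recent incidents in the same area and
    # feed the result into the LLM's context. The model itself stays frozen.
    import sqlite3
    from datetime import datetime, timedelta

    def recent_similar_incidents(db, area, window_minutes=30):
        cutoff = (datetime.utcnow() - timedelta(minutes=window_minutes)).isoformat()
        return db.execute(
            "SELECT id, summary FROM tickets WHERE area = ? AND opened_at >= ?",
            (area, cutoff),
        ).fetchall()

    def build_context(db, area):
        incidents = recent_similar_incidents(db, area)
        if len(incidents) >= 5:
            return (f"{len(incidents)} similar tickets were opened in {area} in "
                    f"the last 30 minutes. Consider escalating to a technician.")
        return "No unusual ticket volume in this area."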
Europe is also a lot smaller network-wise.
Hetzner only has to get its traffic to Frankfurt to be connected to practically the whole of Europe. For the US, Ashburn, N. Virginia is good, but it's still only a single coast.
For text-based logs, I'm almost entirely sure that compression alone is more than enough. ZFS supports compression natively at the block level, and it's almost always turned on. Using dedup alongside compression for syslog will most likely not yield any benefits.
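You can convince yourself with plain zlib: syslog-style lines are extremely repetitive, so even a generic compressor gets a large reduction. The sample lines below are made up:

    import zlib

    log = "".join(
        f"Jan 12 10:{i % 60:02d}:01 host sshd[4242]: Accepted publickey for user{i % 5}\n"
        for i in range(10_000)
    ).encode()

    compressed = zlib.compress(log, level=6)
    print(f"raw: {len(log)} bytes, compressed: {len(compressed)} bytes "
          f"({len(log) / len(compressed):.1f}x)")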
It does work, because companies will realize that Gmail no longer delivers their emails and that they need to change their behavior. Also, for example, AWS SES (Simple Email Service) will give you clear warnings if it detects that recipients mark their email as spam (it seems that Gmail somehow delivers this information to SES).
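For illustration, SES can publish those complaint events to an SNS topic, and a small handler can act on them. A sketch assuming the documented SES notification format; the suppress() helper is a hypothetical stand-in:

    import json

    def handle_sns_event(event):
        for record in event["Records"]:
            msg = json.loads(record["Sns"]["Message"])
            if msg.get("notificationType") == "Complaint":
                for r in msg["complaint"]["complainedRecipients"]:
                    suppress(r["emailAddress"])  # hypothetical: stop mailing them

    def suppress(address):
        print(f"suppressing {address} after a spam complaint")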
I'd say the opposite: we need Kubernetes distributions, just like Linux needs distributions. Nobody wants to build their kernel from scratch and hand-pick various user-space programs.
The same goes for Kubernetes: distributions that pack everything you need in an opinionated way, so that it's easy to use. Right now it's kind of build-your-own-Kubernetes on every platform: kubeadm, EKS, etc. all require you to install various add-on components before you have a fully usable cluster.
I think the Operator pattern will grow into exactly what you describe. We're still at an early stage of that, but I can see a group of operators becoming a "distribution", as in your example.
ECS+Fargate does give you zero maintenance, both in theory and in practice. As someone who runs k8s at home and manages two clusters at work, I still recommend that our teams use ECS+Fargate+ALB when it satisfies their requirements for stateless apps, and they all love it because it is literally zero maintenance, unlike what you just described k8s requiring.
Sure, k8s has a lot of great features that ECS cannot match, but when ECS does satisfy the requirements, it will require less maintenance, no matter what kind of k8s you compare it against.
Kubernetes needs regular updates, just like everything else (unless you carefully freeze your environment and somehow manage the vulnerability risks), and that requires manual work.
ECS+Fargate, however, does not. If you are a small team managing the entire stack, you need to factor this in. For example, EKS forces you to upgrade the cluster to keep up with the main Kubernetes release cycle, although you can delay it somewhat.
I personally run one k8s cluster at home and another two at work, and I recommend our teams use ECS+Fargate+ALB if it is enough for them.
> Kubernetes needs regular updates, just like everything else (unless you carefully freeze your environment and somehow manage the vulnerability risks), and that requires manual work
Just use a managed K8s solution that deals with this? AKS, EKS and GKE all do this for you.
There are still Helm oddities, annotations, CRDs, mutating webhooks, operators, etc. to comprehend before you have any idea of what the system is doing. All it takes is one random annotation to throw all your assumptions away.
It's a complicated mess compared to something like a Nomad jobspec. That's one of the reasons we decided on Nomad while I was at Cloudflare.
I agree with @metaltyphoon on this. Even for small teams, a managed version of Kubernetes takes away most of the pain. I've used both ECS+Fargate and Kubernetes, but these days I prefer Kubernetes, mainly because the ecosystem is way bigger, both vendor and open source. Most of the problems we run into are one search or open-source project away.
Are you assuming the workloads have to use K8s APIs? Where is this coming from? If that's not the case, can you explain with a concrete example?
Man, you don't need to use a service mesh just because you use k8s. Istio is a very advanced component that 99% of users don't need.
So if you are going to compare with a managed solution, compare against something equivalent. Take a bare managed cluster and add a single Deployment to it; it will be no more complex than ECS, while giving you much better developer ergonomics.
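For comparison, here's roughly what that single Deployment amounts to, using the official kubernetes Python client; the names and image are arbitrary:

    from kubernetes import client, config

    config.load_kube_config()  # assumes a kubeconfig for the managed cluster
    apps = client.AppsV1Api()

    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="web"),
        spec=client.V1DeploymentSpec(
            replicas=2,
            selector=client.V1LabelSelector(match_labels={"app": "web"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "web"}),
                spec=client.V1PodSpec(containers=[
                    client.V1Container(name="web", image="nginx:1.25",
                                       ports=[client.V1ContainerPort(container_port=80)])
                ]),
            ),
        ),
    )
    apps.create_namespaced_deployment(namespace="default", body=deployment)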