
OpenAI's enterprise plan explicitly says that they do not train their models on your data. It's in the contract, and it's also visible at the bottom of every ChatGPT prompt window.

It seems like a damned-if-you-do, damned-if-you-don't situation. How is ChatGPT going to provide relevant answers to company-specific prompts if they don't train on your data?

My personal take is that most companies don't have enough data, or data of sufficiently high quality, to use LLMs for company-specific tasks.


The model from OpenAI doesn’t need to be directly trained on the company’s data. Instead, they provide a fine-tuning API in a “trusted” environment, which usually means Microsoft’s “Azure OpenAI” product.

But really, in practice, most applications are using the “RAG” (retrieval-augmented generation) approach, and actually doing fine-tuning is less common.
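
To make the RAG idea concrete, here's a minimal sketch of the retrieve-then-generate loop. The model names, the toy "pricing documents", and the brute-force cosine search are placeholders of mine, not anything OpenAI prescribes; a real deployment would use a vector database and proper chunking.

    # Minimal retrieve-then-generate sketch; model names and documents are placeholders.
    import numpy as np
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Stand-ins for internal company documents (e.g. a pricing structure).
    docs = [
        "Plan A: 100 Mbit/s, 29 EUR/month, no static IP.",
        "Plan B: 1 Gbit/s, 49 EUR/month, includes a static IP.",
    ]

    def embed(texts):
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([d.embedding for d in resp.data])

    doc_vecs = embed(docs)
    question = "Which subscription fits a customer who needs a static IP?"
    q_vec = embed([question])[0]

    # Brute-force cosine similarity to pick the most relevant document.
    scores = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
    context = docs[int(np.argmax(scores))]

    # The retrieved text goes into the prompt; no training or fine-tuning involved.
    answer = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    print(answer.choices[0].message.content)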


> The model from OpenAI doesn’t need to be directly trained on the company’s data

Wouldn't that depend on what you expect it to do? If you just want, say, Copilot, text summarization, or help writing emails, then you're probably good. If you want to use ChatGPT to help solve customer issues or debug problems specific to your company, wouldn't you need to feed it your own data? I'm thinking of something like "help me find the correct subscription for a customer with these parameters"; for that, ChatGPT would need to know your pricing structure.

One idea I've had, from an experience with an ISP, would be to have the LLM tell customer service: Hey, this is an issue similar to what five of your colleagues just dealt with, in the same area, within 30 minutes. You should consider escalating this to a technician. That would require more or less live feedback to the model, or am I misunderstanding how the current AIs would handle that information?


> Instead, they provide a fine-tuning API

Most enterprise use cases also have strong authz requirements.

You can't really maintain authz while fine-tuning (unless you do a separate fine-tune for each permission set). So RAG is the way to go there.
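
A rough sketch of what keeping authz in the retrieval path can look like: attach an ACL to each indexed chunk and filter by the caller's groups before ranking. The data structures and group names below are toy examples I made up, not any particular product's API.

    # Toy illustration of ACL-filtered retrieval; not a real authz system.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Chunk:
        text: str
        allowed_groups: frozenset  # ACL copied from the source document

    # Stand-in for the vector index; each chunk carries its document's ACL.
    INDEX = [
        Chunk("Q3 board deck: revenue down 4%.", frozenset({"finance", "execs"})),
        Chunk("VPN setup guide for new laptops.", frozenset({"all-staff"})),
    ]

    def retrieve(query: str, user_groups: set, k: int = 3) -> list:
        # Authz filter first, so nothing the user can't read ever reaches the prompt.
        visible = [c for c in INDEX if c.allowed_groups & user_groups]
        # Stand-in for real relevance ranking (vector search, BM25, ...);
        # the query is unused in this stub.
        return visible[:k]

    print([c.text for c in retrieve("how do I set up the VPN?", {"all-staff"})])
    # -> ['VPN setup guide for new laptops.']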


> How is ChatGPT going to provide relevant answers to company specific prompts if they don't train on your data?

Isn't this explicitly what RAG is for?


That is a MASSIVE game changer!

EU transit costs and peering agreements are much more relaxed and cheaper than in the US.

Europe is also a lot smaller network-wise. Hetzner only has to get its traffic to Frankfurt to be connected to practically the whole of Europe. For the US, Ashburn, N. Virginia is good, but that still only covers a single coast.

They are definitely paying under 2c/TB for traffic though.

Routers, optics & interconnects aren't free. $0.01/GB is very reasonable.

Wait. That's cheaper than my CDN. Maybe I should do some shopping.

For text-based logs I'm almost entirely sure that just using compression is more than enough. ZFS supports compression natively at the block level, and it's almost always turned on. Trying to use dedup alongside compression for syslog will most likely not yield any additional benefit.
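
As a quick sanity check of how compressible plain-text logs are, something like the following works. It uses Python's gzip rather than ZFS's lz4/zstd, so treat the ratio only as a rough ballpark, and the /var/log/syslog default path is just my assumption.

    # Ad-hoc check: how much does a plain-text log shrink under gzip?
    import gzip
    import sys

    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"  # assumed default
    raw = open(path, "rb").read()
    packed = gzip.compress(raw, compresslevel=6)

    print(f"original: {len(raw):,} bytes")
    print(f"gzip -6:  {len(packed):,} bytes")
    print(f"ratio:    {len(raw) / max(len(packed), 1):.1f}x")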


It does work, because companies will realize that Gmail no longer delivers their emails and that they need to change their behavior. Also, AWS SES (Simple Email Service), for example, will give you clear warnings if it detects that recipients are marking your email as spam (it seems that Gmail somehow delivers this information back to SES).



I'd say the opposite: we need Kubernetes distributions, just like Linux needs distributions. Nobody wants to build their kernel from scratch and hand-pick various user-space programs.

Same for Kubernetes: distributions which package everything you need in an opinionated way, so that it's easy to use. Right now it's kind of build-your-own-Kubernetes on every platform: kubeadm, EKS, etc. all require you to install various add-on components before you have a fully usable cluster.


I think the Operator pattern will grow to become exactly what you describe. We're still at the early stage of that, but I can see that a group of operators could become a "distribution", in your example.


There have been distributions of Kubernetes for almost as long as there has been Kubernetes.


Gree HVAC units have built-in WiFi that supports fully local remote control, and there are OSS packages for it (including a Home Assistant integration).


ECS+Fargate does give you zero maintenance, both in theory and in practice. As someone who runs k8s at home and manages two clusters at work, I still recommend that our teams use ECS+Fargate+ALB for stateless apps when it satisfies their requirements, and they all love it because it is literally zero maintenance, unlike what you just described k8s requiring.

Sure, there are a lot of great features in k8s that ECS cannot match, but when ECS does satisfy the requirements, it will require less maintenance, no matter what kind of k8s you compare it against.


Kubernetes needs regular updates, just like everything else (unless you carefully freeze your environment and somehow manage the vulnerability risk), and that requires manual work.

ECS+Fargate, however, does not. If you are a small team managing the entire stack, you need to take this into account. For example, EKS forces you to upgrade the cluster to keep up with the main Kubernetes release cycle, although you can delay it somewhat.

I personally run one k8s cluster at home and another two at work, and I recommend that our teams use ECS+Fargate+ALB if it is enough for their needs.


> Kubernetes needs regular updates, just as everything else (unless you carefully freeze your environment and somehow manage the vulnerability risks) and that requires manual work

Just use a managed K8s solution that deals with this? AKS, EKS and GKE all do this for you.


There are still Helm oddities, "annotations", CRDs, mutating webhooks, operators, etc. to comprehend before you have any idea of what the system is doing. All it takes is one random annotation to throw all your assumptions away.

It's a complicated mess compared to something like a Nomad jobspec. That's one of the reasons we decided on Nomad while I was at Cloudflare.


It doesn't do everything for you. You still need to update applications that use deprecated APIs.
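
To make that concrete, here's a back-of-the-envelope sketch of what "find the deprecated APIs before the upgrade" means. Tools like Pluto or kube-no-trouble do this properly; the deprecation table below is a small hand-picked sample of mine, not an exhaustive list.

    # Scan manifests for apiVersions that were removed in newer Kubernetes releases.
    import glob
    import sys
    import yaml  # pip install pyyaml

    # Small hand-picked sample of removals; not an exhaustive table.
    REMOVED = {
        "extensions/v1beta1": "apps/v1 (Deployment et al., removed in 1.16)",
        "networking.k8s.io/v1beta1": "networking.k8s.io/v1 (Ingress, removed in 1.22)",
        "policy/v1beta1": "policy/v1 (PodDisruptionBudget, removed in 1.25)",
    }

    pattern = sys.argv[1] if len(sys.argv) > 1 else "**/*.yaml"
    for path in glob.glob(pattern, recursive=True):
        with open(path) as f:
            for doc in yaml.safe_load_all(f):
                if isinstance(doc, dict) and doc.get("apiVersion") in REMOVED:
                    print(f"{path}: {doc['apiVersion']} -> {REMOVED[doc['apiVersion']]}")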

This sort of "just" thinking is a great way for teams to drown in ops toil.


I agree with @metaltyphoon on this. Even for small teams, a managed version of Kubernetes takes away most of the pain. I've used both ECS+Fargate and Kubernetes, but these days I prefer Kubernetes, mainly because the ecosystem is way bigger, both vendor and open source. Solutions to most of the problems we run into are just one search or open-source project away.


My experience with k8s has been very much “just”, and I’ve never really had issues or experienced any real friction with updates. shrugs


That's great. I guess I've somehow been making things harder than they need to be.


Looks like you were using k8s APIs directly in your applications, which is a more complex set-up.

In my experience, most k8s deployments are just "dumb" Docker images; they're not very "k8s native".

Your use case may be more complex, which is why you have had more difficulty keeping things up-to-date.


Are you assuming the workloads have to use K8s APIs? Where is this coming from? If that's not the case, can you actually explain with a concrete example?


Any cluster extension. Helm is a good example.

https://helm.sh/docs/topics/version_skew/

Istio: https://istio.io/latest/docs/releases/supported-releases/#su...

Literally every Kubernetes manifest that hits the API server uses a k8s API:

    apiVersion: apps/v1


Man, you don't need to use a service mesh just because you use k8s. Istio is a very advanced component that 99% of users don't need.

So if you are going to compare with a managed solution, compare with something equivalent. Take a bare managed cluster and add a single Deployment to it; it will be no more complex than ECS, while giving you much better developer ergonomics.
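
For illustration, here's roughly what that single Deployment amounts to, driven through the official Python client just to keep the example self-contained (the names and image are placeholders I picked; in practice you'd kubectl apply the equivalent YAML):

    # One Deployment on an otherwise bare cluster; names and image are made up.
    from kubernetes import client, config, utils  # pip install kubernetes

    config.load_kube_config()  # use load_incluster_config() when running in-cluster

    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "hello-web"},
        "spec": {
            "replicas": 2,
            "selector": {"matchLabels": {"app": "hello-web"}},
            "template": {
                "metadata": {"labels": {"app": "hello-web"}},
                "spec": {
                    "containers": [{
                        "name": "web",
                        "image": "nginx:1.27",
                        "ports": [{"containerPort": 80}],
                    }]
                },
            },
        },
    }

    utils.create_from_dict(client.ApiClient(), deployment, namespace="default")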


99% of users don't need Kubernetes. Just deploy to Heroku, and you'll have a much better developer experience.


My wallet says otherwise


You mean operators?

(genuine tone, not rhetorical)


Sure, an operator is likely to use a wide array of APIs.

But, to reiterate, everything uses APIs. The beta APIs (the vXbetaY versions) are of course likely to be deprecated and replaced with stable APIs after a few versions.


A major factor was inventing a way to keep grass fresh through the year, an invention that actually earned a Nobel Prize: the AIV solution, named after its inventor, A. I. Virtanen.


Here's a link for anyone who was interested in more detail on this, as I was: https://www.kolster.fi/en/blog/aiv-fodder-the-cream-of-the-c...


Thanks, Master Ken, for doing these awesome posts and participating in Marc's YouTube channel! Both are amazing content!

