
if you are just using it for inference, i think an appropriate comparison would just be like a together.ai endpoint or something - which allows you to scale up pretty immediately and likely is more economical as well.


Perhaps, but self-hosting is non-negotiable for me. It's much more flexible, gives me control of my data and privacy, and allows me to experiment and learn about how these systems work. Plus, like others mentioned, I can always use the GPUs for other purposes.


to each their own. if you are having really highly sensitive conversations with your GAI, such that someone would bother snooping on your docker container, figuring out how you are doing inference, and then capturing it in real time, you have a different risk tolerance than me.

i do think that cloud GPUs can cover most of this experimentation/learning need.


together.ai is really good, but there is a price mismatch for small models (a 1B model is not 10x cheaper than a 10B model).

This is presumably because they are forced to use high-memory cards.

Are there ideal cards for small (1-2B) models? i.e. higher FLOPS/$ at the expense of memory capacity.
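The pricing intuition above can be sketched with a back-of-envelope calculation: at batch size 1, decode is usually memory-bandwidth-bound, so token throughput scales inversely with model size, while the provider's cost for the card is fixed. All numbers below (the 1 TB/s bandwidth, fp16 weights) are hypothetical, not vendor specs:

```python
def decode_tokens_per_sec(params_billion: float, bandwidth_gbs: float,
                          bytes_per_param: float = 2.0) -> float:
    """Rough upper bound on single-stream decode speed: every generated
    token must stream all model weights through memory once, so
    tokens/s ~= bandwidth / weight bytes. Assumes fp16 (2 bytes/param)."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_gbs * 1e9 / weight_bytes

# same assumed 1 TB/s card serving a 1B vs a 10B model:
small = decode_tokens_per_sec(1, 1000)   # 1B model
large = decode_tokens_per_sec(10, 1000)  # 10B model

# the 1B model is ~10x faster per stream, but the card costs the
# provider the same either way, so per-token prices for small models
# don't simply fall 10x.
print(small, large, small / large)
```

Under these assumptions a card with less memory but the same bandwidth-per-dollar would serve small models just as fast for less money, which is roughly what the question above is asking for.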



