
>I guess the image input is just too expensive to run or it's actually not as great as they hyped it.

We already know they have a SOTA model that can turn images into latent-space vectors without being some insane resource hog; in fact, they give it away to competitors like Stability AI. [0]

My guess is that a limited set of people are using the GPT-4/CLIP hybrid, but most of those use cases involve trying to decipher pictures of text (which it would be very bad at), so they're working on that (or on other use-case problems).

[0] https://github.com/openai/CLIP