I think they've buried the lede with their image editing capabilities, which seem to be very good! OpenAI's model will change the whole image while editing, messing up details in unrelated areas. This one seems to perfectly preserve the parts of the image unrelated to your query and selectively apply the edits, which is very impressive! The only downside is the output resolution (the resulting image is 1184px wide even though the input image was much larger).
For a quick test I've uploaded a photo of my home office and used the following prompt: "Retouch this photo to fix the gray panels at the bottom that are slightly ripped, make them look brand new"
Input image (rescaled): https://i.imgur.com/t0WCKAu.jpeg
Output image: https://i.imgur.com/xb99lmC.png
I think it did a fantastic job. The output image quality is ever so slightly worse than the original but that's something they'll improve with time I'm sure.
This is gonna kill Craigslist :-). You see these pictures of a really nice car and get there and find it has a crushed left fender, rust holes in the hood and a broken headlight.
We had a realtor list a property in our area, and they had used generative AI to "re-imagine" it because the original owner, who had bought it in 1950 and died in it in 2023, had done zero maintenance or upgrades. People who showed up to see it were really super pissed. The realtor argued this was just the next step after staging, but it certainly didn't work here. They took it off the market and a bunch of people showed up to fix it up (presumably from the family, but one never knows).
Would lying really lead to more used-car sales and thus pressure others to attempt this kind of fraud? And wouldn't people get in trouble for fraud (or at least false advertising)?
When I last bought a used car I found it in a classified newspaper ad: there was no picture.
I looked at every car I considered in-person.
When I found one I liked I paid for an independent pre-purchase inspection, discovered a crack in the radiator, and negotiated the price down to cover my post-sale expense fixing it.
There is a lot of fraud on Craigslist. The fraudsters are very creative. There is this fallacy, the 'Sunk Cost' fallacy, where people accept a substandard result because they feel they have already invested in the result and don't want to 'throw away' that investment. So in a place where it can take 90 minutes to go across the Bay, if you drive clear across the Bay for what you believe to be a pristine item, you may buy anyway (at a reduced price) because you've already invested the time to get there. Whereas, had the seller posted actual pictures of the item in question, you would have said, "I'll wait for one in better condition to come along."
The "success" of Craiglist is that it exposes you item to a wider pool of buyers, which increases the chance that the one person who really wants it, will see it. And if they really want it they are motivated to go out of their way to get to it. But if even the pictures lie and you don't know what you're getting until you get there, your willingness to take the risk and drive out is reduced, which means people will have items that might have sold if you were trusted.
This happens on eBay too. Sellers list something and it isn't as described, and fraudulent sellers will say "but it is! This buyer is trying to scam me," and eBay usually sides with the seller.
My prediction (and hey, it's just a guess) is that if people start using these tools to "enhance" the images they use to sell stuff and it becomes a regular practice, then the total population of people who use Craigslist will go down, and prices overall will be reduced as that fraud gets priced in. Sellers won't get as much as they think they should and will stop selling there. If it drops below critical mass, the service suffers.
> This happens on eBay too. Sellers list something and it isn't as described, and fraudulent sellers will say "but it is! This buyer is trying to scam me," and eBay usually sides with the seller.
This is not my experience at all, and I've used eBay since 2008. eBay is so pro-buyer that I don't sell anything on eBay (and buy almost everything on eBay if the price is the same).
I sell on eBay and can confirm this. eBay will side with the buyer 95% of the time, even if we can prove it was the buyer's fault. Maybe they side with scam sellers more.
> There is this fallacy, the 'Sunk Cost' fallacy, where people accept a substandard result because they feel they have already invested in the result and don't want to 'throw away' that investment.
I was going to say the same thing. The car in the picture may not have a broken headlight while the one in reality does, but if it takes the person more than two hours just to visit that car, they may still end up buying it anyway because they have already invested too much time (and possibly money) into it.
People use transformative filters on their faces on dating apps all the time. If you show up and find someone with a completely different face, is there any chance of romance? I have no idea... the best I can guess is:
- No, but people do it anyway due to anxiety
- People can be pressured; the trick is getting them to meet the first time
- People say they care about faces, but don't actually care about faces
I am not attractive. Thankfully, once I am given the chance to have a conversation with people, they find me attractive regardless of my appearance; in fact, I become more attractive in their eyes because of the way I am. Oftentimes all it takes is a deeper conversation.
It happened to me, too. I did not find someone particularly attractive at first, but their experiences and their views on relationships, the world, and so forth somehow ended up making them look more attractive.
Kontext is probably better at this specific task, if that's what Mistral is using. Certainly faster and cheaper. But:
OpenAI just yesterday added the ability to do higher fidelity image edits with their model [1], though I'm not sure if the functionality is only in the API or if their chat UI will make use of this feature too. Same prompt and input image: [2]
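For reference, a minimal sketch of what that might look like through the Python SDK. The `input_fidelity` parameter name and its values are my assumption about how the feature in [1] is exposed, so check the announcement for the actual interface:

```python
# Hypothetical sketch of a high-fidelity edit via the OpenAI Python SDK.
# ASSUMPTION: the new feature is exposed as an `input_fidelity` parameter
# on the images.edit endpoint for gpt-image-1; verify against [1].
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.edit(
    model="gpt-image-1",
    image=open("home_office.jpg", "rb"),
    prompt=(
        "Retouch this photo to fix the gray panels at the bottom that "
        "are slightly ripped, make them look brand new"
    ),
    input_fidelity="high",  # assumed knob for preserving unrelated regions
)

# gpt-image-1 returns base64-encoded image data.
with open("home_office_fixed.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```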
I couldn't help but notice that you can still see the shadows of the rips in the fixed version. I wonder how hard it would be to get those fixed as well.
That might be autoencoder loss rather than the image generation itself. It's hard to tell without round-tripping the image through just the autoencoder, with no generation step, but it kind of has the look of that sort of loss.
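One way to test that hypothesis is to do exactly that round-trip and compare. A sketch using Stable Diffusion's public VAE as a stand-in (the model under discussion almost certainly uses a different autoencoder, so this only illustrates the diagnostic):

```python
# Round-trip an image through a VAE alone (encode -> decode, no generation)
# to see how much detail the autoencoder itself loses. Stable Diffusion's
# public VAE is used as a stand-in for whatever autoencoder the actual
# model uses, so treat this purely as an illustration of the diagnostic.
import numpy as np
import torch
from diffusers import AutoencoderKL
from PIL import Image

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

img = Image.open("home_office.jpg").convert("RGB")
w, h = img.size
img = img.crop((0, 0, w - w % 8, h - h % 8))  # dims must be multiples of 8

x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0  # scale to [-1, 1]
x = x.permute(2, 0, 1).unsqueeze(0)  # HWC -> NCHW

with torch.no_grad():
    latents = vae.encode(x).latent_dist.mode()  # deterministic encode
    recon = vae.decode(latents).sample

out = ((recon[0].permute(1, 2, 0).clamp(-1, 1) + 1) * 127.5).round().byte().numpy()
Image.fromarray(out).save("round_trip.png")
# If the rip shadows show the same softening here, the loss is in the
# autoencoder, not in the generation.
```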
That's because they're leveraging BFL models (almost assuredly Kontext) - it's mentioned in the release notes.
The input image is scaled down to the closest supported aspect ratio, at approximately 1 megapixel.
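To illustrate the arithmetic, here is a sketch of what that rescale could look like. The candidate ratio list and the snap-to-16 rounding are assumptions for illustration, not documented behavior:

```python
# Sketch of rescaling an input to ~1 megapixel at the nearest supported
# aspect ratio. The candidate ratio list and rounding rule are
# assumptions for illustration, not the model's documented behavior.
import math

SUPPORTED_RATIOS = [(1, 1), (4, 3), (3, 4), (16, 9), (9, 16), (3, 2), (2, 3)]
TARGET_PIXELS = 1024 * 1024  # ~1 megapixel

def target_size(width: int, height: int) -> tuple[int, int]:
    ratio = width / height
    # Pick the supported aspect ratio closest to the input's.
    rw, rh = min(SUPPORTED_RATIOS, key=lambda r: abs(r[0] / r[1] - ratio))
    # Scale that ratio to ~1 MP, snapping each side to a multiple of 16.
    scale = math.sqrt(TARGET_PIXELS / (rw * rh))
    snap = lambda v: max(16, round(v / 16) * 16)
    return snap(rw * scale), snap(rh * scale)

print(target_size(4032, 3024))  # 12 MP 4:3 phone photo -> (1184, 880)
```

Incidentally, a 4:3 input snapping to roughly 1184x880 would line up with the 1184px-wide output mentioned at the top of the thread.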
I ran some experiments with Kontext and added a slider so you can see the before / after of the isolated changes it makes without affecting the entire image.
Incidentally, and veering off topic, I find it extremely annoying that to open both pictures I need to click numerous times to avoid accepting unwanted cookies (even if some are "legitimate", implying others are not). A further nuisance is that multiple websites have the same cookie-vendor pop-up, suggesting there is a "cookies-as-a-service" vendor of some sort.
Can anyone point to a good explanation of how these multimodal text-and-image models are set up architecturally? Is there a shared embedding space, or is it lots of integrations?
I don't know how much of this is tool use these days, with the LLM "just" calling image generation models and doing a bunch of prompt reformulation for the text-to-image model, which is most likely a "steerable" diffusion model (really nice talks by Stefano Ermon on YouTube!).
Actually, multimodal models usually have a vision-encoder submodel that translates image patches into tokens, and then the pretrained LLM and vision model are jointly finetuned. I think reading the reports on Gemma or Kimi-VL will give a good idea here.
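A minimal sketch of that wiring, with illustrative shapes and names rather than any particular model's:

```python
# Minimal sketch of the usual multimodal wiring: a vision encoder turns
# image patches into embeddings, a projection maps them into the LLM's
# token-embedding space, and the result is concatenated with the text
# embeddings. Shapes and module names are illustrative only.
import torch
import torch.nn as nn

class PatchToTokens(nn.Module):
    def __init__(self, patch=14, d_vision=1024, d_model=4096):
        super().__init__()
        # Patchify + embed via a strided conv (as in ViT).
        self.patchify = nn.Conv2d(3, d_vision, kernel_size=patch, stride=patch)
        # In a real model, transformer blocks over the patches go here.
        self.project = nn.Linear(d_vision, d_model)  # into LLM embed space

    def forward(self, images):            # (B, 3, H, W)
        x = self.patchify(images)         # (B, d_vision, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)  # (B, num_patches, d_vision)
        return self.project(x)            # (B, num_patches, d_model)

# The image tokens are spliced into the text-embedding sequence where an
# <image> placeholder sits, and the combined sequence is fed to the LLM;
# both parts are then jointly finetuned.
enc = PatchToTokens()
img_tokens = enc(torch.randn(1, 3, 224, 224))
print(img_tokens.shape)  # torch.Size([1, 256, 4096])
```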