
> it was difficult to find images where the entire llama fit within the frame

I had the same trouble. In my experiment I wanted to generate a Porco Rosso style seaplane illustration. Sadly, none of the generated pictures had the whole airplane in them; the wingtips or the tail always got cut off.

I found this method to be a reliable workaround: I downloaded the image I liked the most, used image editing software to extend it in the direction I wanted, and filled the new area with a solid colour. Then I cropped a 1024x1024 rectangle positioned so that it contained about 40% generated image and 60% solid colour, uploaded the new image, and asked DALL-E to infill the solid area while leaving the previously generated area unchanged. From the generated extensions I selected the one I liked best, downloaded it, and merged it with the rest of the picture. I repeated the process as required.
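
If you want to script the mechanical part, here is a rough sketch of the extend-and-crop step using Pillow. The 1024x1024 size and the roughly 40/60 split are taken from the description above; the filenames, the fill colour, and the extension direction are placeholders:

  # Rough sketch, assuming Pillow (pip install pillow). Extends the image
  # to the right; other directions work the same way.
  from PIL import Image

  TILE = 1024                  # DALL-E's working resolution
  OVERLAP = int(TILE * 0.4)    # ~40% of the crop is existing image

  src = Image.open("generated.png")   # placeholder filename; assume height >= 1024
  w, h = src.size

  # Wider canvas, with the new area filled by a solid colour.
  canvas = Image.new("RGB", (w + TILE - OVERLAP, h), color=(127, 127, 127))
  canvas.paste(src, (0, 0))

  # 1024x1024 crop: ~40% existing image on the left, ~60% solid colour.
  tile = canvas.crop((w - OVERLAP, 0, w - OVERLAP + TILE, TILE))
  tile.save("tile_to_infill.png")     # upload this and ask DALL-E to infill

Once DALL-E fills the solid region, the returned tile can be pasted back at the same offset.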

You need a generous amount of overlap so the network can figure out which parts are already there and how best to fit the rest. It's also a good idea to look at the image segment you want infilled: if you as a human can't tell what you are seeing, the machine won't be able to figure it out either. It will generate something, but it will look out of context once merged.

The other trick I found: I wanted to make the picture into a canvas print, so I needed a higher resolution image, higher even than what I could reasonably hope for with the extension trick above. So I upscaled the image (I used bigjpg.com, but there may be better solutions out there). That gave me a big image, but of course without many small-scale details. So I sliced it up into 1024x1024 rectangles, uploaded the rectangles to DALL-E, and asked it to keep the borders intact but redraw their interiors. This second trick worked particularly well on an area of the picture showing a city under the airplane: it added nice small details like windows, doors, and textured roofs without disturbing the overall composition.
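
The slicing step is easy to automate as well; a minimal sketch along the same lines (again Pillow, placeholder filenames):

  # Rough sketch: slice the upscaled image into 1024x1024 tiles so each can
  # be uploaded and have its interior redrawn. Edge remainders are skipped
  # for simplicity.
  from PIL import Image

  TILE = 1024
  img = Image.open("upscaled.png")    # output of the upscaler
  w, h = img.size

  for y in range(0, h - TILE + 1, TILE):
      for x in range(0, w - TILE + 1, TILE):
          img.crop((x, y, x + TILE, y + TILE)).save(f"tile_{x}_{y}.png")

  # Reassembly is the reverse: img.paste(redrawn_tile, (x, y)) per tile.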

I had similar problems trying to get the whole of a police car overgrown with weeds.

https://imgur.com/a/U5Hl2gO

I was testing to see how close I could get to replicating a t-shirt graphic concept I saw.

I had been using ~"A telephoto shot of A neglected police car from the 1980s Viewed from a 3/4 angle sits in the distance. The entire vehicle is visible but it is overgrown with grass and flowery vines"

This process sounds great, though it seems like DALL-E needs to offer tools to do this automagically.


These models are trained on pairs of images and caption text, so they work better with text inputs that resemble descriptions of paintings than with terse descriptions or with William-Gibsonian hyperspecified prose, though it's tempting to try the latter two.

https://imgur.com/a/YB5StlE



That’s right!


What prompts did you use for the infill and detail generation?


Good question! All of them had the same postfix: ", studio ghibli, Hayao Miyazaki, in the style of Porco Rosso, steampunk". I used this for all the generations in the hopes of anchoring the style.

With the prefix of the prompt I described the image. I started the extension operations with "red seaplane over fantasy mediterranean city", but then I quickly realised that this was making the network generate floating cities in the sky for me. :D So then I varied the prompt: "red seaplane on blue sky" in the upper regions and "fantasy mediterranean city" in the lower ones.

I went even more specific and used the prefix "mediterranean sea port, stone bridge with arches" for a particular detail where I wanted to retain the bridge (which I liked) but improve the arches (which looked quite dingy).
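
So in effect every prompt was a region-specific prefix plus the fixed style postfix. As a toy illustration (the helper function is hypothetical; the strings are the ones quoted above):

  STYLE = ", studio ghibli, Hayao Miyazaki, in the style of Porco Rosso, steampunk"

  def prompt_for(region: str) -> str:
      # Region description up front, fixed style postfix to anchor the look.
      return region + STYLE

  prompt_for("red seaplane on blue sky")                           # upper tiles
  prompt_for("mediterranean sea port, stone bridge with arches")   # bridge detail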

(I have just counted, and it seems I used 27 generations for this one project.)


> I quickly realised that this was making the network generate floating cities in the sky for me

Maybe Dalle-2 is just secretly a studio Ghibli/Miyazaki movie fan.


MidJourney allows you to specify other aspect ratios. DALL-E's square constraint makes a lot of things more difficult than they need to be IMO.


Also possible with Stable Diffusion. It's a really cool feature to have and to play around with.


Wow, I've had the same trouble and these are some great tips! Thanks for sharing


Anytime! I have uploaded the images in question: the initial prompt with the first generated images, the extended raw image, and then the one with the added details on the city.

https://imgur.com/a/QEU7EJ2


This is a fantastic end result. Thanks for sharing your process to get there.


I think "fitting the entire X within the image" is not done on purpose. The results are more aesthetically pleasing when the subject is large, even if a part of it is missing.


Very nice result. But the plane doesn't look very seaplane-y to me. Did you also try it with a plain plane?



