
Raising the cfg ("classifier-free guidance") scale is essential for following the prompt, but if you raise it too high the image gets weird and saturated.
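For anyone who hasn't seen what the scale actually does, here is a minimal sketch of the usual guidance step, assuming the common noise-prediction formulation. The function name `cfg_combine` and the array shapes are made up for illustration; this is not taken from any particular codebase.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, scale):
    """Classifier-free guidance on two noise predictions.

    eps_uncond: prediction with an empty prompt (hypothetical input)
    eps_cond:   prediction with the real prompt (hypothetical input)
    scale:      the cfg scale; 1.0 reproduces the conditional prediction
    """
    # Start from the unconditional prediction and extrapolate along the
    # direction the prompt pushes it. Scales above 1 overshoot eps_cond.
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Toy data standing in for model outputs.
rng = np.random.default_rng(0)
eps_u = rng.normal(size=(4, 4))
eps_c = eps_u + 0.1 * rng.normal(size=(4, 4))

guided = cfg_combine(eps_u, eps_c, 7.5)
```

The prompt's influence is the difference between the two predictions, and the scale multiplies that difference, so large scales produce predictions far outside what the model ever saw in training.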

According to Google's Imagen paper, this happens because high cfg scales push the predicted pixel values outside the valid [-1, 1] range, so they start clipping; they have a technique called dynamic thresholding that replaces the plain clipping. Not sure if SD uses this, but I saw Emad hinting they were training an Imagen model…
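Roughly how I understand dynamic thresholding from the paper, as a sketch rather than a reference implementation: instead of hard-clipping the predicted image to [-1, 1], pick a high percentile `s` of the absolute pixel values, clip to [-s, s], then divide by `s`. The function names and the 99.5 percentile are my own assumptions; the paper treats the percentile as a hyperparameter.

```python
import numpy as np

def static_threshold(x0):
    """Plain clipping of the predicted image to the valid range."""
    return np.clip(x0, -1.0, 1.0)

def dynamic_threshold(x0, percentile=99.5):
    """Dynamic thresholding applied to a single predicted image.

    percentile is an assumed value for illustration.
    """
    # s is a high percentile of the absolute pixel values.
    s = np.percentile(np.abs(x0), percentile)
    # Never shrink the range below 1, so in-range images pass through unchanged.
    s = max(s, 1.0)
    # Clip outliers to [-s, s], then rescale back into [-1, 1].
    return np.clip(x0, -s, s) / s

# Toy "over-guided" prediction with values well outside [-1, 1].
x0 = np.linspace(-3.0, 3.0, 101)

clipped = static_threshold(x0)
rescaled = dynamic_threshold(x0)

# Fraction of pixels pinned at the extremes under each scheme.
saturated_static = np.mean(np.abs(clipped) == 1.0)
saturated_dynamic = np.mean(np.abs(rescaled) == 1.0)
```

On this toy input, static clipping pins most values at the extremes while the dynamic version keeps nearly all of them strictly inside the range, which matches the "less saturated" behaviour the paper describes.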



