Diffusion Without Tears (baincapitalventures.notion.site)
62 points by jxmorris12 29 days ago | 20 comments



I have an undergraduate degree in math and generally find math quite enjoyable; however, even I had a pretty hard time grasping most of what this blog was talking about. I think it went pretty quickly from extremely high-level (add noise to an image, then remove the noise) to extremely low-level specifics.

Also, I had a hard time figuring out which part of the equation differentiates the equations for different data points. Is that what "theta" means in all the equations? Is theta what guides the initial noise toward one type of image instead of another? Is the innovation in GenAI images using text embeddings to create the theta?

I definitely feel some tears coming on.


Hey, I’m the post author — thanks for reading.

Theta represents all the model params — all the weights in the neural network. The convention is to write theta for the “learned” score function and omit theta for the “true” score function.
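If it helps make that concrete, here's a toy sketch (my own illustration, not code from the post): theta is just the bag of weights your framework tracks for the network playing the role of the learned score function.

    # Toy score network in PyTorch. "Theta" is simply the collection of
    # weights below; s_theta(x, t) is the learned score, an approximation
    # to the true score grad_x log p_t(x), which we never observe directly.
    import torch
    import torch.nn as nn

    class ScoreNet(nn.Module):
        def __init__(self, dim=2, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim + 1, hidden), nn.SiLU(),
                nn.Linear(hidden, hidden), nn.SiLU(),
                nn.Linear(hidden, dim),
            )

        def forward(self, x, t):
            # Condition on time t by concatenating it to the input.
            return self.net(torch.cat([x, t[:, None]], dim=-1))

    model = ScoreNet()
    theta = list(model.parameters())  # <- this is all "theta" is
    print(sum(p.numel() for p in theta), "parameters in theta")

And to the question upthread: text embeddings enter as an extra conditioning input to a network like this; theta is still just the weights that training adjusts.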


Yeah, the article could definitely use some clarification in many parts (the author may be suffering from the curse of knowledge a bit). Plus there's the fact that even if you know your ODEs, SDEs are a different beast bringing in probability (certainly one may not be accustomed to seeing `p(x|y)` in the middle of a differential equation…)


As a material scientist, the roller coaster to figure out what 'diffusion' we're talking about here was surprisingly funny.

Meaning: maybe edit that title a bit :)


Yeah there's lots of "diffusions" out there and I was similarly curious which one they were going to be writing about.

I liked the post though.


I've noticed that when finance guys learn that there's a really useful AI thing called diffusion, they get all excited and start writing stochastic differential equations and drawing 1-D Brownian motion plots all over the place. It's not yet clear to me whether this helps anyone understand AI diffusion models, but they seem to enjoy it a lot.


I'm a controls guy, not a quant guy, but I've found the SDE perspective and this blog post incredibly helpful in understanding how a diffusion model works.


It makes intuitive sense, but at the same time, you can get good performance replacing the Gaussian noise with deterministic transformations.

So in the end, diffusion can’t explain that much about how diffusion models work. Disappointingly, so many modern ML things are like this.


Hey! I’m the post author. Highly recommend Sander Dieleman’s blog for alternative interpretations https://sander.ai/2023/07/20/perspectives.html

I personally find the SDEs the most intuitive, and the deterministic ODE / consistency models / rectified flow stuff as ideas that are easier to understand after the SDEs. But not everyone agrees!
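For the curious, the standard statement (this is Song et al.'s formulation, nothing unique to my post) is that if the forward corruption process is the SDE

    \mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w,

then the probability flow ODE

    \mathrm{d}x = \Big[ f(x, t) - \tfrac{1}{2}\, g(t)^2\, \nabla_x \log p_t(x) \Big]\, \mathrm{d}t

is completely deterministic yet has exactly the same marginals p_t, which is why the deterministic variants can work at all.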


Thanks for sharing this! I tend to agree that it’s easiest to understand this way.

I just find it a frustrating fact about modern machine learning that the nice SDE interpretation is somehow not the “truth”; it’s just a tool for understanding.


If by helping anyone you mean helping the math-challenged (which is technically almost everyone), then I would be inclined to agree.

But to the quant crowd I’m guessing that couching diffusion in the language of stochastic calculus is helpful.


A much simpler exploration of this topic that I've always liked is the "Linear Diffusion" [0] example, which implements a basic diffusion model using only linear components. Given its simplicity, it gets surprisingly good results at generating digits.

0. https://www.countbayesie.com/blog/2023/4/21/linear-diffusion
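For anyone who wants to poke at the idea without leaving the terminal, here's a rough sketch in that spirit (my own toy version on synthetic data, not the linked post's code): one least-squares linear noise predictor per timestep, plus standard ancestral sampling.

    # Purely linear "diffusion model" on toy data -- a sketch of the idea,
    # not the linked post's implementation.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n, steps = 16, 4096, 50
    data = 0.1 * rng.standard_normal((n, d)) @ rng.standard_normal((d, d))

    betas = np.linspace(1e-4, 0.2, steps)
    abar = np.cumprod(1.0 - betas)  # cumulative alpha-bar schedule

    # Fit one linear noise predictor per timestep in closed form.
    W = []
    for t in range(steps):
        eps = rng.standard_normal((n, d))
        xt = np.sqrt(abar[t]) * data + np.sqrt(1 - abar[t]) * eps
        W.append(np.linalg.lstsq(xt, eps, rcond=None)[0])  # eps ~ xt @ W[t]

    # Ancestral sampling: start from pure noise, linearly denoise stepwise.
    x = rng.standard_normal((1, d))
    for t in reversed(range(steps)):
        eps_hat = x @ W[t]
        x = (x - betas[t] / np.sqrt(1 - abar[t]) * eps_hat) / np.sqrt(1 - betas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.standard_normal((1, d))
    print("sample:", np.round(x[0, :4], 3))

Swap the toy data for flattened MNIST digits and you get roughly the linked experiment.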


Awesome tutorial, thanks!


Oof. I honestly can't tell if the author is joking at times. Gave up on reading it.


Without tears, right? Seeing those mountains of cryptic math, I've got red eyes.


There's a teaching method called "Handwriting Without Tears". The title is almost certainly a play on that.


The production of tears is left as an exercise to the reader. /s

Thanks for reading. The 2D simulation section might be more interesting on a first read — it makes the math less mysterious, I hope!


TL,DR: "Now time-reverse the stochastic differential equation to infer the rest of the fucking owl."

Thanks, guys, super helpful.


At least the article warns you upfront of the sort of mathematical sophistication required to get some of the explanations. The author is a financial-engineering sort, so their big thing is SDEs, and they assume (for some of the explanations) that you bring that sort of intuition with you. If the author were a signal-processing type, they might use Kalman filter analogies, and a pure statistician would cite autocorrelation.

Don’t try to catch all the mathematical Pikachus in the paper, just choose the insights that resonate with you. Thankfully, there isn’t a pop quiz lurking at the end.

In honor of the Bay Area roots of HN, “believe it if you need it, if you don't, just pass it on”. I liked the paper even when skipping the SDE material.


I left the TENET references on the cutting room floor.

I too found it really surprising that the reverse-time equation has a simple closed form. Like, surely breaking a glass is easier than unbreaking it? That’s part of what got me interested in this stuff in the first place!
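For reference, the closed form in question is Anderson's (1982) reverse-time SDE: if the forward process is \mathrm{d}x = f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}w, then, running time backwards,

    \mathrm{d}x = \Big[ f(x, t) - g(t)^2\, \nabla_x \log p_t(x) \Big]\, \mathrm{d}t + g(t)\, \mathrm{d}\bar{w},

where \bar{w} is a Brownian motion flowing backwards in time. All the glass-unbreaking work hides inside the score \nabla_x \log p_t(x), which is exactly what the network learns.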

If you haven’t seen it yet, highly recommend the blogs of Sander Dieleman & Yang Song (who co-invented the SDE interpretation).



