Matryoshka Diffusion Models (arxiv.org)
71 points by drcwpl 7 months ago | 43 comments



Can someone explain to a layman what a multi-resolution diffusion model is used for, or why it's better?

I don’t follow ML so while I understand the words I don’t have enough context to know why this is good.


This paper is about training diffusion models on progressively higher resolutions, spending comparatively more iterations on the smaller resolutions (which use much less time and memory per iteration). Their method is flexible enough for any multidimensional problem amenable to similar scaling (they use the example of generating video, which adds the extra dimension of time).

The models converge much more quickly during training by using this multi-resolution paradigm. Figure 4 in their paper shows how quickly these models converge to good accuracy on a standard training set at 256x256 resolution, comparing them in particular to a Latent Diffusion implementation.

They showed that their baseline models were slightly better than Latent Diffusion in terms of training efficiency. However, when they pretrained their models at 64x64, they converged much, much faster: the Apple models reached, at around 60k iterations, a better score than Latent Diffusion arrived at in around 300k iterations.

The caveat is that, for my example above, they pretrained the model for 390k iterations at 64x64 before setting it to train at 256x256, which is indeed more iterations than Latent Diffusion got the opportunity to train for. However, each of those 390k iterations used roughly 16 times fewer resources (a 64x64 image has 16x fewer pixels than a 256x256 one).
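
To make that schedule concrete, here's a rough sketch of what progressive-resolution training could look like (PyTorch-flavored pseudocode, not the paper's actual code; model, dataloader, optimizer, and diffusion_loss are stand-ins for a resolution-agnostic denoiser, an image loader, its optimizer, and the usual noise-prediction objective):

    import torch.nn.functional as F

    # (resolution, iterations) -- roughly the numbers discussed above
    schedule = [(64, 390_000), (256, 60_000)]

    data_iter = iter(dataloader)  # assume it yields full-resolution batches indefinitely
    for resolution, num_iters in schedule:
        for _ in range(num_iters):
            batch = next(data_iter)
            # each stage trains on copies downsampled to its resolution,
            # so the 64x64 stage is ~16x cheaper per iteration than 256x256
            batch = F.interpolate(batch, size=(resolution, resolution),
                                  mode="bilinear", align_corners=False)
            loss = diffusion_loss(model, batch)  # standard denoising loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()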

This would be a very good paper for our friends creating Stable Diffusion to copy for the next versions; it would likely save them a lot of GPU time, and I bet it would improve robustness across different output resolutions.


You can generate low-res images faster / first and selectively spend more time generating hi-res ones.


Thanks. I was trying to think of something more general, not image generation, which might be why I was having trouble.


I certainly echo the sentiment that Apple is doing well enough with on-device AI/ML features.

* Photo enhancement and object detection

* Text identification

* FaceID

But I also echo the sentiment that they could and should be doing more.

And I only need one example to highlight their massive shortcomings...

Siri.


It seems even more surprising when you realize that the then almost-magical Siri was released 12-13 years ago! What a wasted head start...


But it’s on device. Not sending your voice to the cloud.


Because they don't gather as much data as the other tech giants


Like OpenAI?


Where does Siri fall short for you? It's not perfect, but it's never felt any worse than my Google Home.


“Siri, what’s 9am Pacific Time?”

> gives me the current date.

“Siri, what’s 9am Pacific Time in my local time zone?”

> gives me the current time.

I could go on…


Someone else commented a while back that he told Siri to “turn on the living room light” and it did so at his mom’s house instead.

Or the classic where you ask something simple and it looks the query up on the internet verbatim instead. Or completely bungles a clearly enunciated query.

“Hey Siri, what’s 3 * 9?” “Looking up ‘treehouse sign’ on the internet”


Try asking about cities rather than time zones.


I don't want to try to understand the AI, when the promise was that the AI will be able to understand me.


The Promise: "Cars will get you from point A to B at 100kmph"

"I got in a car and said go. It didn't go."

"Did you try putting it into gear?"

"That wasn't the promise! I already understand my horse and it goes fast enough when I say go. Why should I now have to understand a car?"

Tools are tools! AI is no different. Like any tool, it will have limits, it will get better, it will keep having other limits.

Learn the new tool if you want to, don't learn the new tool if you don't want to. But it feels disingenuous, and your valid criticisms get lost, when you claim "The tool doesn't do what I want! and stop trying to teach me how to use it!".


I'd agree with you if the whole purpose of AI wasn't that you don't need to learn anything new, instead you can talk to it like you talk to people.

If a person would understand what I mean but an AI doesn't, the AI is broken.


I’ll speak for myself, but I most certainly need to learn how to understand and be understood by other people. So that is what I expect from AI.


I'll make the point even clearer: The promise of AI assistants is that, given that you already know how to make yourself understood by people, a skill the vast majority of the population already possesses, the AI assistant does not require any additional learning.


Text to image and text to video


Apple can’t do AI/machine learning to save their lives. This is a friendly reminder that it took Apple until the year 2023 to have a functioning autocorrect on the iPhone keyboard.


A lot of Apple's ML features are fantastic.

- I can search my photos on my phone, despite not having uploaded them to the cloud (image recognition runs locally)

- FaceID

- Excellent OCR in pretty much everything

They've been cautious about shipping generative AI features, but they're absolutely leading the field in terms of building great features using edge ML running on devices.


Probably one of the biggest quality-of-life improvements for me in iOS 17 has been that you can now designate a list as a 'Shopping List' and it will automatically sort items into categories - so apples, lettuce, etc. go under a Fruit and Veg heading, likewise drinks, dairy, cleaning products.

It's really very good and probably saves me 5 minutes on a big shop. Very understated, non-flashy ML.


How do you make this designation?


1. Open Reminders.app

2. Add List

3. Set "List Type" to Grocery


This was the trick for me. I’ve had a “Shopping” list for years, but it wasn’t doing the magic.

I googled it the other day and setting the type was the key.

Awesome.


List type = 'Shopping' if you are set to UK English :)


They do, they just shy away from calling it "AI." Their camera is one example.

https://thinkml.ai/iphone-ai-artificial-intelligence-feature...


The image text detection in Preview/Files has been working pretty good for me.


It's a super nice feature to have system-wide on iOS and macOS. Its presence in Safari even lets the built-in translate feature translate text in images, which is particularly useful for Japanese websites, where it's common for half the site's text to be in images.


Apple just doesn’t care much about server side AI.

But Apple and Google are leading the cutting edge for on-device AI.


What?! Is it safe to enable all autocorrect features on the iPhone?

I had to turn off half of them to stop it writing random stuff for me


In my experience SwiftKey has much better autocorrection, but YMMV.


SwiftKey is the best I've used, but even it has some annoying bugs that mean it learns wrong words and will routinely prefer them over the correct ones. For example, I can't for the life of me get it to prefer "my" over "NY".


The recent major iOS update improved the situation significantly, imho.


Hehe.


You chuckle, but knowing when it's safe to turn autocorrect back on will be the most important news that this hacker gets from this website this year.


I always found the criticisms of Apple's autocorrect to be severely overblown, considering pieces of it were necessary for a virtual keyboard the size of the original iPhone's to even be viable in the first place.

Yeah, it did some annoying things like correct fuck to duck... but to call it non-functioning is just flat-out wrong. More often than not, it did what I needed it to do.

Also "friendly reminder" maybe we should ask ourselves why a certain Ad company has so much data to improve their AI at faster paces, and it isn't because they have better engineers.


I couldn't disagree more. The number of times it corrects not the word I just typed, but words prior to it... head explodes... I've reached the point where I have to proofread texts after they've been written in their entirety because Apple seems to play swap-a-roo at will. I just don't get it. If I hit the spacebar and moved on to the next word, then that's the sole word I want autocorrect to be worrying about.


This is the NEW iOS. Before updating to iOS 17, this did not happen to me. With the new 'transformer' autocorrect, it does. Supposedly it is context aware and that is why it is correcting past words as you keep typing.


It was already swap-a-roo’ing in iOS 16, maybe even 15.

It also somehow misses simple corrections. Try it right now: it doesn’t know that “ennencuiate” should correct to “enunciate”. Or even “dejline” to “decline”.

What’s worst about the swap-a-roo’ing is that it will replace a common, logical word in a sentence with something seldom-used.

Like, it’d correct “overwhelmingly just not how it works” to “ontologically just not how it works”.


I guess Apple needs a bit more than attention to have a good autocorrect model.


I’ve been using iOS since 2007 (first iPod touch), in three different languages, and have never had anything similar. Do you have an example of words it autocorrects wrongly for you?


I know it's well intentioned, but in 90% of cases it's just wrong and destructive. I'll correct the word I just typed because I anticipate it being potentially wrong. But somebody changing something from 3 words ago? What the hell?

I can't even turn it off because I use the swipe input.

It's so funny that it'll use new, wrongly guessed information to change previous context rather than using that context to correctly guess the new information.

That being said, I think they toned it down in one of the recent updates. I remember being infuriated much less by it than a year ago.



