
As an ML vision researcher, I find these scaling hypothesis claims quite ridiculous. I understand that the NLP world has made large strides by adding more attention layers, but I'm not an NLP person and I suspect there's more to it than just more layers. We won't even talk about the human brain; let's just address the "scaling is sufficient" hypothesis.

With vision, pointing to Parti and DALL-E as evidence for scaling is quite dumb. They perform similarly but are DRASTICALLY different in size. Parti has configurations with 350M, 750M, 3B, and 20B parameters. DALL-E 2 has 3.5B. Imagen uses T5-XXL, which alone has 11B parameters, just for the text part.

Not only that, there are major architecture changes. If scaling were all you needed, then all these networks would still be using CNNs. But we shifted to transformers. THEN we shifted to diffusion-based models. Not to mention that Parti, DALL-E, and Imagen have different architectures. It isn't just about scale. Architecture matters here.

And to address concerns: diffusion (proposed years before it took off) didn't start working because we just scaled it up. It worked because of engineering. It was largely ignored previously because no one could get it to work better than GANs. I think this lesson should really stand out: we need to consider the advantages and disadvantages of different architectures and learn how to make ALL of them work effectively. That way we can combine them in ideal ways. Even LeCun is coming around to this point of view despite previously being on the scaling side.

But maybe you NLP folks disagree. The experience in vision, at least, is far richer than just scaling.




I agree - personally, I think scaling laws and the scaling hypothesis are quite distinct. The scaling hypothesis is "just go bigger with what we have and we'll get AGI", whereas scaling laws are "for these tasks and these model types, these are the empirical trends in performance we see". I think scaling laws are still really valuable for vision research, but as you say, we should not abandon thinking about things beyond scaling even if we observe good scaling trends.
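
For concreteness, here's a minimal sketch of what a scaling-law fit looks like in practice (the code and numbers are mine and purely illustrative, not from any of the papers discussed): fit a power law L(N) ≈ a * N^(-b) to loss versus parameter count for a fixed task and model family, then extrapolate the trend.

    import numpy as np

    # hypothetical (parameter count, validation loss) pairs -- made up for illustration
    n_params = np.array([1e8, 3e8, 1e9, 3e9, 1e10])
    losses   = np.array([3.10, 2.85, 2.62, 2.45, 2.31])

    # fit log L = log a - b * log N, i.e. L(N) ~= a * N^(-b)
    slope, intercept = np.polyfit(np.log(n_params), np.log(losses), 1)
    a, b = np.exp(intercept), -slope
    print(f"fit: L(N) ~= {a:.2f} * N^(-{b:.4f})")

    # the "scaling law" use case: extrapolate the empirical trend to a bigger
    # model in the same family -- a trend line, not a promise of new capabilities
    print(f"predicted loss at 1e11 params: {a * 1e11 ** (-b):.2f}")

The point is that this kind of curve describes a trend for one setup; by itself it says nothing about whether a different architecture would shift the whole curve, which is exactly the distinction from the scaling hypothesis.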


Yeah, I agree with this position. It is also what I see in my own research, where I also see the vast importance of architecture search. That may not be what the public sees, but I think it is well known to the research community, or to anyone with hands-on experience with these types of models.


this is well articulated. another key point: dall-e 2 uses roughly 70% fewer parameters than dall-e 1 while offering far higher quality.

from wikipedia (https://en.wikipedia.org/wiki/DALL-E):

DALL-E's model is a multimodal implementation of GPT-3 with 12 billion parameters which "swaps text for pixels", trained on text-image pairs from the Internet. DALL-E 2 uses 3.5 billion parameters, a smaller number than its predecessor.
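
(sanity-checking the figure above against the numbers in that quote: (12B - 3.5B) / 12B ≈ 71%, so "roughly 70% fewer parameters" checks out.)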



