Still have to read the article. It's great to see people exploring this. From the first "language models are unsupervised multitask learners" type papers, I wish there had been more emphasis that the various behaviors these models have are essentially a side effect of learning some kind of self-supervision task. A model has been trained to e.g. predict the next word given the previous words, and we're happy to discover that it can be repurposed as a chatbot. And then people find the chatbot has some undesirable behaviors, and talk about fairness and governance and all that. When the basic point is the model was never really trained to do any of that; it's just a word predictor. Why did you ever think it would be OK to just let it run wild on some other task?
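To make the point concrete, here is a toy illustration (my own, not from the article): a bigram "language model" whose entire training objective is next-word prediction, naively wrapped in a "chatbot" loop. All the names and the tiny corpus are invented; nothing in training ever optimized for helpfulness or correctness, which is exactly the commenter's point.

```python
import random
from collections import defaultdict

# Tiny corpus; the only "supervision" is which word follows which.
corpus = (
    "the model predicts the next word . "
    "the chatbot is just a word predictor . "
    "the next word follows the previous words ."
).split()

# Self-supervised "training": count observed (previous word, next word) pairs.
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def predict_next(word, rng):
    """The one task the model was actually trained on: next-word prediction."""
    return rng.choice(follows[word]) if word in follows else "."

def chatbot(prompt, n_words=8, seed=0):
    """Repurposing the word predictor as a 'chatbot' by sampling a continuation."""
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(n_words):
        words.append(predict_next(words[-1], rng))
    return " ".join(words)

print(chatbot("the"))
```

The output is fluent-looking word salad: the model does exactly what it was trained to do, and nothing more.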
All that to say, a big problem in AI/ML is models getting used for things they have no business being used for, and then people being at best underwhelmed, or at worst harmed or offended by the results. The first step should be asking why this model is suitable for making the prediction I'm asking it to make, and I think closer scrutiny of what these "foundation models" actually do is a good direction.
> AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character.
But CLIP could be a good plug-in for today's writing/design workflows, something like a CLIP-powered Unsplash.
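A CLIP-powered stock-photo search like this would embed the query text and every image into a shared vector space and rank images by cosine similarity. A minimal sketch with invented 4-d vectors standing in for real CLIP embeddings (the filenames, vectors, and query are all made up for illustration; a real system would get embeddings from CLIP's text and image encoders):

```python
import math

# Invented stand-in embeddings; in practice these would come from
# CLIP's image encoder, precomputed over the photo library.
image_embeddings = {
    "sunset_beach.jpg":  [0.9, 0.1, 0.0, 0.2],
    "city_night.jpg":    [0.1, 0.8, 0.3, 0.0],
    "mountain_lake.jpg": [0.7, 0.0, 0.5, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_embedding, k=2):
    """Rank images by similarity to the (hypothetical) text embedding."""
    ranked = sorted(image_embeddings.items(),
                    key=lambda kv: cosine(query_embedding, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Pretend this vector is CLIP's embedding of "warm sunset over water".
query = [0.95, 0.05, 0.1, 0.15]
print(search(query))  # → ['sunset_beach.jpg', 'mountain_lake.jpg']
```

The whole retrieval layer is just nearest-neighbor search; the heavy lifting is the shared embedding space that CLIP's contrastive training provides.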
Could a vision system not be an AI foundation model? Car vision, specifically Tesla's state of the art, is not mentioned. I see a bit of NLP bias.