
Can you explain what the big deal is? I’m still in the early learning stages.


As an example, say you want to encode all of the data in Wikipedia as embeddings and train a model to answer questions using that information. Historically, that meant a model that encodes all of Wikipedia, encodes the question, uses the encoded Wikipedia to decode an answer, then backprops through all of that and updates the weights. Then it has to re-encode all of Wikipedia with the new weights and repeat the whole process at every training step, while somehow holding everything in GPU memory. In practice, you basically couldn't do it that way.
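
Roughly, that infeasible loop looks something like this. This is just a toy PyTorch sketch to show where the cost comes from; the tiny fake corpus, `encoder`, and `decoder` are made-up stand-ins, not anyone's actual setup:

    import torch

    # Toy "re-encode everything every step" training loop. With 10 fake
    # passages this runs fine; with all of Wikipedia, re-encoding the corpus
    # each step and backpropping through it would be hopeless.
    encoder = torch.nn.EmbeddingBag(10_000, 128)   # stand-in passage/question encoder
    decoder = torch.nn.Linear(2 * 128, 10_000)     # stand-in answer head
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)

    wiki = [torch.randint(0, 10_000, (50,)) for _ in range(10)]  # fake corpus
    question = torch.randint(0, 10_000, (12,))
    answer = torch.tensor([42])                    # fake target token

    for step in range(100):
        # every passage gets re-encoded with the *current* weights
        corpus = torch.stack([encoder(p.unsqueeze(0)).squeeze(0) for p in wiki])
        q = encoder(question.unsqueeze(0)).squeeze(0)
        weights = torch.softmax(corpus @ q, dim=0)   # attend over the whole corpus
        context = (weights.unsqueeze(1) * corpus).sum(dim=0)
        logits = decoder(torch.cat([q, context]).unsqueeze(0))
        loss = torch.nn.functional.cross_entropy(logits, answer)
        opt.zero_grad()
        loss.backward()                              # gradients flow through all of "Wikipedia"
        opt.step()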

Today, we’re seeing big models that can encode all of Wikipedia in useful ways. If the encodings are “good enough,” you can encode all of Wikipedia once, then train a second model that only has to encode the question, use the precomputed Wikipedia encodings to decode an answer, and backprop through just the question and answer. If Wikipedia changes in the meantime, you can probably just update your database of encodings and the learned QA model will pick up the new information.
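
In code, the "encode once" version only moves the corpus encoding outside the training loop, but that's the whole point: backprop never touches it. Again a toy PyTorch sketch with made-up names (`frozen_encoder` stands in for the big pretrained model, `index` for the precomputed embedding database):

    import torch

    # Encode the corpus ONCE with a frozen model; training only updates the
    # small question encoder and answer head. If Wikipedia changes, you just
    # refresh `index` -- no retraining of the corpus encoder required.
    frozen_encoder = torch.nn.EmbeddingBag(10_000, 128)  # stand-in for a big pretrained model
    wiki = [torch.randint(0, 10_000, (50,)) for _ in range(10)]  # fake corpus
    with torch.no_grad():
        index = torch.stack(
            [frozen_encoder(p.unsqueeze(0)).squeeze(0) for p in wiki])  # computed once, stored

    question_encoder = torch.nn.EmbeddingBag(10_000, 128)
    decoder = torch.nn.Linear(2 * 128, 10_000)
    opt = torch.optim.Adam(
        list(question_encoder.parameters()) + list(decoder.parameters()), lr=1e-4)

    question = torch.randint(0, 10_000, (12,))
    answer = torch.tensor([42])

    for step in range(100):
        q = question_encoder(question.unsqueeze(0)).squeeze(0)
        weights = torch.softmax(index @ q, dim=0)    # cheap lookup against frozen embeddings
        context = (weights.unsqueeze(1) * index).sum(dim=0)
        logits = decoder(torch.cat([q, context]).unsqueeze(0))
        loss = torch.nn.functional.cross_entropy(logits, answer)
        opt.zero_grad()
        loss.backward()                              # gradients stop at the question encoder + decoder
        opt.step()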


Replace Wikipedia with the internet, and you could replace Google Search with some (hopefully) soon-to-be-discovered algorithm based on these principles. Exciting times.



