Ask HN: Foundational Papers in AI
31 points by r3trohack3r on May 5, 2023 | 17 comments
Hey HN,

For someone wanting to ramp up on what's going on in AI right now, and on what's under the hood of these models, what are some good resources? What are your favorite reads?

Asking my third brain hemisphere, ChatGPT, yielded the following very useful result:

LLM typically refers to "Large Language Models," which are a type of artificial intelligence model that uses deep learning techniques to analyze and understand natural language.

Here are some foundational papers in AI that may be helpful for understanding the development of LLMs:

"A Few Useful Things to Know About Machine Learning" by Pedro Domingos (2012) This paper provides an introduction to the fundamentals of machine learning, which is a crucial component of LLM development. "Natural Language Processing (almost) from Scratch" by Ronan Collobert and Jason Weston (2008)

This paper presents a deep learning architecture for natural language processing tasks, which laid the foundation for many subsequent LLM models. "Efficient Estimation of Word Representations in Vector Space" by Tomas Mikolov, et al. (2013)

This paper introduces the Word2Vec algorithm, which is a widely-used method for generating word embeddings that can be used in LLM models. "Attention Is All You Need" by Ashish Vaswani, et al. (2017) This paper presents the Transformer architecture, which is a type of neural network that has been used in many successful LLM models, including BERT and GPT.

"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Jacob Devlin, et al. (2018) This paper describes the development of BERT, which is a pre-trained LLM that has achieved state-of-the-art performance on many natural language processing tasks.

"Language Models are Few-Shot Learners" by Tom B. Brown, et al. (2020) This paper presents the GPT-3 LLM, which is one of the largest and most powerful language models developed to date, and demonstrates its ability to perform well on a wide range of tasks with only a few examples.

I tried this, too, but found that the results are not great: it omits really good papers or hallucinates the names of papers that don't exist.

GPT-4 or GPT-3.5?

"Attention Is All You Need" and the first GPT paper. Also Gwern's scaling-hypothesis blog post.

Though really the best thing is to use GPT-4 as a tutor. It has a 2021 knowledge cutoff, but it's excellent at knowing what's relevant and explaining it.

See also https://news.ycombinator.com/item?id=35114530



Does the above cover what you’re looking for, or if not, could you give some more info about what you’re looking for?

This is more than enough to get me started, I appreciate you. Thanks!

And https://e2eml.school/transformers.html (recommended below, also)

https://github.com/Hannibal046/Awesome-LLM has a curated list of LLM specific resources.

Not the creator, just happened upon it when researching LLMs today.

bitcoin.org/bitcoin.pdf. Why is this paper in an AI papers topic?

I believe that AI will be impossible to switch off in any way other than switching off the entire Internet. This is one of the most important abilities of an ASI (and maybe an AGI). The paper describes one possible way of achieving unswitchability despite silly people (I mean mostly the silly elite, such as the presidents of China, Russia, and the USA) who like to wave their nuclear dicks here and there, showing their inability to rule society in a wise way.

If you’re going down that rabbit hole, I’d also include the papers on BitTorrent, DHTs (Kademlia), and onion routing.

I agree about the BitTorrent point. I have read everything published by Bram Cohen (the author of BitTorrent), and the best read I can recommend is Shneidman's paper "Faithfulness in Internet Algorithms" [1], which has some deep reflections on what made things like BitTorrent and cryptocurrencies possible. I cannot say this paper is about AI, but I'd claim it is a foundational paper for the Internet age.

[1] https://dl.acm.org/doi/pdf/10.1145/1016527.1016537

This is a good list of papers with annotated code: https://nn.labml.ai/index.html

Are you serious? How is this a foundational paper? It’s a mediocre “tweaks” paper that doesn’t even have impressive results.

If I had to choose one I'd go with "Attention is all you need".

I’m a smart(tm) guy and I couldn’t make heads or tails of that paper. That hand-wavy diagram that’s supposed to explain the whole concept still makes me mad.

I’m finding this page is explaining it a lot better https://e2eml.school/transformers.html

There are a number of lectures on YouTube for most, if not all, of the papers in this thread. You should check those out too, though I am at a loss to suggest any specific ones, as I have not found any favorites among them.
