Ask HN: Foundational Papers in AI
31 points by r3trohack3r on May 5, 2023 | 17 comments
Hey HN,

For someone wanting to ramp up on what's going on in AI right now, and on what's under the hood of these models, what are some good resources? What are your favorite reads?

Asking my third brain hemisphere, ChatGPT, yielded the following very useful result:

LLM typically refers to "Large Language Models," which are a type of artificial intelligence model that uses deep learning techniques to analyze and understand natural language.

Here are some foundational papers in AI that may be helpful for understanding the development of LLMs:

"A Few Useful Things to Know About Machine Learning" by Pedro Domingos (2012) This paper provides an introduction to the fundamentals of machine learning, which is a crucial component of LLM development. "Natural Language Processing (almost) from Scratch" by Ronan Collobert and Jason Weston (2008)

This paper presents a deep learning architecture for natural language processing tasks, which laid the foundation for many subsequent LLM models. "Efficient Estimation of Word Representations in Vector Space" by Tomas Mikolov, et al. (2013)

This paper introduces the Word2Vec algorithm, which is a widely-used method for generating word embeddings that can be used in LLM models. "Attention Is All You Need" by Ashish Vaswani, et al. (2017) This paper presents the Transformer architecture, which is a type of neural network that has been used in many successful LLM models, including BERT and GPT.

"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Jacob Devlin, et al. (2018) This paper describes the development of BERT, which is a pre-trained LLM that has achieved state-of-the-art performance on many natural language processing tasks.

"Language Models are Few-Shot Learners" by Tom B. Brown, et al. (2020) This paper presents the GPT-3 LLM, which is one of the largest and most powerful language models developed to date, and demonstrates its ability to perform well on a wide range of tasks with only a few examples.

I tried this, too, but found that the results are not great: it omits really good papers or hallucinates the names of papers that don't exist.

GPT-4 or GPT-3.5?

"Attention Is All You Need" and the first GPT paper. Also Gwern's scaling-hypothesis blog post.

Though really the best thing is to use GPT-4 as a tutor. It has a 2021 knowledge cutoff, but it's excellent at knowing what's relevant and explaining it.

See also https://news.ycombinator.com/item?id=35114530



Does the above cover what you’re looking for, or if not, could you give some more info about what you’re looking for?

This is more than enough to get me started, I appreciate you. Thanks!

And https://e2eml.school/transformers.html (recommended below, also)

https://github.com/Hannibal046/Awesome-LLM has a curated list of LLM specific resources.

Not the creator, just happened upon it when researching LLMs today.

bitcoin.org/bitcoin.pdf. Why is this paper in an AI papers topic?

I believe that AI will be impossible to switch off in any way other than switching off the entire Internet. This is one of the most important abilities of an ASI (and maybe an AGI). The paper describes one possible way of achieving unswitchability despite silly people (I mean mostly the silly elite, such as the presidents of China, Russia, and the USA) who like to wave their nuclear dicks here and there, showing their inability to rule society in a wise way.

If you’re going down that rabbit hole, I’d also include the papers on BitTorrent, DHTs (Kademlia), and onion routing.

I agree about the BitTorrent point. I have read everything published by Bram Cohen (the author of BitTorrent), and the best read I can recommend is Shneidman's paper "Faithfulness in Internet Algorithms" [1], which has some deep reflections on what made things like BitTorrent and cryptocurrencies possible. I cannot say this paper is about AI, but I'd claim it is a foundational paper for the Internet age.

[1] https://dl.acm.org/doi/pdf/10.1145/1016527.1016537

This is a good list of papers with annotated code: https://nn.labml.ai/index.html

Are you serious? How is this a foundational paper? It’s a mediocre “tweaks” paper that doesn’t even have impressive results.

If I had to choose one I'd go with "Attention is all you need".

I’m a smart(tm) guy and I couldn’t make heads or tails of that paper. That hand-wavy diagram that’s supposed to explain the whole concept still makes me mad.

I’m finding this page is explaining it a lot better https://e2eml.school/transformers.html

There are a number of lectures on YouTube for most, if not all, of the papers in this thread. You should check those out too, though I am at a loss to suggest any specific ones, as I have not found any favorites among them.
