Ask HN: Foundational Papers in AI
31 points by r3trohack3r on May 5, 2023 | 17 comments
Hey HN,

For someone wanting to ramp up on what's going on in AI right now, and on what's under the hood of these models, what are some good resources? What are your favorite reads?

Asking my third brain hemisphere, ChatGPT, yielded the following very useful result:

LLM typically refers to "Large Language Models," which are a type of artificial intelligence model that uses deep learning techniques to analyze and understand natural language.

Here are some foundational papers in AI that may be helpful for understanding the development of LLMs:

"A Few Useful Things to Know About Machine Learning" by Pedro Domingos (2012) This paper provides an introduction to the fundamentals of machine learning, which is a crucial component of LLM development. "Natural Language Processing (almost) from Scratch" by Ronan Collobert and Jason Weston (2008)

This paper presents a deep learning architecture for natural language processing tasks, which laid the foundation for many subsequent LLM models. "Efficient Estimation of Word Representations in Vector Space" by Tomas Mikolov, et al. (2013)

This paper introduces the Word2Vec algorithm, which is a widely-used method for generating word embeddings that can be used in LLM models. "Attention Is All You Need" by Ashish Vaswani, et al. (2017) This paper presents the Transformer architecture, which is a type of neural network that has been used in many successful LLM models, including BERT and GPT.

"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Jacob Devlin, et al. (2018) This paper describes the development of BERT, which is a pre-trained LLM that has achieved state-of-the-art performance on many natural language processing tasks.

"Language Models are Few-Shot Learners" by Tom B. Brown, et al. (2020) This paper presents the GPT-3 LLM, which is one of the largest and most powerful language models developed to date, and demonstrates its ability to perform well on a wide range of tasks with only a few examples.

I tried this, too, but found that the results are not great: it omits really good papers or hallucinates the names of papers that don't exist.

GPT-4 or GPT-3.5?

"Attention Is All You Need" and the first GPT paper. Also Gwern's scaling-hypothesis blog post.

Though really the best thing is to use GPT-4 as a tutor. It has a 2021 knowledge cutoff, but it's excellent at knowing what's relevant and explaining it.

See also https://news.ycombinator.com/item?id=35114530



Does the above cover what you’re looking for, or if not, could you give some more info about what you’re looking for?

This is more than enough to get me started, I appreciate you. Thanks!

And https://e2eml.school/transformers.html (recommended below, also)

https://github.com/Hannibal046/Awesome-LLM has a curated list of LLM specific resources.

Not the creator, just happened upon it when researching LLMs today.

bitcoin.org/bitcoin.pdf. Why is this paper in an AI papers topic?

I believe that AI will be impossible to switch off in any way other than switching off the entire Internet. This is one of the most important abilities of an ASI (and maybe an AGI). The paper describes one possible way of achieving unswitchability despite silly people (I mean mostly the silly elite, such as the presidents of China, Russia, and the USA) who like to wave their nuclear dicks here and there, showing their inability to rule society in a wise way.

If you’re going down that rabbit hole, I’d also include the papers on BitTorrent, DHTs (Kademlia), and onion routing.

I agree about the BitTorrent point. I have read everything published by Bram Cohen (the author of BitTorrent), and the best read I can recommend is Shneidman's paper "Faithfulness in Internet Algorithms" [1], which has some deep reflections on what made things like BitTorrent and cryptocurrencies possible. I cannot say this paper is about AI, but I'd claim it is a foundational paper for the Internet age.

[1] https://dl.acm.org/doi/pdf/10.1145/1016527.1016537

This is a good list of papers with annotated code: https://nn.labml.ai/index.html

Are you serious? How is this a foundational paper? It’s a mediocre “tweaks” paper that doesn’t even have impressive results.

If I had to choose one I'd go with "Attention is all you need".

I’m a smart(tm) guy and I couldn’t make heads or tails of that paper. That hand-wavy diagram that’s supposed to explain the whole concept still makes me mad.

I’m finding this page is explaining it a lot better https://e2eml.school/transformers.html

There are a number of lectures on YouTube for most, if not all, of the papers in this thread. You should check those out too, though I am at a loss to suggest any specific ones, as I have not found any favorites among them.
