Every day, I use a Large Language Model (LLM), mostly either ChatGPT or Claude Sonnet in Visual Studio Code, and I’m stunned by this amazing technology. Since my background is in classic information retrieval, I have a burning desire to understand how this all works. To that end, I’ve found the resources below helpful. I’m also very interested in emergent behavior and consciousness.
Books
- I’ve learned more from Sebastian Raschka’s book Build a Large Language Model (From Scratch) than from any other resource. The book uses Python and PyTorch and covers everything from tokenization to instruction fine-tuning (a taste of the tokenization step appears in the sketch after this list).
- Sebastian has free YouTube videos that complement the book: a short-form Build a Large Language Model (From Scratch) - short playlist and a long-form Build a Large Language Model (From Scratch) - big playlist.
- Sebastian has another book in progress, Build a Reasoning Model (From Scratch).
- Here is Sebastian’s website and his Substack.
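To give a feel for the tokenization step the book begins with, here is a minimal sketch using the tiktoken package (my choice for illustration; install it with pip install tiktoken). The exact token IDs depend on the vocabulary, so treat the printed values as illustrative:

```python
import tiktoken  # OpenAI's byte-pair-encoding (BPE) tokenizer library

# Load GPT-2's BPE vocabulary
enc = tiktoken.get_encoding("gpt2")

ids = enc.encode("Hello, world!")
print(ids)              # token IDs, e.g. [15496, 11, 995, 0]
print(enc.decode(ids))  # round-trips back to "Hello, world!"
```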
Papers
ChatGPT gave me a suggested paper reading list, in chronological order, corresponding to the key milestones that led to today’s ChatGPT capability. As I work through these, I’ll post links to the papers with some brief commentary.
| Year | Milestone | Key Paper(s) | Core Contribution |
|---|---|---|---|
| 2013–14 | Word Embeddings | Mikolov, Pennington | Continuous vector meaning |
| 2014–16 | Seq2Seq + Attention | Sutskever, Bahdanau | Neural translation |
| 2017 | Transformer | Vaswani et al. | Self-attention architecture |
| 2018 | Pretraining | Radford, Devlin | General-purpose NLP models |
| 2019 | GPT-2 | Radford et al. | Scaling and coherence |
| 2020 | GPT-3 + RLHF | Brown, Christiano | Few-shot learning & alignment |
| 2022 | ChatGPT (InstructGPT) | Ouyang et al. | Conversational fine-tuning |
| 2023–25 | GPT-4 & beyond | OpenAI, Wei, Anthropic | Reasoning & multimodality |
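Since the 2017 Transformer row is the hinge of this whole list, here is a minimal PyTorch sketch of the scaled dot-product self-attention at its core. This is a toy single-head version with made-up dimensions, not the full multi-head architecture from Vaswani et al.:

```python
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v  # project tokens to queries/keys/values
    d_k = k.shape[-1]
    scores = q @ k.T / d_k ** 0.5        # similarity of every token to every other
    weights = torch.softmax(scores, dim=-1)
    return weights @ v                   # each output is a weighted mix of values

torch.manual_seed(0)
d_model, d_k = 8, 4                      # toy sizes chosen for illustration
x = torch.randn(5, d_model)              # 5 token embeddings
w_q, w_k, w_v = (torch.randn(d_model, d_k) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)                         # torch.Size([5, 4])
```

The later rows in the table (pretraining, GPT-2/3, RLHF) build on this same operation, stacked into many layers and scaled up.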