Exploring LLMs


A Study Guide

Every day I use a Large Language Model (LLM), mostly ChatGPT or Claude Sonnet in Visual Studio Code, and I’m stunned by this amazing technology. Since my background is in classic information retrieval, I have a burning desire to understand how it all works. To that end, I’ve found the resources below helpful. I’m also very interested in emergent behavior and consciousness.

Books

Papers

ChatGPT gave me a suggested paper reading list, in chronological order, covering the key milestones that led to today’s ChatGPT capabilities. As I work through them, I’ll post links to the papers with some brief commentary.

Year    | Milestone             | Key Paper(s)           | Core Contribution
2013–14 | Word Embeddings       | Mikolov, Pennington    | Continuous vector meaning
2014–16 | Seq2Seq + Attention   | Sutskever, Bahdanau    | Neural translation
2017    | Transformer           | Vaswani et al.         | Self-attention architecture
2018    | Pretraining           | Radford, Devlin        | General-purpose NLP models
2019    | GPT-2                 | Radford et al.         | Scaling and coherence
2020    | GPT-3 + RLHF          | Brown, Christiano      | Few-shot learning & alignment
2022    | ChatGPT (InstructGPT) | Ouyang et al.          | Conversational fine-tuning
2023–25 | GPT-4 & beyond        | OpenAI, Wei, Anthropic | Reasoning & multimodality

Word Embeddings & Distributed Representations (2013–14)

  • Mikolov et al. (2013). “Efficient Estimation of Word Representations in Vector Space.” Paper
  • Pennington et al. (2014). “GloVe: Global Vectors for Word Representation.” Paper
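
To make the “continuous vector meaning” idea concrete, here is a minimal Python sketch of the famous word-analogy arithmetic from the Mikolov et al. paper. The 4-dimensional vectors below are made up purely for illustration; real word2vec or GloVe embeddings have hundreds of dimensions and are learned from large corpora.

```python
import numpy as np

# Toy 4-dimensional embeddings (made-up values for illustration only;
# real learned embeddings are typically 100-300 dimensions).
vectors = {
    "king":  np.array([0.8, 0.9, 0.1, 0.6]),
    "queen": np.array([0.8, 0.1, 0.9, 0.6]),
    "man":   np.array([0.2, 0.9, 0.1, 0.3]),
    "woman": np.array([0.2, 0.1, 0.9, 0.3]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The classic analogy: king - man + woman should land near queen.
target = vectors["king"] - vectors["man"] + vectors["woman"]
for word, vec in vectors.items():
    print(f"{word:>6}: {cosine(target, vec):.3f}")
```

With these toy vectors, “queen” scores highest (cosine 1.000), mirroring the vector-arithmetic result reported by Mikolov et al. (2013): relationships like gender end up encoded as consistent directions in the embedding space.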