Every day, I use a Large Language Model (LLM), mostly either ChatGPT or Claude Sonnet in Visual Studio Code, and I’m stunned by this amazing technology. Since my background is in classic information retrieval, I have a burning desire to understand how this all works. To that end, I’ve found the resources below helpful. I’m also very interested in emergent behavior and consciousness.
Books
- I’ve learned more from Sebastian Raschka’s book Build a Large Language Model (From Scratch) than from any other resource. The book uses Python and PyTorch and covers everything from tokenization to instruction fine-tuning (a taste of the tokenization step appears in the sketch after this list).
- Sebastian has free YouTube videos that complement the book: a short-form Build a Large Language Model (From Scratch) - short playlist and a long-form Build a Large Language Model (From Scratch) - big playlist.
- Sebastian has another book in progress, Build a Reasoning Model (From Scratch).
- Here is Sebastian’s website and his Substack.
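To give a feel for the tokenization step the book begins with, here is a minimal sketch using the tiktoken package (my choice for illustration; install it with pip install tiktoken). The exact token IDs depend on the vocabulary, so treat the printed values as illustrative:

```python
import tiktoken  # OpenAI's byte-pair-encoding (BPE) tokenizer library

# Load GPT-2's BPE vocabulary
enc = tiktoken.get_encoding("gpt2")

ids = enc.encode("Hello, world!")
print(ids)              # token IDs, e.g. [15496, 11, 995, 0]
print(enc.decode(ids))  # round-trips back to "Hello, world!"
```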
Papers
ChatGPT gave me a suggested paper reading list, in chronological order, corresponding to the key milestones that led to today’s ChatGPT capability. As I work through these, I’ll post links to the papers with some brief commentary.
| Year | Milestone | Key Paper(s) | Core Contribution |
|---|---|---|---|
| 2013–14 | Word Embeddings | Mikolov, Pennington | Continuous vector meaning |
| 2014–16 | Seq2Seq + Attention | Sutskever, Bahdanau | Neural translation |
| 2017 | Transformer | Vaswani et al. | Self-attention architecture |
| 2018 | Pretraining | Radford, Devlin | General-purpose NLP models |
| 2019 | GPT-2 | Radford et al. | Scaling and coherence |
| 2020 | GPT-3 + RLHF | Brown, Christiano | Few-shot learning & alignment |
| 2022 | ChatGPT (InstructGPT) | Ouyang et al. | Conversational fine-tuning |
| 2023–25 | GPT-4 & beyond | OpenAI, Wei, Anthropic | Reasoning & multimodality |
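Since the 2017 Transformer row is the hinge of this whole list, here is a minimal PyTorch sketch of the scaled dot-product self-attention at its core. This is a toy single-head version with made-up dimensions, not the full multi-head architecture from Vaswani et al.:

```python
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v  # project tokens to queries/keys/values
    d_k = k.shape[-1]
    scores = q @ k.T / d_k ** 0.5        # similarity of every token to every other
    weights = torch.softmax(scores, dim=-1)
    return weights @ v                   # each output is a weighted mix of values

torch.manual_seed(0)
d_model, d_k = 8, 4                      # toy sizes chosen for illustration
x = torch.randn(5, d_model)              # 5 token embeddings
w_q, w_k, w_v = (torch.randn(d_model, d_k) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)                         # torch.Size([5, 4])
```

The later rows in the table (pretraining, GPT-2/3, RLHF) build on this same operation, stacked into many layers and scaled up.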