A Large Language Model Course: How to Become an LLM Scientist or Engineer | by Maxime Labonne | January, 2025

How to become an LLM Scientist and Engineer from scratch

The Large Language Model (LLM) course is a collection of topics and educational resources for getting into LLMs. It consists of two main roadmaps:
- 🧑‍🔬 The LLM Scientist focuses on building the best possible LLMs using the latest techniques.
- 👷 The LLM Engineer focuses on creating LLM-based applications and deploying them.
For an interactive version of this course, I have created an LLM assistant that will answer questions and test your knowledge in a personalized way on HuggingChat (recommended) or ChatGPT.
This section of the course focuses on learning how to build the best LLMs using the latest techniques.
Deep knowledge of the Transformer architecture is not required, but it is important to understand the main steps of modern LLMs: converting text into numbers through tokenization, processing these tokens through layers that include attention mechanisms, and finally generating new text through various sampling techniques.
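These three steps can be sketched end to end. The snippet below is a toy illustration, not a real LLM: the character-level vocabulary and the `toy_model` that returns a uniform distribution are stand-ins for a trained tokenizer and a stack of attention layers.

```python
import random

# Toy vocabulary: each character is one token.
# Real LLMs use subword tokenizers (BPE, WordPiece, etc.).
vocab = ["h", "e", "l", "o", " ", "w", "r", "d"]
token_to_id = {tok: i for i, tok in enumerate(vocab)}

def encode(text):
    """Step 1: convert text into token ids."""
    return [token_to_id[ch] for ch in text]

def decode(ids):
    """Inverse mapping: token ids back to text."""
    return "".join(vocab[i] for i in ids)

def toy_model(ids):
    """Step 2 stand-in: a real LLM would run the ids through attention
    layers and return next-token probabilities; here we return a
    uniform distribution over the vocabulary."""
    return [1.0 / len(vocab)] * len(vocab)

def generate(prompt, n_new_tokens, seed=0):
    """Step 3: repeatedly sample the next token and append it."""
    random.seed(seed)
    ids = encode(prompt)
    for _ in range(n_new_tokens):
        probs = toy_model(ids)
        next_id = random.choices(range(len(vocab)), weights=probs)[0]
        ids.append(next_id)
    return decode(ids)

print(generate("hello", 3))
```

Every component here is swappable: replace `toy_model` with a trained network and `random.choices` with one of the sampling strategies below, and the loop is the same one production LLMs run.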
- Architectural overview: Understand the evolution from encoder-decoder Transformers to decoder-only designs such as GPT, which form the basis of modern LLMs. Focus on how these models process input and produce text at a high level.
- Tokenization: Learn the principles of tokenization — how text is converted into numerical representations that LLMs can process. Explore different tokenization strategies and their impact on model performance and output quality.
- Attention mechanisms: Master the core concepts of attention mechanisms, especially self-attention and its variants. Understand how these mechanisms allow LLMs to handle long-range dependencies and maintain context across sequences.
- Sampling techniques: Explore various text generation strategies and their trade-offs. Compare deterministic methods such as greedy search and beam search with probabilistic methods such as temperature sampling and nucleus (top-p) sampling.
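The self-attention bullet above can be made concrete with scaled dot-product attention, the operation at the heart of every Transformer layer. This is a minimal pure-Python sketch (real implementations are batched tensor code with learned projections for Q, K, and V):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Each query attends to every key: dot-product scores are scaled
    by sqrt(d_k), turned into weights with softmax, and used to take a
    weighted average of the value vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Tiny example: 2 query vectors attending over 3 key/value pairs.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out = scaled_dot_product_attention(Q, K, V)
```

Because the attention weights sum to 1, each output row is a convex combination of the value vectors — this mixing is what lets a token pull in context from anywhere in the sequence.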
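The sampling trade-offs above can be sketched in a few lines. Below, `greedy` is the deterministic baseline and `nucleus_sample` combines temperature scaling with top-p filtering; the `logits` values are made-up scores for a hypothetical 4-token vocabulary.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; temperature < 1 sharpens
    the distribution, temperature > 1 flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    """Deterministic: always pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def nucleus_sample(logits, p=0.9, temperature=1.0, rng=random):
    """Probabilistic: sample only from the smallest set of tokens whose
    cumulative probability reaches p (top-p / nucleus sampling)."""
    probs = softmax(logits, temperature)
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights)[0]

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for 4 tokens
```

Greedy search always returns the same token, while nucleus sampling trades determinism for diversity but never picks tokens outside the high-probability "nucleus" — which is why it avoids the degenerate repetition of greedy decoding without the incoherence of sampling from the full distribution.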
📚 References:
- A visual introduction to Transformers by 3Blue1Brown: A visual introduction to Transformers for complete beginners.
- LLM Visualization by Brendan Bycroft: An interactive 3D visualization of the internals of an LLM.
- Andrej Karpathy's nanoGPT: A 2-hour-long YouTube video on reimplementing GPT from scratch (for programmers). He also made a video about tokenization.
- Attention? Attention! by Lilian Weng: A historical overview introducing the need for attention mechanisms.
- Decoding Strategies in LLMs by Maxime Labonne: Provides code and a visual introduction to the different decoding strategies used to generate text.