AI interview series #1: Describe some text generation techniques used in LLMs

When you prompt an LLM, it doesn't generate a complete answer all at once – it creates the answer one token at a time. At each step, the model predicts a probability distribution over possible next tokens based on everything generated so far. But knowing these probabilities is not enough – the model also needs a decoding strategy to decide which token to choose next.
Different strategies can completely change how the final output looks – some make it more focused and deterministic, while others make it more creative and diverse. In this article, we will explore four text decoding strategies used in LLMs: Greedy Search, Beam Search, Nucleus Sampling, and Temperature Sampling – and explain how each one works.
Greedy Search
Greedy Search is the simplest decoding scheme: at each step, the model selects the token with the highest probability given the current context. While it's quick and easy to implement, it doesn't always produce the most coherent or logical sequence – it makes locally optimal choices without considering the overall outcome. Because it follows only one path through the probability tree, it can miss better sequences that require a short-term sacrifice. Because of this, greedy search often leads to repetitive, generic text, making it unsuitable for open-ended text generation.
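As a minimal sketch of the idea: the loop below always takes the argmax of the next-token distribution. The vocabulary and `next_token_probs` function are hypothetical stand-ins for a real LLM, used only to make the loop runnable.

```python
import numpy as np

# Toy vocabulary; a real decoder would use the LLM's tokenizer.
VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

def next_token_probs(context_ids):
    # Hypothetical stand-in for an LLM: a deterministic fake
    # distribution over the vocabulary that varies with the context.
    seed = 1 + sum((i + 1) * (t + 1) for i, t in enumerate(context_ids))
    rng = np.random.default_rng(seed)
    logits = rng.normal(size=len(VOCAB))
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def greedy_decode(max_len=6):
    context_ids = []
    for _ in range(max_len):
        probs = next_token_probs(context_ids)
        best = int(np.argmax(probs))  # always the single most likely token
        if VOCAB[best] == "<eos>":
            break
        context_ids.append(best)
    return " ".join(VOCAB[i] for i in context_ids)

print(greedy_decode())
```

Note that the output is fully deterministic: running the function twice on the same context always yields the same text, which is exactly why greedy decoding tends to feel repetitive.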
Beam Search
Beam Search is a decoding technique that improves on greedy search by keeping K candidate sequences (called beams) at each generation step instead of just one. At each step it expands every beam and retains the K sequences with the highest cumulative probability, which allows the model to explore the most promising paths through the search tree and find high-quality completions that greedy decoding might miss. The parameter K (the beam width) controls the trade-off between quality and computation – larger beams produce better text but are slower.
While beam search works well for structured tasks like machine translation, where accuracy is more important than creativity, it tends to produce text that is repetitive, predictable, and less than ideal for open-ended generation. This happens because the algorithm consistently favors high-probability continuations, which leads to low diversity and “neural text degeneration,” where the model keeps repeating certain words or phrases.

Greedy Search: [figure not recovered: probability tree for greedy decoding]

Beam Search: [figure not recovered: probability tree showing two beams kept at each step]
- Greedy Search (K = 1) always takes the locally highest probability:
  - At T2, it selects “slow” (0.6) over “quick” (0.4).
  - Resulting sequence: “Slow dog.” (final probability: 0.1680)
- Beam Search (K = 2) keeps both the “slow” and “quick” paths alive:
  - At T3, it sees that the path starting with “quick” has a higher-probability continuation.
  - Resulting sequence: “A quick cat purrs.” (final probability: 0.1800)
Beam search effectively rescues a path that had a lower probability early on, resulting in a better final sequence.
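The expand-and-prune loop described above can be sketched as follows. As before, `next_token_probs` is a hypothetical stand-in for a real model, and for brevity the sketch generates fixed-length sequences with no end-of-sequence handling.

```python
import numpy as np

# Toy vocabulary; a real decoder would use the LLM's tokenizer.
VOCAB = ["the", "quick", "slow", "cat", "dog", "purrs"]

def next_token_probs(context_ids):
    # Hypothetical stand-in for an LLM: a deterministic fake
    # distribution that depends on the tokens chosen so far.
    seed = 1 + sum((i + 1) * (t + 1) for i, t in enumerate(context_ids))
    rng = np.random.default_rng(seed)
    logits = rng.normal(size=len(VOCAB))
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def beam_search(k=2, max_len=4):
    # Each beam is (token ids, cumulative log-probability).
    beams = [([], 0.0)]
    for _ in range(max_len):
        candidates = []
        for ids, score in beams:
            probs = next_token_probs(ids)
            for i, p in enumerate(probs):
                candidates.append((ids + [i], score + np.log(p)))
        # Prune: keep only the k highest-scoring sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:k]
    best_ids, best_score = beams[0]
    return " ".join(VOCAB[i] for i in best_ids), best_score

print(beam_search(k=2))
```

Scores are accumulated in log space, which avoids numerical underflow when multiplying many probabilities and is how beam search is usually implemented in practice.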
Top-p Sampling (Nucleus Sampling)
Top-p Sampling (also called Nucleus Sampling) is a probabilistic decoding scheme that adapts how many tokens are considered at each generation step. Instead of choosing from a fixed number of top tokens, as top-k sampling does, top-p sampling selects the smallest set of tokens whose cumulative probability reaches a threshold p (for example, 0.7). These tokens form the “nucleus,” from which the next token is sampled after renormalizing their probabilities.
This lets the model balance diversity and consistency: it samples from a wide set when many tokens have similar probabilities (a flat distribution) and narrows to a few dominant tokens when the distribution is peaked. As a result, top-p sampling produces more natural, diverse, and contextually relevant text than fixed-size methods such as greedy search or top-k sampling.
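A minimal sketch of a single top-p sampling step, assuming a toy probability vector rather than a real model's output:

```python
import numpy as np

def top_p_sample(probs, p=0.7, rng=None):
    """Sample a token index from the smallest set of tokens whose
    cumulative probability reaches p (the 'nucleus')."""
    if rng is None:
        rng = np.random.default_rng(0)
    order = np.argsort(probs)[::-1]            # indices sorted by probability
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, p)) + 1  # smallest prefix reaching p
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize
    return int(rng.choice(nucleus, p=nucleus_probs))

# With p = 0.7, only the first two tokens (0.5 + 0.3 = 0.8 >= 0.7)
# form the nucleus; tokens 2 and 3 can never be sampled.
probs = np.array([0.5, 0.3, 0.15, 0.05])
print(top_p_sample(probs, p=0.7))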


Temperature Sampling
Temperature Sampling controls the level of randomness in text generation by adjusting a temperature parameter T that rescales the model's logits before the softmax. High temperatures (T > 1) flatten the distribution, introducing more randomness and diversity at the cost of coherence, while low temperatures (T < 1) sharpen it toward the most likely tokens. In effect, temperature sampling offers a trade-off between creativity and precision: lower temperatures yield deterministic, predictable results, while higher ones produce more varied and imaginative text.
The right temperature often depends on the task – for example, creative writing benefits from high values, while technical or factual answers do better with low ones.
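The rescaling itself is a one-liner: divide the logits by T before applying the softmax. The logit values below are made up purely for illustration.

```python
import numpy as np

def apply_temperature(logits, T):
    """Rescale logits by temperature T before the softmax.
    T < 1 sharpens the distribution; T > 1 flattens it."""
    scaled = np.asarray(logits, dtype=float) / T
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5])
low = apply_temperature(logits, T=0.5)   # sharper: mass concentrates on the top logit
high = apply_temperature(logits, T=2.0)  # flatter: probabilities move toward uniform
print(low, high)
```

One would then draw the next token from the resulting distribution (e.g. with `rng.choice`), so temperature composes naturally with top-p or top-k sampling.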



I am a civil engineering student (class of 2022) at Jamia Millia Islamia, New Delhi, with a strong interest in data science, especially neural networks and their applications in various fields.



