Generative AI

5 Common LLM Parameters Explained with Examples

Large language models (LLMs) expose several parameters that let you fine-tune their behavior and control how they generate responses. If a model isn't producing the output you need, the issue often lies in how these parameters are configured. In this tutorial, we will explore some of the most commonly used ones – max_completion_tokens, temperature, top_p, presence_penalty, and frequency_penalty – and understand how each one influences the model's output.

Installing dependencies

pip install openai pandas matplotlib

Loading the OpenAI API key

import os
from getpass import getpass
os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')

Initializing the model

from openai import OpenAI
model = "gpt-4.1"
client = OpenAI()

Max completion tokens

max_completion_tokens sets the maximum number of tokens the model can generate in a single response. The model will try to stay within this limit throughout the run; if it reaches the cap, generation stops at that point and the response is cut off.

A small value (such as 16) limits the model to very short answers, while a higher value (such as 80) allows it to generate detailed and complete answers. Increasing this parameter gives the model room to elaborate, explain, or format its output naturally.

prompt = "What is the most popular French cheese?"
for tokens in [16, 30, 80]:
  print(f"\n--- max_completion_tokens = {tokens} ---")
  response = client.chat.completions.create(
    model=model,
    messages=[
      {"role": "developer", "content": "You are a helpful assistant."},
      {"role": "user", "content": prompt}
    ],
    max_completion_tokens=tokens
  )
  print(response.choices[0].message.content)
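
If an answer looks cut off, you can check whether the token limit was the cause by inspecting the finish_reason of the returned choice (it is reported as "length" when the cap is hit). Here it is applied to the last response from the loop above:

# Check whether the last response was truncated by the max_completion_tokens limit.
finish_reason = response.choices[0].finish_reason
if finish_reason == "length":
    print("The answer was cut off by the token limit.")
else:
    print(f"Finished normally (finish_reason={finish_reason}).")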

Temperature

In large language models (LLMs), the temperature parameter controls the diversity and randomness of the generated output. Low temperature values make the model more deterministic and focused on the most probable responses – ideal for tasks that require precision and consistency. Higher values, on the other hand, introduce creativity and variety by allowing the model to explore less likely options. Technically, temperature rescales the logits fed into the softmax function: a higher temperature flattens the probability distribution (more varied outputs), while a lower temperature sharpens it (more predictable outputs).
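
As a quick intuition check, here is a minimal, self-contained sketch (plain Python with hypothetical logits, not an API call) showing how dividing the logits by the temperature sharpens or flattens the softmax distribution:

import math

# Hypothetical logits for three candidate tokens.
logits = [5.0, 3.0, 1.0]

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]   # lower T sharpens, higher T flattens
    m = max(scaled)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [round(e / total, 3) for e in exps]

for t in [0.2, 1.0, 2.0]:
    print(f"temperature={t}: {softmax_with_temperature(logits, t)}")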

In this code, we ask the LLM for 10 different answers (n_choices = 10) to the same question – "What is one intriguing place worth visiting?" – across a range of temperatures. This lets us see how the diversity of responses changes with temperature: low temperatures tend to produce similar or repeated answers, while higher temperatures yield a wider and more geographically varied spread.

prompt = "What is one intriguing place worth visiting? Give a single-word answer and think globally."

temperatures = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.5]
n_choices = 10 
results = {}

for temp in temperatures:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=temp,
        n=n_choices
    )
    
    # Collect all n responses in a list
    results[temp] = [response.choices[i].message.content.strip() for i in range(n_choices)]

# Display results
for temp, responses in results.items():
    print(f"\n--- temperature = {temp} ---")
    print(responses)

As we can see, once the temperature rises past 0.6 the responses start to vary, moving beyond the single repeated answer "Petra." At the highest temperature values, the distribution shifts further, and answers such as "Kyoto" and "Machu Picchu" appear as well.
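
Since pandas is already among the installed dependencies, one optional way to inspect this spread is to count how often each answer appears at every temperature, using the results dictionary built above:

# Optional: tabulate how often each answer appears at each temperature.
import pandas as pd

rows = [{"temperature": temp, "answer": ans} for temp, answers in results.items() for ans in answers]
counts = pd.DataFrame(rows).groupby(["temperature", "answer"]).size().unstack(fill_value=0)
print(counts)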

Top P

top_p (also known as nucleus sampling) is a parameter that controls how many tokens the model considers based on their cumulative probability. It is useful for focusing the model on the most probable tokens, often improving the coherence and quality of the output.

For the next experiment, we keep the same temperature sweep and additionally apply top_p = 0.5 (50%), which means only the tokens covering the top 50% of the probability mass are considered. Note that when temperature = 0 the output is essentially deterministic, so top_p has little effect.

The generation process works like this (see the toy sketch after this list):

  • Apply the temperature to adjust the token probabilities.
  • Use top_p to keep only the most likely tokens whose cumulative probability adds up to 50% of the total mass.
  • Renormalize the remaining probabilities before sampling.
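
To make the filtering step concrete, here is a toy sketch (plain Python with hypothetical probabilities, not an API call) of how nucleus sampling keeps only the smallest set of tokens whose cumulative probability reaches top_p and then renormalizes them:

# Toy nucleus (top_p) filtering over hypothetical token probabilities.
probs = {"Petra": 0.55, "Kyoto": 0.20, "Machu Picchu": 0.15, "Reykjavik": 0.10}
top_p = 0.5

kept, cumulative = {}, 0.0
for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    kept[token] = p
    cumulative += p
    if cumulative >= top_p:          # stop once the kept mass reaches top_p
        break

total = sum(kept.values())
renormalized = {token: p / total for token, p in kept.items()}
print(renormalized)                  # {'Petra': 1.0} -> only the top token survives this filter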

We will now repeat the experiment and observe how the sampled answers change across the different temperature values for the same query:
"What is one intriguing place worth visiting?"

prompt = "What is one intriguing place worth visiting? Give a single-word answer and think globally."

temperatures = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.5]
n_choices = 10 
results_ = {}

for temp in temperatures:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=temp,
        n=n_choices,
        top_p=0.5
    )
    
    # Collect all n responses in a list
    results_[temp] = [response.choices[i].message.content.strip() for i in range(n_choices)]

# Display results
for temp, responses in results_.items():
    print(f"\n--- temperature = {temp} ---")
    print(responses)

Since "Petra" alone accounts for more than 50% of the total probability mass, applying top_p = 0.5 filters out all of the alternatives. As a result, the model returns "Petra" in every case.

Frequency penalty

frequency_penalty controls how much the model avoids repeating the same words or phrases in its output.

  • Range: -2 to 2
  • Default: 0

When the frequency penalty is high, the model is penalized for using words it has already used before. This encourages it to choose new and different words, making the text more varied and less repetitive.

In simple words – higher frequency penalty = less repetition and more creativity.

We'll explore this quickly with the following prompt:

List 10 possible titles for a fantasy book. Give the titles only and each title on a new line.

prompt = "List 10 possible titles for a fantasy book. Give the titles only and each title on a new line."
frequency_penalties = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0]
results = {}

for fp in frequency_penalties:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        frequency_penalty=fp,
        temperature=0.2
    )

    text = response.choices[0].message.content
    items = [line.strip("- ").strip() for line in text.split("\n") if line.strip()]
    results[fp] = items

# Display results
for fp, items in results.items():
    print(f"\n--- frequency_penalty = {fp} ---")
    print(items)
  • Low frequency penalties (-2 to 0): The titles tend to repeat, with familiar patterns such as "The Shadow Weaver's Oath," "Crown of Ember and Ice," and "Heir of the Last Dragon" appearing frequently.
  • Moderate penalties (0.5 to 1.5): Some repetition remains, but the model starts to generate more diverse and creative titles.
  • Maximum penalty (2.0): The first few titles are still the same, but beyond those the model produces distinct, unique, and imaginative book names.

Presence penalty

presence_penalty controls how much the model avoids repeating words or phrases that have already appeared in the text.

  • Range: -2 to 2
  • Default: 0

A higher presence penalty encourages the model to use a wider variety of words, making its output more varied and creative.

Unlike the frequency penalty, which grows with every repetition, the presence penalty is applied once to any token that has already appeared, reducing the chance that it will be repeated at all in the output. This pushes the model toward text with more variety and originality.
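
Conceptually, and following the way the OpenAI documentation describes these two penalties, the logit adjustment can be sketched roughly as below (a simplified illustration with hypothetical numbers, not the provider's actual implementation):

# Simplified sketch of how the two penalties adjust a token's logit before sampling.
def penalized_logit(logit, count, frequency_penalty, presence_penalty):
    # count = how many times this token has already appeared in the generated text
    logit -= frequency_penalty * count                    # grows with every repetition
    logit -= presence_penalty * (1 if count > 0 else 0)   # applied once if the token appeared at all
    return logit

# A token already used 3 times is penalized more by the frequency term than the presence term.
print(penalized_logit(2.0, count=3, frequency_penalty=0.5, presence_penalty=0.5))  # -> 0.0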

prompt = "List 10 possible titles for a fantasy book. Give the titles only and each title on a new line."
presence_penalties = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0]
results = {}

for fp in presence_penalties:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        presence_penalty=fp,
        temperature=0.2
    )

    text = response.choices[0].message.content
    items = [line.strip("- ").strip() for line in text.split("\n") if line.strip()]
    results[fp] = items

# Display results
for fp, items in results.items():
    print(f"\n--- presence_penalty = {fp} ---")
    print(items)
  • Lower penalties (-2.0 to 0.5): The titles vary only slightly, with repetition of familiar patterns such as "The Shadow Weaver's Oath," "Heir of the Last Dragon," and "Crown of Ember and Snow."
  • Medium penalties (1.0 to 1.5): A few familiar titles remain, while the later ones show more creativity and unique combinations. Examples: "Ashes of a Fallen Kingdom," "Secrets of the Star Forest," "Daughter of Storm and Stone."
  • Maximum penalty (2.0): The first few titles remain the same, but the rest become more diverse and imaginative. Example: "Moonfire and Wire."



I am a civil engineering graduate (2022) from Jamia Millia Islamia, New Delhi, with a strong interest in data science, especially neural networks and their applications in various fields.
