
7 LLM Parameters – What Are They and How Do You Use Them?

Tuning an LLM is largely a filtering problem: you shape the model's next-token distribution with a handful of sampling controls. Max tokens caps how long a response can be; temperature scales the logits to make output more or less random; top-p (nucleus) and top-k truncate the candidate pool by probability mass or rank; frequency and presence penalties discourage repetition or encourage new topics; and stop sequences hard-terminate generation at chosen delimiters. These seven parameters interact: raising the temperature fattens the tail of the distribution that top-p/top-k then crop; the penalties reshape long generations; stop sequences plus max tokens give you deterministic boundaries. The sections below describe each parameter precisely and summarize the documented ranges and default behavior reported in vendor documentation.

1) Max tokens (aka max_tokens, max_output_tokens, max_new_tokens)

What it is: A hard cap on how many tokens the model may generate for this response. It does not enlarge the context window; the input tokens plus the output tokens must still fit within the model's context limit. If the cap is hit first, the API marks the response as truncated (e.g., a finish reason of “length”).

When to tune it:

  • To control latency and cost (tokens ≈ time and dollars).
  • To guard against overruns when you cannot rely on stop sequences alone.
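The cap amounts to a hard loop bound in the decode loop. Below is a toy sketch (the `decode` and `next_token` names are illustrative, not any vendor's API) showing how hitting the cap yields a “length” finish reason, while a natural end-of-sequence would yield “stop”:

```python
def decode(next_token, max_new_tokens):
    """Toy decode loop: stop at EOS (None) or when the token cap is hit."""
    out = []
    finish_reason = "length"  # assume we hit the cap unless EOS arrives first
    for _ in range(max_new_tokens):
        tok = next_token(out)
        if tok is None:  # model emitted end-of-sequence
            finish_reason = "stop"
            break
        out.append(tok)
    return out, finish_reason

# A fake "model" that would generate forever; the cap is the only brake:
tokens, reason = decode(lambda ctx: "word", max_new_tokens=5)
print(len(tokens), reason)  # → 5 length
```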

2) Temperature (temperature)

What it is: A scale applied to the logits before the softmax:

softmax(z / T)ᵢ = exp(zᵢ / T) / Σⱼ exp(zⱼ / T)

Lower T sharpens the distribution (more deterministic); higher T flattens it (more random). Most APIs expose a range of [0, 2]. Use low T for analytical tasks and raise T for creative ones.
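A minimal standard-library sketch of the scaling above; it shows the top token's probability rising as T falls:

```python
import math

def softmax_with_temperature(logits, T):
    # Divide logits by T before exponentiating: T < 1 sharpens, T > 1 flattens.
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, T=0.5)
hot = softmax_with_temperature(logits, T=2.0)
print(round(cold[0], 3), round(hot[0], 3))  # → 0.864 0.502
```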

3) Nucleus sampling (top_p)

What it is: Sample only from the smallest set of tokens whose cumulative probability is ≥ p. This trims the long, unreliable tail that produces degeneration (incoherence, repetition). Introduced as nucleus sampling by Holtzman et al. (2019).

Practical notes:

  • A common band for open-ended text is top_p ≈ 0.9–0.95 (tighten it for factual tasks).
  • Anthropic advises setting either temperature or top_p, not both, to avoid compounding randomness.
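A small self-contained sketch of the nucleus cut (the function name is illustrative): sort by probability, accumulate mass until it reaches p, drop the rest, and renormalize:

```python
def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose cumulative mass >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}  # renormalized nucleus

probs = [0.45, 0.3, 0.2, 0.04, 0.01]
nucleus = top_p_filter(probs, p=0.9)
print(sorted(nucleus))  # the two tail tokens are dropped → [0, 1, 2]
```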

4) Top-k sampling (top_k)

What it is: At each step, restrict the candidates to the k highest-probability tokens, renormalize, and sample. Fan, Lewis, and Dauphin (2018) used this to improve story generation over beam search. In today's toolchains it is commonly combined with temperature or nucleus sampling.

Practical notes:

  • Typical top_k values are small (≈ 5–50), balancing quality against variance; the Hugging Face docs present this as “pro-tip” guidance.
  • When both top_k and top_p are set, many libraries apply k-filtering first, then p-filtering (an implementation detail, but useful to know).
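For comparison with the nucleus cut, the top-k cut keeps a fixed count of candidates rather than a probability mass (again a toy sketch, not library code):

```python
def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in order)
    return {i: probs[i] / total for i in order}

probs = [0.4, 0.3, 0.2, 0.1]
kept = top_k_filter(probs, k=2)
print(sorted(kept))  # keeps indices 0 and 1, renormalized → [0, 1]
```

Running a cut like this before a nucleus cut reproduces the k-then-p order many libraries use.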

5) Frequency penalty (frequency_penalty)

What it is: Reduces the likelihood of tokens in proportion to how often they have already appeared in the generated context, discouraging verbatim repetition. The Azure/OpenAI reference specifies a range of -2.0 to +2.0 and describes the effect as proportional to existing frequency. Positive values reduce repetition; negative values encourage it.

When: Long generations in which the model loops or echoes phrases (e.g., letters, poems, code comments).

6) Presence penalty (presence_penalty)

What it is: Penalizes tokens that have appeared at least once so far, encouraging the model to introduce new tokens and topics. Same documented range of -2.0 to +2.0 in the Azure/OpenAI reference. Positive values push toward novelty; negative values keep the model circling the same topics.

Tuning heuristic: start at 0; nudge presence_penalty up if the model stays “on rails” and will not explore alternatives.
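Both penalties can be sketched as one logit adjustment. The formula below follows the adjustment published in OpenAI's API documentation (logit minus count times the frequency penalty, minus the presence penalty when the count is nonzero); treat it as an illustration, since exact implementations vary by vendor:

```python
from collections import Counter

def apply_penalties(logits, generated_ids, freq_penalty=0.0, pres_penalty=0.0):
    """logit[j] -= count[j] * freq_penalty + (count[j] > 0) * pres_penalty"""
    counts = Counter(generated_ids)
    out = list(logits)
    for j, c in counts.items():
        out[j] -= c * freq_penalty + (1.0 if c > 0 else 0.0) * pres_penalty
    return out

# Token 0 appeared three times, token 1 once, token 2 never:
penalized = apply_penalties([2.0, 2.0, 2.0], [0, 0, 0, 1],
                            freq_penalty=0.5, pres_penalty=0.4)
print(penalized)  # token 0 is hit hardest; token 2 is untouched
```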

7) Stop sequences (stop, stop_sequences)

What it is: Strings that force the decoder to stop immediately, without emitting the stop text itself. Useful for delimiting structured output (e.g., the end of a JSON object or a section). Many APIs allow multiple stop sequences.

Practical tip: pick stop strings that are unlikely to occur in normal text (e.g., "<|end|>", "\n###"), and pair them with max_tokens as a belt-and-suspenders control.
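Mechanically, a stop sequence amounts to scanning the emitted text and cutting at the earliest match, without returning the stop text. A minimal sketch (the function name is illustrative):

```python
def truncate_at_stop(text, stop_sequences):
    """Cut at the earliest stop sequence; the stop text is not returned."""
    cut = len(text)
    for s in stop_sequences:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

out = truncate_at_stop('{"a": 1}\n###\nextra text', ["###", "<|end|>"])
print(repr(out))  # → '{"a": 1}\n'
```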

Important interactions

  • Temperature vs. nucleus/top-k: raising the temperature pushes probability mass into the tail; top_p/top_k then crop that tail. Many providers recommend adjusting only one randomness control at a time to keep the search space predictable.
  • Repetition control: empirically, nucleus sampling curbs degeneration by cutting the unreliable tail; add a light frequency penalty for long outputs.
  • Latency/cost: max_tokens is the most direct lever; streaming does not change the cost, but it improves perceived latency.
  • Model differences: some “reasoning” endpoints restrict or ignore these settings (temperature, penalties, etc.). Check the specific model's documentation before relying on them.
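Putting the controls together, here is a sketch of one common filtering order (temperature, then top-k, then top-p, then sample), as seen in Hugging Face-style pipelines; the function name and defaults are illustrative:

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    # 1) Temperature: rescale logits, then softmax.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # 2) Top-k: keep at most k candidates (0 disables the filter).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    # 3) Top-p: keep the smallest prefix whose cumulative mass >= top_p.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # 4) Sample from the renormalized survivors.
    mass = sum(probs[i] for i in kept)
    return rng.choices(kept, weights=[probs[i] / mass for i in kept], k=1)[0]

# With top_k=1 the pipeline is greedy: it always picks the argmax.
print(sample_next([3.0, 1.0, 0.5], top_k=1))  # → 0
```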

Michal Sutter holds a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, he excels at transforming complex datasets into actionable insights.


