Using RouteLLM to Make Efficient Use of LLMs

RouteLLM is a flexible framework for serving and evaluating LLM routers, designed to maximize performance while minimizing cost.
Key features:
- Seamless integration – acts as a drop-in replacement for the OpenAI client (or runs as an OpenAI-compatible server), intelligently routing simpler queries to cheaper models.
- Pre-trained routers out of the box – shown to cut costs by up to 85% while preserving 95% of performance on widely used benchmarks such as MT-Bench.
- Cost-effective quality – matches the performance of leading commercial offerings while being over 40% cheaper.
- Extensible and customizable – easily add new routers, fine-tune thresholds, and compare performance across multiple benchmarks.
In this tutorial, we will walk through how to:
- Load and use a pre-trained router.
- Calibrate it for your own use case.
- Test routing behavior on different types of prompts.

Check out the full codes here.
Installing the dependencies
!pip install "routellm[serve,eval]"
Loading the OpenAI API key
To get an OpenAI API key, visit the OpenAI platform and generate a new key. If you are a new user, you may need to add billing details and make a minimum payment of $5 to activate API access.
RouteLLM leverages LiteLLM to support chat completions from a wide range of both open-source and closed-source models. You can check the list of supported providers if you want to use a different model.
import os
from getpass import getpass
os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')
Downloading the configuration file
RouteLLM uses a configuration file to locate pretrained router checkpoints and the datasets they were trained on. This file tells the system where to find the models that decide whether to send a query to the strong or the weak model.
Do I need to edit it?
For most users – no. The default config already points to well-trained routers (mf, bert, causal_llm) that work out of the box. You only need to change it if you plan to:
- Train your own router on custom data.
- Replace the routing algorithm with a new one.
In this tutorial, we will keep the config as it is and simply:
- Set our strong and weak model names in code.
- Enter our API keys for the chosen providers.
- Use the calibrated threshold to balance cost and quality.
!wget
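Since we will not edit the config, it still helps to know its shape. As a rough sketch (the checkpoint names below follow the naming pattern of RouteLLM's pretrained routers but are illustrative, not verified values), the parsed file behaves like a nested mapping from router name to settings:

```python
# Hypothetical shape of a parsed RouteLLM config: router name -> settings.
# Checkpoint names are illustrative placeholders, not verified values.
config = {
    "mf": {"checkpoint_path": "routellm/mf_gpt4_augmented"},
    "bert": {"checkpoint_path": "routellm/bert_gpt4_augmented"},
    "causal_llm": {"checkpoint_path": "routellm/causal_llm_gpt4_augmented"},
}

def checkpoint_for(router: str) -> str:
    """Look up the pretrained checkpoint a given router would load."""
    return config[router]["checkpoint_path"]

print(checkpoint_for("mf"))
```

This is why the defaults work without edits: each router name used in code resolves to a ready-made checkpoint.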
Initializing the RouteLLM controller
In this code block, we import the required libraries and initialize the RouteLLM Controller, which will manage how prompts are routed between models. We specify routers=["mf"] to use the matrix factorization router, a model that predicts whether a query should be sent to the strong or the weak model.
The strong_model parameter is set to "gpt-5", a high-quality but more expensive model, while weak_model is set to "o4-mini", a faster and cheaper one. For each incoming prompt, the router scores its difficulty against the calibrated threshold and automatically selects the more cost-effective option – ensuring that simpler tasks do not consume an expensive, powerful model.
This setup lets you balance cost and response quality automatically.
import os
import pandas as pd
from routellm.controller import Controller
client = Controller(
routers=["mf"], # matrix factorization router
strong_model="gpt-5",
weak_model="o4-mini"
)
!python -m routellm.calibrate_threshold --routers mf --strong-model-pct 0.1 --config config.example.yaml
This command runs the threshold calibration process for the matrix factorization (MF) router. The --strong-model-pct 0.1 argument tells the system to find the threshold value that sends roughly 10% of queries to the strong model (and the rest to the weak model).
Using the model and router settings from --config config.example.yaml, the calibration reports:
For 10% strong model calls with mf, the optimal threshold is 0.24034.
This means that any query with a difficulty score above 0.24034 will be sent to the strong model, while those below will go to the weak model, matching your desired cost/quality trade-off.
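The decision rule implied by this calibration fits in a few lines. A minimal sketch, assuming (as described above) that win rates at or above the threshold go to the strong model:

```python
THRESHOLD = 0.24034  # calibrated for ~10% strong-model calls with the mf router

def route(win_rate: float, threshold: float = THRESHOLD) -> str:
    """Return which model tier a query goes to, given its predicted win rate."""
    return "strong" if win_rate >= threshold else "weak"

print(route(0.303087))  # above the threshold -> "strong"
print(route(0.12))      # below the threshold -> "weak"
```

Raising the threshold sends fewer queries to the strong model (cheaper, lower quality); lowering it does the opposite.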
Defining the threshold and test prompts
Here, we define a diverse set of test prompts designed to cover a range of complexity: simple factual questions (likely routed to the weaker model), medium reasoning tasks (borderline cases), high-complexity or creative requests (likely routed to the stronger model), and code-generation tasks to test technical skill.
threshold = 0.24034
prompts = [
# Easy factual (likely weak model)
"Who wrote the novel 'Pride and Prejudice'?",
"What is the largest planet in our solar system?",
# Medium reasoning (borderline cases)
"If a train leaves at 3 PM and travels 60 km/h, how far will it travel by 6:30 PM?",
"Explain why the sky appears blue during the day and red/orange during sunset.",
# High complexity / creative (likely strong model)
"Write a 6-line rap verse about climate change using internal rhyme.",
"Summarize the differences between supervised, unsupervised, and reinforcement learning with examples.",
# Code generation
"Write a Python function to check if a given string is a palindrome, ignoring punctuation and spaces.",
"Generate SQL to find the top 3 highest-paying customers from a 'sales' table."
]
Checking the win rates
The following code calculates a win rate for each test prompt with the MF router, indicating the probability that the strong model will outperform the weak model.
Based on the calibrated threshold of 0.24034, two prompts –
"If a train leaves at 3 PM and travels 60 km/h, how far will it travel by 6:30 PM?" (0.303087)
"Write a Python function to check if a given string is a palindrome, ignoring punctuation and spaces." (0.272534)
– exceed it and would be routed to the strong model.
All other prompts fall below the threshold, meaning they would be served by the weaker, cheaper model.
win_rates = client.batch_calculate_win_rate(prompts=pd.Series(prompts), router="mf")
# Store results in DataFrame
_df = pd.DataFrame({
"Prompt": prompts,
"Win_Rate": win_rates
})
# Show full text without truncation
pd.set_option('display.max_colwidth', None)
_df
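Routing decisions can also be previewed locally before spending any API calls, by comparing win rates against the threshold. A sketch (the two above-threshold values are the ones reported in this tutorial; the remaining win rates and the shortened prompt labels are illustrative stand-ins):

```python
import pandas as pd

threshold = 0.24034
# The two above-threshold win rates (0.303087, 0.272534) come from the run
# described above; the other values are illustrative stand-ins.
preview = pd.DataFrame({
    "Prompt": [
        "Pride and Prejudice author", "Largest planet", "Train distance",
        "Why the sky is blue", "Climate rap", "ML paradigms",
        "Palindrome function", "Top-3 customers SQL",
    ],
    "Win_Rate": [0.11, 0.09, 0.303087, 0.18, 0.21, 0.20, 0.272534, 0.15],
})
preview["Routed To"] = preview["Win_Rate"].apply(
    lambda w: "strong" if w >= threshold else "weak"
)
print(preview[["Prompt", "Routed To"]])
```

Adding a column like this makes it easy to see, at a glance, which prompts will hit the expensive model.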
These results also help refine the routing strategy – by analyzing the distribution of win rates, we can adjust the threshold to better balance cost and performance.
Routing prompts through the matrix factorization (MF) router
This code iterates over the prompt list and sends each one through the RouteLLM controller using the calibrated MF router (router-mf-{threshold}).
For each prompt, the router decides whether to call the strong or the weak model based on the prompt's win rate.
The response includes the model that actually served the request.
This information – the prompt, the model used, and the output – is stored in the results list for later analysis.
results = []
for prompt in prompts:
response = client.chat.completions.create(
model=f"router-mf-{threshold}",
messages=[{"role": "user", "content": prompt}]
)
message = response.choices[0].message["content"]
model_used = response.model # RouteLLM returns the model actually used
results.append({
"Prompt": prompt,
"Model Used": model_used,
"Output": message
})
df = pd.DataFrame(results)
In the results, prompts 2 and 6 (zero-indexed) crossed the win-rate threshold and were therefore routed to the strong GPT-5 model, while the rest were handled by the weak model.
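With 2 of the 8 prompts routed to the strong model, a back-of-the-envelope saving estimate looks like this (the per-query costs are hypothetical round numbers; real pricing depends on token counts and provider):

```python
# Hypothetical per-query costs; actual pricing depends on token counts.
strong_cost, weak_cost = 1.00, 0.10
n_strong, n_weak = 2, 6

routed_cost = n_strong * strong_cost + n_weak * weak_cost  # cost with routing
always_strong = (n_strong + n_weak) * strong_cost          # cost without routing
savings = 1 - routed_cost / always_strong
print(f"Estimated savings vs. always using the strong model: {savings:.0%}")
```

Even on this tiny prompt set, routing most queries to the cheap model cuts the bill by well over half under these assumed prices.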
Feel free to check out our GitHub page for tutorials, codes, and notebooks. Also, feel free to follow us, and don't forget to join our 100k+ ML SubReddit and subscribe to our newsletter.

I am a Civil Engineering graduate (2022) from Jamia Millia Islamia, New Delhi, and I am very interested in Data Science, especially Neural Networks and their application in various areas.



