An Introduction to Building Advanced Multi-Endpoint Machine Learning APIs with LitServe: Batching, Streaming, Caching, and Local Testing

In this lesson, we explore LitServe, a lightweight and powerful framework that lets us serve machine learning models as APIs with minimal effort. We build and test multiple endpoints that cover real-world functionality such as text generation, batching, streaming, multi-task processing, and caching, all running locally without relying on external APIs. By the end, we have a solid understanding of how to design streamlined, flexible ML serving pipelines that are efficient and easy to scale toward production-level applications.
!pip install litserve torch transformers -q
import litserve as ls
import torch
from transformers import pipeline
import time
from typing import List
We start by setting up our environment in Google Colab and installing all the necessary dependencies, including litserve, torch, and transformers. We then import the key libraries and modules that allow us to define, serve, and test our APIs.
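As an optional sanity check before defining any endpoints (this snippet is our own addition, not part of the original notebook), we can confirm that the installed packages are importable and whether a GPU is visible:
# Optional sanity check (assumed addition): confirm installed versions and GPU visibility.
import importlib.metadata as md
print("litserve:", md.version("litserve"))
print("transformers:", md.version("transformers"))
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())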
# Single-request text generation endpoint backed by a local distilgpt2 pipeline.
class TextGeneratorAPI(ls.LitAPI):
    def setup(self, device):
        self.model = pipeline("text-generation", model="distilgpt2", device=0 if device == "cuda" and torch.cuda.is_available() else -1)
        self.device = device

    def decode_request(self, request):
        return request["prompt"]

    def predict(self, prompt):
        result = self.model(prompt, max_length=100, num_return_sequences=1, temperature=0.8, do_sample=True)
        return result[0]["generated_text"]

    def encode_response(self, output):
        return {"generated_text": output, "model": "distilgpt2"}
# Sentiment endpoint with batch()/unbatch() hooks so LitServe can group concurrent requests.
class BatchedSentimentAPI(ls.LitAPI):
    def setup(self, device):
        self.model = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", device=0 if device == "cuda" and torch.cuda.is_available() else -1)

    def decode_request(self, request):
        return request["text"]

    def batch(self, inputs: List[str]) -> List[str]:
        return inputs

    def predict(self, batch: List[str]):
        results = self.model(batch)
        return results

    def unbatch(self, output):
        return output

    def encode_response(self, output):
        return {"label": output["label"], "score": float(output["score"]), "batched": True}
Here, we define two LitServe APIs: one for text generation using the local distilgpt2 model, and one for batched sentiment analysis. We see how each API decodes incoming requests, runs inference, and encodes structured responses, and how the batch/unbatch hooks let LitServe group concurrent requests, showing how little code it takes to build structured, production-ready endpoints.
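To give a sense of how these classes would actually be exposed over HTTP, here is a minimal serving sketch of our own. It assumes a recent litserve release where ls.LitServer accepts accelerator, max_batch_size, and batch_timeout arguments; verify the parameter names against your installed version.
# Hedged serving sketch (our own addition; argument names assume a recent litserve release).
server = ls.LitServer(
    BatchedSentimentAPI(),
    accelerator="auto",
    max_batch_size=8,      # batch()/unbatch() only take effect when this is > 1
    batch_timeout=0.05,    # seconds to wait while accumulating a batch
)
# server.run(port=8000)   # serves POST /predict; uncomment outside Colab, since it blocks the cell
Because running the server blocks a Colab cell, the rest of this tutorial exercises the API classes locally instead.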
# Streaming endpoint: predict() yields tokens one at a time, and encode_response() relays them.
class StreamingTextAPI(ls.LitAPI):
    def setup(self, device):
        self.model = pipeline("text-generation", model="distilgpt2", device=0 if device == "cuda" and torch.cuda.is_available() else -1)

    def decode_request(self, request):
        return request["prompt"]

    def predict(self, prompt):
        words = ["Once", "upon", "a", "time", "in", "a", "digital", "world"]
        for word in words:
            time.sleep(0.1)
            yield word + " "

    def encode_response(self, output):
        for token in output:
            yield {"token": token}
In this section, we design a streaming text-generation API that emits tokens as they are produced. We simulate real-time streaming by yielding one word at a time with a short delay, showing how LitServe can handle continuous token generation through generator-based predict and encode_response methods.
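As an illustration of how such an endpoint is consumed, here is a hedged client-side sketch of our own. It assumes the API has been launched elsewhere with something like ls.LitServer(StreamingTextAPI(), stream=True) on port 8000 and that responses arrive on the default /predict route; both are assumptions about the LitServe version rather than part of the original code.
# Hedged client sketch: read the streamed response incrementally.
import requests

with requests.post(
    "http://localhost:8000/predict",
    json={"prompt": "Once upon a time"},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))  # each line carries one streamed token payload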
# Multi-task endpoint: routes each request to a sentiment or summarization pipeline.
class MultiTaskAPI(ls.LitAPI):
    def setup(self, device):
        self.sentiment = pipeline("sentiment-analysis", device=-1)
        self.summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-6-6", device=-1)
        self.device = device

    def decode_request(self, request):
        return {"task": request.get("task", "sentiment"), "text": request["text"]}

    def predict(self, inputs):
        task = inputs["task"]
        text = inputs["text"]
        if task == "sentiment":
            result = self.sentiment(text)[0]
            return {"task": "sentiment", "result": result}
        elif task == "summarize":
            if len(text.split()) < 30:
                return {"task": "summarize", "result": {"summary_text": text}}
            result = self.summarizer(text, max_length=50, min_length=10)[0]
            return {"task": "summarize", "result": result}
        else:
            return {"task": "unknown", "error": "Unsupported task"}

    def encode_response(self, output):
        return output
We now serve multiple tasks, sentiment analysis and summarization, through a single endpoint. This example shows how we can manage several model pipelines behind a unified interface, routing each request to the appropriate pipeline based on the task field it carries.
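For illustration (our own sketch, assuming the endpoint is served on port 8000 at the default /predict route), a client selects the pipeline simply by setting the task field in the JSON payload:
# Hedged usage sketch: clients choose the pipeline via the "task" field.
import requests

url = "http://localhost:8000/predict"
sentiment = requests.post(url, json={"task": "sentiment", "text": "Amazing tutorial!"}).json()
summary = requests.post(url, json={"task": "summarize", "text": "LitServe lets us expose several model pipelines behind one endpoint, which keeps deployment simple while still supporting different workloads from the same service."}).json()
print(sentiment)
print(summary)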
# Caching endpoint: memoizes previous results and reports hit/miss statistics.
class CachedAPI(ls.LitAPI):
    def setup(self, device):
        self.model = pipeline("sentiment-analysis", device=-1)
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def decode_request(self, request):
        return request["text"]

    def predict(self, text):
        if text in self.cache:
            self.hits += 1
            return self.cache[text], True
        self.misses += 1
        result = self.model(text)[0]
        self.cache[text] = result
        return result, False

    def encode_response(self, output):
        result, from_cache = output
        return {"label": result["label"], "score": float(result["score"]), "from_cache": from_cache, "cache_stats": {"hits": self.hits, "misses": self.misses}}
We build an API that caches the results of previous inferences, avoiding redundant computation for repeated requests. We track cache hits and misses in real time, showing how even a simple in-memory cache can noticeably improve performance for repeated-query scenarios.
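One limitation of the dictionary above is that it grows without bound. As a hedged refinement of our own (not part of the original code), the same idea can be capped with a least-recently-used eviction policy using collections.OrderedDict:
# Hypothetical LRU-bounded variant of CachedAPI (our own sketch).
from collections import OrderedDict

class BoundedCachedAPI(CachedAPI):
    def setup(self, device, max_entries=1024):
        super().setup(device)
        self.cache = OrderedDict()   # insertion order doubles as recency order
        self.max_entries = max_entries

    def predict(self, text):
        if text in self.cache:
            self.hits += 1
            self.cache.move_to_end(text)        # mark as most recently used
            return self.cache[text], True
        self.misses += 1
        result = self.model(text)[0]
        self.cache[text] = result
        if len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)      # evict the least recently used entry
        return result, False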
# Exercise each API class directly, without starting an HTTP server.
def test_apis_locally():
    print("=" * 70)
    print("Testing APIs Locally (No Server)")
    print("=" * 70)

    api1 = TextGeneratorAPI(); api1.setup("cpu")
    decoded = api1.decode_request({"prompt": "Artificial intelligence will"})
    result = api1.predict(decoded)
    encoded = api1.encode_response(result)
    print(f"✓ Result: {encoded['generated_text'][:100]}...")

    api2 = BatchedSentimentAPI(); api2.setup("cpu")
    texts = ["I love Python!", "This is terrible.", "Neutral statement."]
    decoded_batch = [api2.decode_request({"text": t}) for t in texts]
    batched = api2.batch(decoded_batch)
    results = api2.predict(batched)
    unbatched = api2.unbatch(results)
    for i, r in enumerate(unbatched):
        encoded = api2.encode_response(r)
        print(f"✓ '{texts[i]}' -> {encoded['label']} ({encoded['score']:.2f})")

    api3 = MultiTaskAPI(); api3.setup("cpu")
    decoded = api3.decode_request({"task": "sentiment", "text": "Amazing tutorial!"})
    result = api3.predict(decoded)
    print(f"✓ Sentiment: {result['result']}")

    api4 = CachedAPI(); api4.setup("cpu")
    test_text = "LitServe is awesome!"
    for i in range(3):
        decoded = api4.decode_request({"text": test_text})
        result = api4.predict(decoded)
        encoded = api4.encode_response(result)
        print(f"✓ Request {i+1}: {encoded['label']} (cached: {encoded['from_cache']})")

    print("=" * 70)
    print("✅ All tests completed successfully!")
    print("=" * 70)

test_apis_locally()
We test all our APIs locally to confirm their behavior without starting a server. We exercise text generation, batched sentiment analysis, multi-task routing, and caching in sequence, verifying that each part of our LitServe setup works correctly and efficiently.
In conclusion, we build and test several APIs that demonstrate the flexibility of the framework. We experiment with text generation, batched sentiment analysis, multi-task routing, and caching, all served locally through LitServe. As we wrap up, we see how LitServe streamlines the model-serving workflow, letting us stand up intelligent ML services in a few lines of Python code while balancing flexibility, performance, and simplicity.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of the AI media platform Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable to a wide audience. The platform draws more than two million monthly views, illustrating its popularity among readers.



