An Introduction to Building Advanced Multi-Endpoint Machine Learning APIs with LitServe: Batching, Streaming, Caching, and Local Testing

In this lesson, we explore LitServe, a lightweight and powerful framework that lets us serve machine learning models as APIs with minimal effort. We build and test multiple endpoints that cover real-world functionality such as text generation, batching, streaming, multi-task processing, and caching, all running locally without relying on external APIs. By the end, we have a solid understanding of how to design streamlined, flexible ML serving pipelines that are efficient and easy to scale toward production-level applications.
!pip install litserve torch transformers -q
import litserve as ls
import torch
from transformers import pipeline
import time
from typing import List
We start by setting up our environment in Google Colab and installing all the necessary dependencies, including litserve, torch, and transformers. We then import the key libraries and modules that allow us to define, serve, and test our APIs.
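As an optional sanity check before defining any endpoints (this snippet is our own addition, not part of the original notebook), we can confirm that the installed packages are importable and whether a GPU is visible:
# Optional sanity check (assumed addition): confirm installed versions and GPU visibility.
import importlib.metadata as md
print("litserve:", md.version("litserve"))
print("transformers:", md.version("transformers"))
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())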
# Single-request text generation endpoint backed by a local distilgpt2 pipeline.
class TextGeneratorAPI(ls.LitAPI):
    def setup(self, device):
        self.model = pipeline("text-generation", model="distilgpt2", device=0 if device == "cuda" and torch.cuda.is_available() else -1)
        self.device = device

    def decode_request(self, request):
        return request["prompt"]

    def predict(self, prompt):
        result = self.model(prompt, max_length=100, num_return_sequences=1, temperature=0.8, do_sample=True)
        return result[0]["generated_text"]

    def encode_response(self, output):
        return {"generated_text": output, "model": "distilgpt2"}
# Sentiment endpoint with batch()/unbatch() hooks so LitServe can group concurrent requests.
class BatchedSentimentAPI(ls.LitAPI):
    def setup(self, device):
        self.model = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", device=0 if device == "cuda" and torch.cuda.is_available() else -1)

    def decode_request(self, request):
        return request["text"]

    def batch(self, inputs: List[str]) -> List[str]:
        return inputs

    def predict(self, batch: List[str]):
        results = self.model(batch)
        return results

    def unbatch(self, output):
        return output

    def encode_response(self, output):
        return {"label": output["label"], "score": float(output["score"]), "batched": True}
Here, we define two LitServe APIs: one for text generation using the local distilgpt2 model, and one for batched sentiment analysis. We see how each API decodes incoming requests, runs inference, and encodes structured responses, and how the batch/unbatch hooks let LitServe group concurrent requests, showing how little code it takes to build structured, production-ready endpoints.
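To give a sense of how these classes would actually be exposed over HTTP, here is a minimal serving sketch of our own. It assumes a recent litserve release where ls.LitServer accepts accelerator, max_batch_size, and batch_timeout arguments; verify the parameter names against your installed version.
# Hedged serving sketch (our own addition; argument names assume a recent litserve release).
server = ls.LitServer(
    BatchedSentimentAPI(),
    accelerator="auto",
    max_batch_size=8,      # batch()/unbatch() only take effect when this is > 1
    batch_timeout=0.05,    # seconds to wait while accumulating a batch
)
# server.run(port=8000)   # serves POST /predict; uncomment outside Colab, since it blocks the cell
Because running the server blocks a Colab cell, the rest of this tutorial exercises the API classes locally instead.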
# Streaming endpoint: predict() yields tokens one at a time, and encode_response() relays them.
class StreamingTextAPI(ls.LitAPI):
    def setup(self, device):
        self.model = pipeline("text-generation", model="distilgpt2", device=0 if device == "cuda" and torch.cuda.is_available() else -1)

    def decode_request(self, request):
        return request["prompt"]

    def predict(self, prompt):
        words = ["Once", "upon", "a", "time", "in", "a", "digital", "world"]
        for word in words:
            time.sleep(0.1)
            yield word + " "

    def encode_response(self, output):
        for token in output:
            yield {"token": token}
In this section, we design a streaming text-generation API that emits tokens as they are produced. We simulate real-time streaming by yielding one word at a time with a short delay, showing how LitServe can handle continuous token generation through generator-based predict and encode_response methods.
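As an illustration of how such an endpoint is consumed, here is a hedged client-side sketch of our own. It assumes the API has been launched elsewhere with something like ls.LitServer(StreamingTextAPI(), stream=True) on port 8000 and that responses arrive on the default /predict route; both are assumptions about the LitServe version rather than part of the original code.
# Hedged client sketch: read the streamed response incrementally.
import requests

with requests.post(
    "http://localhost:8000/predict",
    json={"prompt": "Once upon a time"},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))  # each line carries one streamed token payload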
# Multi-task endpoint: routes each request to a sentiment or summarization pipeline.
class MultiTaskAPI(ls.LitAPI):
    def setup(self, device):
        self.sentiment = pipeline("sentiment-analysis", device=-1)
        self.summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-6-6", device=-1)
        self.device = device

    def decode_request(self, request):
        return {"task": request.get("task", "sentiment"), "text": request["text"]}

    def predict(self, inputs):
        task = inputs["task"]
        text = inputs["text"]
        if task == "sentiment":
            result = self.sentiment(text)[0]
            return {"task": "sentiment", "result": result}
        elif task == "summarize":
            if len(text.split()) < 30:
                return {"task": "summarize", "result": {"summary_text": text}}
            result = self.summarizer(text, max_length=50, min_length=10)[0]
            return {"task": "summarize", "result": result}
        else:
            return {"task": "unknown", "error": "Unsupported task"}

    def encode_response(self, output):
        return output
We now serve multiple tasks, sentiment analysis and summarization, through a single endpoint. This example shows how we can manage several model pipelines behind a unified interface, routing each request to the appropriate pipeline based on the task field it carries.
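For illustration (our own sketch, assuming the endpoint is served on port 8000 at the default /predict route), a client selects the pipeline simply by setting the task field in the JSON payload:
# Hedged usage sketch: clients choose the pipeline via the "task" field.
import requests

url = "http://localhost:8000/predict"
sentiment = requests.post(url, json={"task": "sentiment", "text": "Amazing tutorial!"}).json()
summary = requests.post(url, json={"task": "summarize", "text": "LitServe lets us expose several model pipelines behind one endpoint, which keeps deployment simple while still supporting different workloads from the same service."}).json()
print(sentiment)
print(summary)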
# Caching endpoint: memoizes previous results and reports hit/miss statistics.
class CachedAPI(ls.LitAPI):
    def setup(self, device):
        self.model = pipeline("sentiment-analysis", device=-1)
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def decode_request(self, request):
        return request["text"]

    def predict(self, text):
        if text in self.cache:
            self.hits += 1
            return self.cache[text], True
        self.misses += 1
        result = self.model(text)[0]
        self.cache[text] = result
        return result, False

    def encode_response(self, output):
        result, from_cache = output
        return {"label": result["label"], "score": float(result["score"]), "from_cache": from_cache, "cache_stats": {"hits": self.hits, "misses": self.misses}}
We build an API that caches the results of previous inferences, avoiding redundant computation for repeated requests. We track cache hits and misses in real time, showing how even a simple in-memory cache can noticeably improve performance for repeated-query scenarios.
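One limitation of the dictionary above is that it grows without bound. As a hedged refinement of our own (not part of the original code), the same idea can be capped with a least-recently-used eviction policy using collections.OrderedDict:
# Hypothetical LRU-bounded variant of CachedAPI (our own sketch).
from collections import OrderedDict

class BoundedCachedAPI(CachedAPI):
    def setup(self, device, max_entries=1024):
        super().setup(device)
        self.cache = OrderedDict()   # insertion order doubles as recency order
        self.max_entries = max_entries

    def predict(self, text):
        if text in self.cache:
            self.hits += 1
            self.cache.move_to_end(text)        # mark as most recently used
            return self.cache[text], True
        self.misses += 1
        result = self.model(text)[0]
        self.cache[text] = result
        if len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)      # evict the least recently used entry
        return result, False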
# Exercise each API class directly, without starting an HTTP server.
def test_apis_locally():
    print("=" * 70)
    print("Testing APIs Locally (No Server)")
    print("=" * 70)

    api1 = TextGeneratorAPI(); api1.setup("cpu")
    decoded = api1.decode_request({"prompt": "Artificial intelligence will"})
    result = api1.predict(decoded)
    encoded = api1.encode_response(result)
    print(f"✓ Result: {encoded['generated_text'][:100]}...")

    api2 = BatchedSentimentAPI(); api2.setup("cpu")
    texts = ["I love Python!", "This is terrible.", "Neutral statement."]
    decoded_batch = [api2.decode_request({"text": t}) for t in texts]
    batched = api2.batch(decoded_batch)
    results = api2.predict(batched)
    unbatched = api2.unbatch(results)
    for i, r in enumerate(unbatched):
        encoded = api2.encode_response(r)
        print(f"✓ '{texts[i]}' -> {encoded['label']} ({encoded['score']:.2f})")

    api3 = MultiTaskAPI(); api3.setup("cpu")
    decoded = api3.decode_request({"task": "sentiment", "text": "Amazing tutorial!"})
    result = api3.predict(decoded)
    print(f"✓ Sentiment: {result['result']}")

    api4 = CachedAPI(); api4.setup("cpu")
    test_text = "LitServe is awesome!"
    for i in range(3):
        decoded = api4.decode_request({"text": test_text})
        result = api4.predict(decoded)
        encoded = api4.encode_response(result)
        print(f"✓ Request {i+1}: {encoded['label']} (cached: {encoded['from_cache']})")

    print("=" * 70)
    print("✅ All tests completed successfully!")
    print("=" * 70)

test_apis_locally()
We test all our APIs locally to confirm their behavior without starting a server. We exercise text generation, batched sentiment analysis, multi-task routing, and caching in sequence, verifying that each part of our LitServe setup works correctly and efficiently.
In conclusion, we build and test several APIs that demonstrate the flexibility of the framework. We experiment with text generation, batched sentiment analysis, multi-task routing, and caching, all served locally through LitServe. As we wrap up, we see how LitServe streamlines the model-serving workflow, letting us stand up intelligent ML services in a few lines of Python code while balancing flexibility, performance, and simplicity.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of the AI media platform Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable to a wide audience. The platform draws more than two million monthly views, illustrating its popularity among readers.



