
No Budget, Full Stack: Building with Free LLMs Only


Photo by the Author

# Introduction

Remember when building a full-stack application required expensive cloud credits, paid API keys, and a team of developers? Those days are officially over. By 2026, developers can build, deploy, and run a production-ready application using nothing but free tools, including large language models (LLMs) that power its features.

The landscape has changed dramatically. Open source models now challenge their commercial counterparts. AI coding assistants have grown from simple auto-completion tools into full-fledged coding agents that can implement entire features. And perhaps most importantly, you can run high-quality models locally or through free tiers without spending a dime.

In this comprehensive article, we will build a real-world application: an AI-powered meeting notes summarizer. Users will upload voice recordings, and our app will transcribe them, extract key points and action items, and display everything on a clean dashboard, all using completely free tools.

Whether you're a student, a bootcamp graduate, or an experienced developer looking to prototype an idea, this tutorial will show you how to make the most of the free AI tools available. Let's start by understanding why free LLMs work so well today.

# Understanding Why Free Large Language Models Are Viable Now

Just two years ago, building an AI-powered application meant budgeting for OpenAI API credits or renting expensive GPU instances. The economics have changed.

The gap between commercial and open source LLMs has almost disappeared. Models like GLM-4.7-Flash from Zhipu AI show that open source can reach state-of-the-art capability while being completely free to use. Likewise, LFM2-2.6B-Transcript is purpose-built for transcript summarization and runs on-device with cloud-level quality.

What this means for you is that you are no longer locked into a single vendor. If one model doesn't work for your use case, you can switch to another without changing your infrastructure.

// Joining the Self-Hosted Movement

There is a growing preference for running AI locally on your own hardware instead of sending data to the cloud. This is not just about cost; it's about privacy, latency, and control. With tools like Ollama and LM Studio, you can run powerful models on a laptop.

// Adopting a “Bring Your Own Key” Model

A new class of tools has emerged: open source applications that are free to use but ask you to provide your own API keys. This gives you ultimate flexibility. You can use the Google Gemini API (which offers hundreds of free requests every day) or use completely local models with no ongoing costs.
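Because many of these providers expose OpenAI-compatible endpoints, "bring your own key" can be as simple as swapping a base URL and an environment variable. A minimal sketch (the registry, provider names, and base URLs are illustrative assumptions; check each vendor's documentation for the real endpoints):

```python
import os

# Hypothetical registry of OpenAI-compatible "bring your own key" providers.
# Base URLs are assumptions for illustration, not verified endpoints.
PROVIDERS = {
    "zhipu": {
        "base_url": "https://open.bigmodel.cn/api/paas/v4/",
        "key_env": "ZHIPU_API_KEY",
    },
    "gemini": {
        "base_url": "https://generativelanguage.googleapis.com/v1beta/openai/",
        "key_env": "GEMINI_API_KEY",
    },
    "local": {
        "base_url": "http://localhost:11434/v1",  # e.g. a local Ollama server
        "key_env": "",  # local servers usually ignore the key
    },
}

def resolve_provider(name: str) -> dict:
    """Return the base URL and API key for a configured provider,
    reading the key from an environment variable."""
    cfg = PROVIDERS[name]
    return {
        "base_url": cfg["base_url"],
        "api_key": os.environ.get(cfg["key_env"], "not-needed"),
    }
```

Switching vendors then means changing one string instead of rewriting your integration.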

# Choosing Your Free AI Stack

Sorting out the best free option for each part of our app means choosing tools that balance capability and ease of use.

// The Transcription Layer: Speech-to-Text

To convert audio to text, we have excellent free speech-to-text (STT) tools.

| Tool | Type | Free Tier | Best For |
| --- | --- | --- | --- |
| OpenAI Whisper | Open source model | Unlimited (local) | High accuracy, many languages |
| Whisper.cpp | Privacy-focused C/C++ port | Unlimited (open source) | Privacy-sensitive workloads |
| Gemini API | Cloud API | 60 requests/minute | Rapid prototyping |

For our project, we will use Whisper, which you can run locally or through free hosting options. It supports more than 100 languages and produces high-quality transcripts.

// Summarization and Analysis: The Large Language Model

This is where you have the most choice. All options below are completely free:

| Model | Provider | Type | Specialty |
| --- | --- | --- | --- |
| GLM-4.7-Flash | Zhipu AI | Cloud (free API) | General purpose, coding |
| LFM2-2.6B-Transcript | Liquid AI | Local/on-device | Meeting summarization |
| Gemini 1.5 Flash | Google | Cloud API | Long context, free tier |
| GPT-OSS Swallow | Tokyo Tech | Local/self-hosted | Japanese/English |

For our meeting summarizer, the LFM2-2.6B-Transcript model is particularly interesting; it was literally trained for this use case and runs in less than 3GB of RAM.

// Accelerating Development: Intelligent Code Assistants

Before we write a single line of code, let's consider the tools that help us build efficiently inside an integrated development environment (IDE):

| Tool | Free Tier | Type | Key Feature |
| --- | --- | --- | --- |
| Comet | Fully free | VS Code extension | Spec-driven, multi-agent |
| Codeium | Unlimited free | IDE extension | 70+ languages, fast autocomplete |
| Cline | Free (BYOK) | VS Code extension | Autonomous file editing |
| Continue | Fully open source | IDE extension | Works with any LLM |
| bolt.diy | Self-hosted | Browser IDE | Full-stack generation |

Our recommendation: for this project, we will use Codeium for its unlimited free tier and speed, and keep Continue as a backup for when we need to switch between different LLM providers.

// Rounding Out the Free Stack

  • Frontend: React (free and open source)
  • Backend: FastAPI (Python, free)
  • Database: SQLite (file-based, no server required)
  • Deployment: Vercel (generous free tier) + Render (backend)

# Reviewing the Project Plan

Here is how the application will work:

  1. User uploads an audio file (meeting recording, voice memo, lecture)
  2. The backend receives the file and passes it to Whisper for transcription
  3. The transcript is sent to the LLM for summarization
  4. The LLM extracts key discussion points, action items, and decisions
  5. The results are stored in SQLite
  6. The user sees a clean dashboard with text, summary, and action items
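The numbered steps above can be sketched as a single pipeline function. Transcription and summarization are stubbed with placeholder lambdas here (the real implementations arrive in Steps 1 and 2), so this only illustrates the data flow:

```python
from datetime import datetime

def process_upload(filename: str, audio_bytes: bytes,
                   transcribe, summarize) -> dict:
    """Run the upload pipeline: transcribe the audio, summarize the
    transcript, then package the record that would be stored in SQLite
    and rendered on the dashboard. `transcribe` and `summarize` are
    injected so this sketch stays runnable without any models."""
    transcript = transcribe(audio_bytes)   # step 2: Whisper
    analysis = summarize(transcript)       # steps 3-4: LLM
    return {                               # steps 5-6: store and display
        "filename": filename,
        "transcript": transcript,
        "summary": analysis["summary"],
        "action_items": analysis["action_items"],
        "created_at": datetime.now().isoformat(),
    }

# Stub implementations, just to exercise the flow:
record = process_upload(
    "standup.wav",
    b"fake-audio",
    transcribe=lambda audio: "We agreed to ship on Friday.",
    summarize=lambda text: {"summary": text, "action_items": ["Ship on Friday"]},
)
```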

Diagram of a flow chart with seven sequential steps | Photo by the Author

// Prerequisites

  • Python 3.9+ installed
  • Node.js and npm installed
  • Basic familiarity with Python and React
  • Code editor (VS Code is recommended)

// Step 1: Setting Up Backend with FastAPI

First, create our project directory and set the virtual environment:

mkdir meeting-summarizer
cd meeting-summarizer
python -m venv venv

Activate virtual environment:

# On Windows 
venv\Scripts\activate

# On Linux/macOS
source venv/bin/activate

Install the required packages:

pip install fastapi uvicorn python-multipart openai-whisper transformers torch openai

Now, create a main.py file for our FastAPI application and add this code:

from fastapi import FastAPI, File, UploadFile, HTTPException
from fastapi.middleware.cors import CORSMiddleware
import whisper
import sqlite3
import json
import os
from datetime import datetime

app = FastAPI()

# Enable CORS for React frontend
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],  # React dev server
    allow_methods=["*"],
    allow_headers=["*"],
)

# Initialize Whisper model - using "tiny" for faster CPU processing
print("Loading Whisper model (tiny)...")
model = whisper.load_model("tiny")
print("Whisper model loaded!")

# Database setup
def init_db():
    conn = sqlite3.connect('meetings.db')
    c = conn.cursor()
    c.execute('''CREATE TABLE IF NOT EXISTS meetings
                 (id INTEGER PRIMARY KEY AUTOINCREMENT,
                  filename TEXT,
                  transcript TEXT,
                  summary TEXT,
                  action_items TEXT,
                  created_at TIMESTAMP)''')
    conn.commit()
    conn.close()

init_db()

async def summarize_with_llm(transcript: str) -> dict:
    """Placeholder for LLM summarization logic"""
    # This will be implemented in Step 2
    return {"summary": "Summary pending...", "action_items": []}

@app.post("/upload")
async def upload_audio(file: UploadFile = File(...)):
    file_path = f"temp_{file.filename}"
    with open(file_path, "wb") as buffer:
        content = await file.read()
        buffer.write(content)
    
    try:
        # Step 1: Transcribe with Whisper
        result = model.transcribe(file_path, fp16=False)
        transcript = result["text"]
        
        # Step 2: Summarize (To be filled in Step 2)
        summary_result = await summarize_with_llm(transcript)
        
        # Step 3: Save to database
        conn = sqlite3.connect('meetings.db')
        c = conn.cursor()
        c.execute(
            "INSERT INTO meetings (filename, transcript, summary, action_items, created_at) VALUES (?, ?, ?, ?, ?)",
            (file.filename, transcript, summary_result["summary"],
             json.dumps(summary_result["action_items"]), datetime.now())
        )
        conn.commit()
        meeting_id = c.lastrowid
        conn.close()
        
        os.remove(file_path)
        return {
            "id": meeting_id,
            "transcript": transcript,
            "summary": summary_result["summary"],
            "action_items": summary_result["action_items"]
        }
    except Exception as e:
        if os.path.exists(file_path):
            os.remove(file_path)
        raise HTTPException(status_code=500, detail=str(e))
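The dashboard will eventually need to list past meetings, not just the latest upload. As a sketch (the helper name and response shape are my own additions, not part of the original app), here is a query function you could wire to a FastAPI GET route:

```python
import json
import sqlite3

def list_meetings(db_path: str = "meetings.db") -> list:
    """Return saved meetings, newest first, with the action_items
    column decoded from its JSON string back into a Python list."""
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    rows = c.execute(
        "SELECT id, filename, summary, action_items, created_at "
        "FROM meetings ORDER BY created_at DESC"
    ).fetchall()
    conn.close()
    return [
        {
            "id": r[0],
            "filename": r[1],
            "summary": r[2],
            "action_items": json.loads(r[3] or "[]"),
            "created_at": r[4],
        }
        for r in rows
    ]
```

You could then expose it in main.py with a route like `@app.get("/meetings")` that simply returns `list_meetings()`.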

// Step 2: Integrating the Free Large Language Model

Now, let's implement the summarize_with_llm() function. We will show two approaches:

Option A: Using GLM-4.7-Flash API (Cloud, Free)

from openai import OpenAI

async def summarize_with_llm(transcript: str) -> dict:
    client = OpenAI(api_key="YOUR_FREE_ZHIPU_KEY", base_url="https://open.bigmodel.cn/api/paas/v4/")
    
    response = client.chat.completions.create(
        model="glm-4-flash",
        messages=[
            {"role": "system", "content": "Summarize the following meeting transcript and extract action items in JSON format."},
            {"role": "user", "content": transcript}
        ],
        response_format={"type": "json_object"}
    )
    
    return json.loads(response.choices[0].message.content)
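Even with `response_format={"type": "json_object"}`, it pays to be defensive: some models still wrap their JSON in markdown fences or omit a field. A hypothetical normalizer (the helper name and fallback behavior are my own, not from the original code) that coerces the reply into the shape our endpoint expects:

```python
import json

def normalize_llm_json(raw: str) -> dict:
    """Coerce an LLM reply into {"summary": str, "action_items": list},
    tolerating markdown fences and missing keys."""
    text = raw.strip()
    fence = "`" * 3
    if text.startswith(fence):
        # Keep only the content between the first pair of fences
        text = text.split(fence)[1]
        if text.startswith("json"):
            text = text[4:]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to treating the whole reply as the summary
        return {"summary": raw.strip(), "action_items": []}
    return {
        "summary": str(data.get("summary", "")),
        "action_items": list(data.get("action_items", [])),
    }
```

Returning `normalize_llm_json(response.choices[0].message.content)` instead of calling `json.loads` directly keeps a malformed reply from crashing the upload endpoint.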

Option B: Using Local LFM2-2.6B-Transcript (Local, Completely Free)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

async def summarize_with_llm_local(transcript):
    model_name = "LiquidAI/LFM2-2.6B-Transcript"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,
        device_map="auto"
    )
    
    prompt = f"Analyze this transcript and provide a summary and action items:\n\n{transcript}"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=500)
    
    # Decode only the newly generated tokens, not the prompt we fed in
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)
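One caveat with a small local model: an hour-long meeting transcript can exceed its context window. A common workaround is to summarize chunk by chunk, then summarize the combined chunk summaries. A rough word-based chunker (word counts only approximate real token counts, so treat the limits as assumptions to tune for your tokenizer):

```python
def chunk_transcript(transcript: str, max_words: int = 800,
                     overlap: int = 50) -> list:
    """Split a transcript into overlapping word-based chunks so each
    one fits comfortably in a small model's context window. The overlap
    preserves a little context across chunk boundaries."""
    words = transcript.split()
    if len(words) <= max_words:
        return [transcript]
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

Each chunk can be passed through summarize_with_llm_local(), and the concatenated results summarized once more for the final dashboard view.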

// Step 3: Creating the React Frontend

Build a simple React frontend to interact with our API. In a new terminal, create a React application:

npx create-react-app frontend
cd frontend
npm install axios

Replace the contents of src/App.js with the following:

import React, { useState } from 'react';
import axios from 'axios';
import './App.css';

function App() {
  const [file, setFile] = useState(null);
  const [uploading, setUploading] = useState(false);
  const [result, setResult] = useState(null);
  const [error, setError] = useState('');

  const handleUpload = async () => {
    if (!file) { setError('Please select a file'); return; }
    setUploading(true);
    const formData = new FormData();
    formData.append('file', file);

    try {
      const response = await axios.post('http://localhost:8000/upload', formData);
      setResult(response.data);
    } catch (err) {
      setError('Upload failed: ' + (err.response?.data?.detail || err.message));
    } finally { setUploading(false); }
  };

  return (
    <div className="App">
      <h1>Meeting Summarizer</h1>
      <input
        type="file"
        accept="audio/*"
        onChange={(e) => setFile(e.target.files[0])}
      />
      <button onClick={handleUpload} disabled={uploading}>
        {uploading ? 'Processing...' : 'Upload & Summarize'}
      </button>
      {error && <p className="error">{error}</p>}
      {result && (
        <div className="results">
          <h2>Summary</h2>
          <p>{result.summary}</p>
          <h2>Action Items</h2>
          <ul>
            {result.action_items.map((it, i) => (
              <li key={i}>{it}</li>
            ))}
          </ul>
        </div>
      )}
    </div>
  );
}

export default App;

// Step 4: Starting the Application

  • Start the backend: from the project root, with your virtual environment activated, run uvicorn main:app --reload
  • Start the frontend: in a new terminal, from the frontend directory, run npm start
  • Open the app in your browser (http://localhost:3000 by default) and upload a test audio file

A dashboard interface that displays summary results | Photo by the Author

# Deploying the Application for Free

Once your app is up and running locally, it's time to share it with the world, still for free. Render offers a free tier for web services. Push your code to a GitHub repository, create a new Web Service in Render, and apply these settings:

  • Environment: Python 3
  • Build command: pip install -r requirements.txt
  • Start Command: uvicorn main:app --host 0.0.0.0 --port $PORT

Create a requirements.txt file:

fastapi
uvicorn
python-multipart
openai-whisper
transformers
torch
openai

Note: Whisper and Transformers require significant disk space and memory. If you hit the free tier's limits, consider using a cloud API for summarization instead.
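One way to stay under a small instance's memory ceiling is to defer model loading until the first request instead of at import time. A lazy-initialization sketch (the injectable `loader` parameter is for illustration and testing, not part of the original code):

```python
_model = None

def get_model(loader=None):
    """Load the heavyweight model on first use and cache it afterwards.
    `loader` defaults to Whisper's tiny model but is injectable so the
    pattern can be exercised without the real dependency installed."""
    global _model
    if _model is None:
        if loader is None:
            import whisper  # imported lazily so startup stays cheap
            loader = lambda: whisper.load_model("tiny")
        _model = loader()
    return _model
```

In main.py you would then call `get_model().transcribe(...)` inside the upload handler rather than loading Whisper at the top of the file.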

// Deploying the Frontend to Vercel

Vercel is the easiest way to deploy React apps:

  • Install the Vercel CLI: npm i -g vercel
  • From your frontend directory, run vercel
  • Update the API URL in App.js to point to your Render backend

// Exploring Local-Only Alternatives

If you want to avoid cloud hosting altogether, you can run both the frontend and backend on a local server, using a tunneling tool such as ngrok to temporarily expose your local server when you need to share it.

# Conclusion

We've just built a production-ready AI app using nothing but free tools. Let's recap what we accomplished:

  • Transcription: Used OpenAI's Whisper (free, open source)
  • Summary: Leveraged GLM-4.7-Flash or LFM2-2.6B (both completely free)
  • Backend: Built with FastAPI (free)
  • Frontend: Created with React (free)
  • Database: Used SQLite (free)
  • Deployment: Deployed to Vercel and Render (free tiers)
  • Development: Accelerated by free AI coding assistants like Codeium

The landscape of free AI development has never been more promising. Open source models now compete with commercial offerings. Local AI tools give us privacy and control. And free tiers from providers like Google and Zhipu AI allow us to prototype without financial risk.

Shittu is a software engineer and technical writer who loves using cutting-edge technology to tell interesting stories, with a keen eye for detail and a knack for simplifying complex concepts. You can also find Shittu on Twitter.
