NLP that works in the browser with Transformers.js

# Introduction
For a long time, using transformer models meant maintaining a Python server, paying for GPU time, and routing every request through an API. The user typed something, it left their machine, it hit your infrastructure, and it came back as a prediction. That architecture made sense when models were too large to run anywhere else. It is no longer the only option.
Transformers.js changes the equation. It runs state-of-the-art NLP models in the browser, on the user's device, with no server involved. Models download once, cache locally, and work offline from that point forward. The Python-to-JavaScript translation is almost one-to-one:
// JavaScript -- nearly identical
import { pipeline } from '@huggingface/transformers';
const classifier = await pipeline('sentiment-analysis');
const result = await classifier('I love transformers!');
This article covers three NLP tasks: text classification, zero-shot classification, and question answering, all using Transformers.js's pipeline() API. For each task, you'll see how to run the pipeline, what the output structure looks like and how to interpret it, and a working HTML example that you can open directly in a browser. The tutorial closes with a complete support ticket routing application that combines all three pipelines into one functional tool.
All code examples in this article use the CDN import method, so no build step is required. Open a text editor, paste the code, and run it.
# What Transformers.js Actually Is
The library is designed to be functionally equivalent to Hugging Face's Python transformers library, which means the same pre-trained models, the same function names, and the same pipeline API, in JavaScript. Under the hood, the bridge that makes this possible is ONNX Runtime.
Models trained in PyTorch, TensorFlow, or JAX are converted to ONNX format using Hugging Face Optimum. ONNX Runtime then runs these models in the browser. By default, inference runs on the CPU via WebAssembly (WASM), which works in all modern browsers. If you want GPU acceleration, setting device: 'webgpu' routes inference through the browser's WebGPU API, which is faster where available, although WebGPU is still experimental in some browsers.
- Model caching. The first time a pipeline runs, the model weights download from the Hugging Face Hub and are cached locally: in the browser's Cache Storage (the Cache API) in a browser context, on the file system in Node.js. Developer testing shows a download of around 111 MB for the sentiment analysis pipeline on first load. The next run skips the download completely and loads from cache. This means the user's first session has a bandwidth cost; everything after that is fast and works offline.
- Quantization. The dtype option controls the numerical precision of the model weights. q8 (8-bit quantization) is the default for WASM; it gives you a good balance of size and accuracy. q4 cuts the file size roughly in half with a 1–3% accuracy loss for most tasks, which is a fair trade-off for mobile devices or slow connections. On a Node.js server, fp32 provides full precision with no size constraint.
// Default WASM execution -- works everywhere
const wasmPipe = await pipeline('sentiment-analysis');
// WebGPU for faster inference on compatible hardware
const gpuPipe = await pipeline('sentiment-analysis', null, { device: 'webgpu' });
// 4-bit quantization for smaller model downloads
const q4Pipe = await pipeline(
  'sentiment-analysis',
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english',
  { dtype: 'q4' }
);
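The device and dtype choices above can be made at runtime. The sketch below is one way to do that, not the library's own logic: choosePipelineOptions and its slowConnection flag are hypothetical names, and checking for navigator.gpu only shows that the WebGPU API is exposed, not that a usable adapter exists.

```javascript
// Hypothetical helper (not part of Transformers.js): derive pipeline options
// from the environment. `nav` is the browser's `navigator` object, or
// undefined when running outside a browser.
function choosePipelineOptions(nav, { slowConnection = false } = {}) {
  // `navigator.gpu` is only present where the WebGPU API is exposed.
  const hasWebGPU = Boolean(nav && nav.gpu);
  return {
    device: hasWebGPU ? 'webgpu' : 'wasm',
    // q4 trades a little accuracy for a smaller download; q8 is the WASM default.
    dtype: slowConnection ? 'q4' : 'q8',
  };
}

// Usage in a page (left as a comment so the helper stays dependency-free):
// const options = choosePipelineOptions(globalThis.navigator);
// const pipe = await pipeline('sentiment-analysis', null, options);
```

Keeping the decision in a plain function makes it easy to log or override the chosen options before the model download starts.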
# The pipeline() API
The pipeline() function is the entire interface for most use cases. It bundles three things, a pre-trained model, a tokenizer, and post-processing, into a single call. You don't touch the tokenizer or model weights directly. You call the pipeline with text and get structured output.
The signature has three parts:
const pipe = await pipeline(task, model?, options?);
const result = await pipe(input, inferenceOptions?);
task is a string identifier that tells the library what type of model to load and how to handle input and output. model is optional; if you omit it, the library loads the default model for that task. If you specify a model ID (such as 'Xenova/distilbert-base-uncased-finetuned-sst-2-english'), that model loads from the Hub. options is where you set device, dtype, and progress_callback.
Both calls are asynchronous. pipeline() downloads the model and loads it into memory. This is the slowest part of the first run. The inference call on the returned pipeline is usually much faster once the model is loaded. Both return Promises, which means your UI needs to handle a loading state.
The progress_callback option lets you track the download and show progress to the user:
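Because the load step is slow and returns a Promise, it helps to make sure it only starts once, even if several parts of the UI ask for the model at the same time. The following is a minimal sketch under that assumption; createLazyLoader is a hypothetical helper, not a Transformers.js function.

```javascript
// Hypothetical helper: wrap an async factory so the expensive load
// happens at most once and every caller shares the same Promise.
function createLazyLoader(factory) {
  let promise = null;
  return function load() {
    if (promise === null) {
      promise = Promise.resolve(factory()).catch((err) => {
        promise = null; // forget the failure so a later call can retry
        throw err;
      });
    }
    return promise;
  };
}

// Usage in a page (comment only, since it needs the library and network):
// const getClassifier = createLazyLoader(() => pipeline('sentiment-analysis'));
// const classifier = await getClassifier(); // first call starts the download
// const again = await getClassifier();      // later calls reuse the same Promise
```

Resetting the stored Promise on failure is a design choice: it lets a retry button work after a dropped connection instead of replaying the cached error.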
// progress_callback fires during model download with status updates
// This is important UX -- users need to know something is happening
const pipe = await pipeline(
  'sentiment-analysis',
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english',
  {
    dtype: 'q8',
    progress_callback: (progress) => {
      // progress.status can be: 'initiate', 'download', 'progress', 'done', 'ready'
      if (progress.status === 'progress') {
        // progress.progress is a percentage for the file currently downloading
        const pct = Math.round(progress.progress);
        document.getElementById('progress').textContent =
          `Loading model: ${pct}%`;
      }
      if (progress.status === 'ready') {
        document.getElementById('progress').textContent = 'Model ready';
      }
    }
  }
);
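A model is usually made up of several files, and progress events arrive per file, so a percentage taken from a single event can jump around. The sketch below aggregates events into one overall figure. It assumes that 'progress' events carry file, loaded, and total fields; treat those field names and the createProgressTracker helper as assumptions to check against the events you actually receive.

```javascript
// Hypothetical helper: combine per-file progress events into one percentage.
// Assumes events shaped like { status, file, loaded, total }.
function createProgressTracker() {
  const files = new Map(); // file name -> { loaded, total } in bytes
  return function update(event) {
    if (event.status === 'progress' && event.total > 0) {
      files.set(event.file, { loaded: event.loaded, total: event.total });
    }
    let loaded = 0;
    let total = 0;
    for (const f of files.values()) {
      loaded += f.loaded;
      total += f.total;
    }
    return total === 0 ? 0 : Math.round((loaded / total) * 100);
  };
}

// Usage (comment only): pass the tracker's output to the status element.
// const track = createProgressTracker();
// progress_callback: (e) => { statusEl.textContent = `Loading model: ${track(e)}%`; }
```

The overall figure can still step backward when a new, large file is first reported, because the known total grows at that moment.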
One important note from the official documentation: Transformers.js is an inference-only library. You can't fine-tune or train models with it. If your work requires a custom model, training takes place elsewhere (Python, cloud), and the resulting ONNX export runs in the browser.
# Task 1: Text Classification
Text classification assigns a label and a confidence score to the input text. The most common form is sentiment analysis, positive versus negative, but the same pipeline structure handles any fixed set of categories the model was trained on.
Here's what the output looks like:
const result = await classifier('This product completely exceeded my expectations.');
// [{ label: 'POSITIVE', score: 0.9997 }]
The output is an array of objects. Each item has a label (the predicted class, as a string) and a score (a float between 0 and 1 indicating the model's confidence). A score of 0.9997 means the model is highly confident. A score of 0.52 means it is barely better than a coin flip; treat that prediction as uncertain and handle it accordingly in your application.
The output is always an array, even with a single input, because the same pipeline call handles batches:
const results = await classifier([
'This is great!',
'Completely broken, waste of money.'
]);
// [
// { label: 'POSITIVE', score: 0.9998 },
// { label: 'NEGATIVE', score: 0.9991 }
// ]
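The advice about low scores can be made concrete with a small post-processing step. This sketch applies a confidence cutoff to the array the classifier returns; the 0.75 threshold, the 'UNCERTAIN' label, and the function names are illustrative assumptions, not library defaults.

```javascript
// Hypothetical helper: flag predictions whose score falls below a cutoff.
function interpretPrediction(prediction, minConfidence = 0.75) {
  const { label, score } = prediction;
  return score >= minConfidence
    ? { label, score, confident: true }
    : { label: 'UNCERTAIN', score, confident: false };
}

// Works on the array shape the classifier returns, for one input or many.
function interpretAll(output, minConfidence = 0.75) {
  return output.map((p) => interpretPrediction(p, minConfidence));
}

// Example with hand-written scores (not real model output):
const interpreted = interpretAll([
  { label: 'POSITIVE', score: 0.9997 },
  { label: 'NEGATIVE', score: 0.52 },
]);
console.log(interpreted.map((p) => p.label)); // [ 'POSITIVE', 'UNCERTAIN' ]
```

The right cutoff depends on the cost of a wrong label in your application, so it is worth tuning against real inputs.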
// Full Working Example
The example below is a complete, self-contained HTML file. Open it in any modern browser. The model downloads on first load and is cached, so subsequent loads are faster.
[Example HTML file. Page title: "Text Classification with Transformers.js". Tagline: "Runs entirely in your browser -- no server, no API calls." Initial status text: "Downloading model on first run (this may take a moment)..."]
The loadModel function calls pipeline() with the task name, model ID, and options. The progress_callback fires repeatedly during download and updates the status text so the user doesn't see a frozen screen. Once the model has loaded, the button is enabled. When the user clicks Classify, classifier(text) runs inference locally with the cached model, typically in less than 200ms on a modern laptop. The result handler destructures label and score from the first array element, formats the confidence as a percentage, and applies a CSS class for color coding.
# Task 2: Zero-Shot Classification
Zero-shot classification does something that conventional text classification cannot: it sorts text into categories that you define at runtime, with no training data required. Pass in the text and a list of labels in plain English. The model decides which label is the best fit based on its understanding of the meaning of the language.
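The result-handling step described above can be sketched as a pure function, which keeps it separate from the DOM code. The function name, the wording of the display string, and the 'positive' and 'negative' class names are assumptions made for illustration.

```javascript
// Hypothetical helper: turn classifier output into display text and a CSS class.
function formatResult(output) {
  // The classifier returns an array; the first element is the prediction.
  const [{ label, score }] = output;
  const percent = (score * 100).toFixed(1);
  return {
    text: `${label} (${percent}% confidence)`,
    cssClass: label === 'POSITIVE' ? 'positive' : 'negative',
  };
}

// Usage in a click handler (comment only, since it needs the DOM and a model):
// const { text, cssClass } = formatResult(await classifier(inputEl.value));
// resultEl.textContent = text;
// resultEl.className = cssClass;
```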
This is useful anytime you can't or don't want to train a model with labeled examples, which is often the case in real projects.
// How It Works Under the Hood
The model reformulates each candidate label as a natural language inference (NLI) hypothesis. For the label "payment issue", it builds a hypothesis such as "This text is about a payment issue" and calculates the probability that the input text entails that hypothesis. The label with the highest entailment score wins. This NLI-based method is why you can use any descriptive English phrase as a label and get a meaningful result. The model understands the meaning of your labels, not just their surface form.
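The NLI mechanics can be illustrated without loading a model. In this sketch the hypothesis template 'This example is {}.' is an assumed default, and the entailment logits are invented numbers standing in for what the model would produce; only the arithmetic (a softmax across the labels' entailment scores, the usual single-label scheme) is the point.

```javascript
// Turn each candidate label into an NLI hypothesis sentence.
function buildHypotheses(labels, template = 'This example is {}.') {
  return labels.map((label) => template.replace('{}', label));
}

// Numerically stable softmax: converts raw logits into scores that sum to 1.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Rank labels by entailment score, highest first.
function rankLabels(labels, entailmentLogits) {
  const scores = softmax(entailmentLogits);
  return labels
    .map((label, i) => ({ label, score: scores[i] }))
    .sort((a, b) => b.score - a.score);
}

// Invented logits for illustration only; a real run gets them from the model.
const ranked = rankLabels(['shipping', 'billing', 'returns'], [-1.2, 3.1, 0.4]);
console.log(buildHypotheses(['billing'])); // [ 'This example is billing.' ]
console.log(ranked[0].label);              // billing
```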
Here's what the output looks like:
const classifier = await pipeline('zero-shot-classification',
'Xenova/bart-large-mnli');
const result = await classifier(
'My invoice is wrong and I was charged twice.',
['billing', 'technical support', 'shipping', 'returns', 'account access']
);
// {
// sequence: 'My invoice is wrong and I was charged twice.',
// labels: ['billing', 'returns', 'account access', 'technical support', 'shipping'],
// scores: [0.871, 0.063, 0.031, 0.022, 0.013]
// }
The output is an object with three fields. sequence is the original input text. labels is an array of your candidate labels, sorted from highest to lowest score. scores is an array of confidence scores in the same order. The first element of both arrays is always the winning prediction. Scores for all labels sum to 1 when multi_label is false (the default).
Setting multi_label: true changes the behavior: each label is scored independently rather than competing with the others, so multiple labels can all have high scores at the same time. Use this when the text plausibly belongs to several categories at once.
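Both output modes can be consumed with a few lines of code. The sketch below shows one way to pick a single department from the default output and to collect every label above a cutoff from a multi_label run. The helper names, the 0.5 cutoff, and the 'needs-review' fallback are illustrative assumptions, and the multi-label scores are made up.

```javascript
// Hypothetical helper: take the winning label, or fall back if it is weak.
function pickTopLabel(output, minScore = 0.5, fallback = 'needs-review') {
  // labels and scores are sorted together, highest score first.
  return output.scores[0] >= minScore ? output.labels[0] : fallback;
}

// Hypothetical helper: for multi_label output, keep every label above the cutoff.
function labelsAbove(output, minScore = 0.5) {
  return output.labels.filter((_, i) => output.scores[i] >= minScore);
}

// Single-label output, using the numbers from the example in this section.
const single = {
  sequence: 'My invoice is wrong and I was charged twice.',
  labels: ['billing', 'returns', 'account access', 'technical support', 'shipping'],
  scores: [0.871, 0.063, 0.031, 0.022, 0.013],
};
console.log(pickTopLabel(single)); // billing

// Multi-label output with invented scores, for illustration only.
const multi = { labels: ['billing', 'returns', 'shipping'], scores: [0.93, 0.71, 0.04] };
console.log(labelsAbove(multi)); // [ 'billing', 'returns' ]
```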
// Full Working Example
[Example HTML file. Page title: "Zero-Shot Classifier -- Support Ticket Router". Tagline: "Paste a support ticket. The model routes it to the right department with no training data needed." Initial status text: "Downloading model on first run..."]



