Machine Learning

Retrieval-Augmented Forecasting: How Hindsight Helps in Time Series Forecasting

We all know how it goes: Time series data is tricky.

Traditional forecasting models are ill-equipped for events such as sudden market crashes, black swan events, or unusual weather patterns.

Even large foundation models like Chronos sometimes struggle because they have never seen that kind of pattern before.

Retrieval can help with this. By retrieving, we can ask, "Has there been anything like this before?" and then use that past example to guide the prediction.

In natural language processing (NLP), this idea is called Retrieval-Augmented Generation (RAG), and it is becoming very popular in the world of time series forecasting as well.

The model considers past situations that look similar to the current one and, from there, makes more reliable predictions.

How is RAF different from traditional time series forecasting? Retrieval-augmented forecasting adds an explicit memory-access step.

Instead of:

History -> learned parameters -> prediction

With retrieval, we have:

Current situation -> search for similar cases -> concrete past episodes -> prediction

The Retrieval-Augmented Forecasting Cycle. Photo by Author | Napkin AI.

Instead of relying only on what the model has learned during training, the idea is to give it access to similar circumstances from the past.

It's like letting a climate model ask, "When was the last winter like this one?"
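As a toy illustration of this cycle (all numbers here are invented): store past windows together with the value that followed each one, find the window most similar to the current situation, and reuse its outcome as the prediction.

```python
import numpy as np

# Each past window is paired with "what happened next" after it.
past_windows = np.array([[1.0, 2.0, 3.0],
                         [5.0, 5.0, 5.0],
                         [1.0, 2.0, 3.1]])
past_outcomes = np.array([4.0, 5.0, 4.2])

current = np.array([1.05, 2.05, 3.1])  # the current situation

# "Search for similar cases": Euclidean distance to every past window.
dists = np.linalg.norm(past_windows - current, axis=1)

# "Concrete past episodes -> prediction": reuse the closest match's outcome.
prediction = past_outcomes[np.argmin(dists)]
print(prediction)  # 4.2
```

Real systems replace the raw-distance lookup with learned embeddings and large vector indexes, as shown later in the article.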


Hello, I'm Sara Nobrega, an AI engineer. If you are dealing with similar issues or want feedback on how to apply these ideas, I collect my writing, resources, and advice links here.


In this article, I explore retrieval-augmented forecasting from first principles and show, with concrete examples and code, how retrieval can be used in real prediction pipelines.

What Is Retrieval-Augmented Forecasting (RAF)?

At a very high level: instead of relying solely on what the model learned during training, RAF lets the model look up concrete past situations similar to the current one and use their outcomes to guide its prediction.

Let's see it in detail:

  • You turn the current situation (e.g., the last few weeks of a stock's price series) into a query.
  • That query is then used to search a database of historical time series segments for the most similar patterns.
  • These matches do not need to come from the same stock; the system can also surface similar movements from other stocks or financial products.

It takes those retrieved patterns together with what happened after them.

That information is then fed into the prediction model to help it make better forecasts.

This process is especially powerful in:

  • Out-of-distribution situations: when the model faces something it was not trained on.
  • Rare or unusual events: like COVID, a sudden financial crash, etc.
  • Evolving seasonal trends: where past data contains useful patterns, but they change over time.

RAF does not replace your prediction model; it augments it, grounding its predictions in relevant historical examples.

Another example: suppose you want to predict energy consumption during an unusually hot week.

Instead of hoping that your model remembers how heat waves affect consumption, retrieval finds similar past heat waves and lets the model take into account what happened then.

What Do These Models Actually Retrieve?

The "information" retrieved is not just raw data; it is context that gives the model clues.

Here are some common examples:

Examples of Data Retrieval. Photo by Author | Napkin AI.

As you can see, retrieval focuses on meaningful historical patterns, such as unusual shocks, seasonal effects, and series with similar properties. This provides practical context for the current forecast.

How Do These Models Find Similar Patterns?

To find the right patterns from the past, these systems use methods that represent the present in a way that makes it easy to search a large database and find the closest matches.

The code snippets in this section are simplified illustrations intended to build intuition; they do not represent production code.

Retrieval methods for time series forecasting. Photo by Author | Napkin AI.

Some of these methods are:

Embedding-Based Similarity

This converts the time series (or patches/windows of the series) into compact vectors and compares them with distance metrics such as Euclidean distance or cosine similarity.

In simple words: the model transforms pieces of time series data into short summaries, then checks which past snapshots look most similar to what is happening now.

Some retrieval-augmented forecasters (e.g., RAFT) find the most similar historical windows across the training data or the whole series, then combine the retrieved values with attention-like weights.

In simple words: it gets similar situations from the past and averages them, paying more attention to the best matches.

import numpy as np

# Example: embedding-based retrieval for time-series patches
# This is a toy example to show the *idea* behind retrieval.
# In practice:
# - embeddings are learned by neural networks
# - similarity search runs over millions of vectors
# - this logic lives inside a larger forecasting pipeline


def embed_patch(patch: np.ndarray) -> np.ndarray:
    """
    Convert a short time-series window ("patch") into a compact vector.

    Here we use simple statistics (mean, std, min, max) purely for illustration.
    Real-world systems might use:
      - a trained encoder network
      - shape-based representations
      - frequency-domain features
      - latent vectors from a forecasting backbone
    """
    return np.array([
        patch.mean(),   # average level
        patch.std(),    # volatility
        patch.min(),    # lowest point
        patch.max()     # highest point
    ])


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """
    Measure how similar two vectors are.
    Cosine similarity focuses on *direction* rather than magnitude,
    which is often useful for comparing patterns or shapes.
    """
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)


# Step 1: Represent the current situation

# A short window representing the current time-series behavior
query_patch = np.array([10, 12, 18, 25, 14, 11])

# Turn it into an embedding
query_embedding = embed_patch(query_patch)


# Step 2: Represent historical situations

# Past windows extracted from historical data
historical_patches = [
    np.array([9, 11, 17, 24, 13, 10]),   # looks similar
    np.array([2, 2, 2, 2, 2, 2]),        # flat, unrelated
    np.array([10, 13, 19, 26, 15, 12])   # very similar
]

# Convert all historical patches into embeddings
historical_embeddings = [
    embed_patch(patch) for patch in historical_patches
]

# Step 3: Compare and retrieve the most similar past cases

# Compute similarity scores between the current situation
# and each historical example
similarities = [
    cosine_similarity(query_embedding, hist_emb)
    for hist_emb in historical_embeddings
]

# Rank historical patches by similarity
top_k_indices = np.argsort(similarities)[::-1][:2]

print("Most similar historical patches:", top_k_indices)

# Step 4 (conceptual):
# In a retrieval-augmented forecaster, the model would now:
# - retrieve the *future outcomes* of these similar patches
# - weight them by similarity (attention-like weighting)
# - use them to guide the final forecast
# This integration step is model-specific and not shown here.
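The conceptual Step 4 can also be sketched. Assuming each historical patch above has a known "what happened next" value (the outcome values below are invented for illustration), an attention-like weighting turns the similarity scores into a retrieval-based forecast:

```python
import numpy as np

# Hypothetical "what happened next" values for the three historical
# patches, plus their similarity scores from Step 3 (invented numbers).
historical_outcomes = np.array([9.5, 2.0, 11.0])
similarities = np.array([0.98, 0.40, 0.99])

# Attention-like weighting: softmax turns scores into weights that
# sum to 1, so closer matches contribute more to the forecast.
weights = np.exp(similarities) / np.exp(similarities).sum()

# Retrieval-based forecast: similarity-weighted average of past outcomes.
retrieval_forecast = float(weights @ historical_outcomes)
print(retrieval_forecast)
```

In a full system this retrieval-based forecast would then be combined with the base model's own prediction, as discussed in the fusion section below.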

Retrieval Tools and Libraries

1. FAISS
FAISS is a very fast, GPU-capable library for similarity search over dense vectors. It is best suited to large, in-memory datasets, although its design makes real-time updates more difficult to implement.

import faiss
import numpy as np

# Suppose we already have embeddings for historical windows
d = 128  # embedding dimension
xb = np.random.randn(100_000, d).astype("float32")  # historical embeddings
xq = np.random.randn(1, d).astype("float32")        # query embedding

index = faiss.IndexFlatIP(d)   # inner product (often used with normalized vectors for cosine-like behavior)
index.add(xb)

k = 5
scores, ids = index.search(xq, k)
print("Nearest neighbors (ids):", ids)
print("Similarity scores:", scores)

# Some FAISS indexes/algorithms can run on GPU.

2. Annoy (Approximate Nearest Neighbors Oh Yeah)
The Annoy library is lightweight and easy to work with.

It is best suited to historical datasets that remain largely static, since any modification to the dataset requires rebuilding the index.

from annoy import AnnoyIndex
import numpy as np

# Number of values in each embedding vector.
# The "length" of each fingerprint.
f = 64

# Create an Annoy index.
# This object will store many past embeddings and help us quickly find the most similar ones.
ann = AnnoyIndex(f, "angular")
# "angular" distance is commonly used to compare patterns
# and behaves similarly to cosine similarity.

# Add historical embeddings (past situations).
# Each item represents a compressed version of a past time-series window.
# Here we use random numbers just as an example.
for i in range(10000):
    ann.add_item(i, np.random.randn(f).tolist())

# Build the search structure.
# This step organizes the data so similarity searches are fast.
# After this, the index becomes read-only.
ann.build(10)

# Save the index to disk.
# This allows us to load it later without rebuilding everything.
ann.save("hist.ann")

# Create a query embedding.
# This represents the current situation we want to compare
# against past situations.
q = np.random.randn(f).tolist()

# Find the 5 most similar past embeddings.
# Annoy returns the IDs of the closest matches.
neighbors = ann.get_nns_by_vector(q, 5)

print("Nearest neighbors:", neighbors)

# Important note:
# Once the index is built, you cannot add new items.
# If new historical data appears, the index must be rebuilt.

3. Qdrant / Pinecone

Qdrant and Pinecone work like a search engine for embeddings.

You store many "fingerprints" (plus additional tags like city/season), and when you have a new fingerprint, you ask:

"Show me the most similar ones, but only in this city/season/type of store."
This is what makes them easier than rolling your own retrieval: they handle fast search and filtering!

Qdrant stores metadata as payloads, and you can filter search results using conditions on them.

# Example only (for intuition). Real code needs a running Qdrant instance + real embeddings.

from qdrant_client import QdrantClient, models

client = QdrantClient(url="...")  # URL of your running Qdrant instance

collection = "time_series_windows"

# Pretend this is the embedding of the *current* time-series window
query_vector = [0.12, -0.03, 0.98, 0.44]  # shortened for readability

# Filter = "only consider past windows from New York in summer"
# Qdrant documentation shows filters built from FieldCondition + MatchValue.
query_filter = models.Filter(
    must=[
        models.FieldCondition(
            key="city",
            match=models.MatchValue(value="New York"),
        ),
        models.FieldCondition(
            key="season",
            match=models.MatchValue(value="summer"),
        ),
    ]
)

# In real usage, you’d call search/query and get back the nearest matches
# plus their payload (metadata) if you request it.
results = client.search(
    collection_name=collection,
    query_vector=query_vector,
    query_filter=query_filter,
    limit=5,
    with_payload=True,   # return metadata so you can inspect what you retrieved
)

print(results)

# What you'd do next (conceptually):
# - take the matched IDs
# - load the actual historical windows behind them
# - feed those windows (or their outcomes) into your forecasting model

Pinecone stores metadata as key-value pairs alongside vectors and lets you filter at query time (including `$eq`) and return the metadata.

# Example only (for intuition). Real code needs an API key + an index host.

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index(host="INDEX_HOST")

# Pretend this is the embedding of the current time-series window
query_vector = [0.12, -0.03, 0.98, 0.44]  # shortened for readability

# Ask for the most similar past windows, but only where:
# city == "New York" AND season == "summer"
# Pinecone docs show query-time filtering and `$eq`.
res = index.query(
    namespace="windows",
    vector=query_vector,
    top_k=5,
    filter={
        "city": {"$eq": "New York"},
        "season": {"$eq": "summer"},
    },
    include_metadata=True,  # return tags so you can sanity-check matches
    include_values=False
)

print(res)

# Conceptually next:
# - use the returned IDs to fetch the underlying historical windows/outcomes
# - condition your forecast on those retrieved examples

Why do vector DBs help? They let you combine similarity search with SQL-like "WHERE" filters in one step, which is hard to do cleanly with a DIY setup (both Qdrant payload filtering and Pinecone metadata filtering are first-class features in their documentation).

Each tool has its own trade-offs. For example, FAISS is great for performance but not ideal for frequent updates; Qdrant provides flexibility and real-time filtering; Pinecone is easy to set up but SaaS-only.

Retrieval + Prediction: How to Combine Them

After you know what to retrieve, the next step is to combine that information with the current input.

How this is done varies with the model's architecture. There are several techniques for it (see image below).

Techniques for Combining Retrieval and Prediction. Photo by Author | Napkin AI.

A. Concatenation Fusion
Idea:
treat retrieved content as "additional input" by concatenating it with the existing sequence (the most common approach in production retrieval-augmented setups).

It works well with transformer-based models like Chronos and requires no architectural changes.

import torch

# x_current: the model's usual input sequence (e.g., last N timesteps or tokens)
# shape: [batch, time, d_model]   (or [batch, time] if you think in tokens)
x_current = torch.randn(8, 128, 256)

# x_retrieved: retrieved context encoded in the SAME representation space
# e.g., embeddings for similar past windows (or their summaries)
# shape: [batch, retrieved_time, d_model]
x_retrieved = torch.randn(8, 32, 256)

# Simple fusion: just append retrieved context to the end of the input sequence
# Now the model sees: [current history ... + retrieved context ...]
x_fused = torch.cat([x_current, x_retrieved], dim=1)

# In practice, you'd also add:
# - an attention mask (so the model knows what’s real vs padded)
# - segment/type embeddings (so the model knows which part is retrieved context)
# Then feed x_fused to your transformer.
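The segment/type embeddings mentioned in the comments above can be sketched too: a small learned embedding marks each position as "real history" (id 0) or "retrieved context" (id 1), so the transformer can tell the two parts apart. All shapes here are dummies.

```python
import torch

x_current = torch.randn(8, 128, 256)    # current history
x_retrieved = torch.randn(8, 32, 256)   # retrieved context
x_fused = torch.cat([x_current, x_retrieved], dim=1)  # [8, 160, 256]

# Segment ids: 0 = real history, 1 = retrieved context.
seg_ids = torch.cat([torch.zeros(8, 128, dtype=torch.long),
                     torch.ones(8, 32, dtype=torch.long)], dim=1)

# Adding a learned segment embedding lets the model distinguish the parts.
seg_emb = torch.nn.Embedding(2, 256)
x_fused = x_fused + seg_emb(seg_ids)
print(x_fused.shape)  # torch.Size([8, 160, 256])
```
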

B. Cross-Attention Fusion
Idea:
keep the current input and the retrieved context separate, and let the model attend to the retrieved context when needed. This is the core of the decoder-side cross-attention integration used by retrieval-friendly architectures like FiD (Fusion-in-Decoder).

import torch

# current_repr: representation of the current time-series window
# shape: [batch, time, d_model]
current_repr = torch.randn(8, 128, 256)

# retrieved_repr: representation of retrieved windows (could be concatenated)
# shape: [batch, retrieved_time, d_model]
retrieved_repr = torch.randn(8, 64, 256)

# Think of cross-attention like:
# - Query (Q) comes from the current sequence
# - Keys/Values (K/V) come from retrieved context
Q = current_repr
K = retrieved_repr
V = retrieved_repr

# Attention scores: "How much should each current timestep look at each retrieved timestep?"
scores = torch.matmul(Q, K.transpose(-1, -2)) / (Q.size(-1) ** 0.5)

# Turn scores into weights (so they sum to 1 across retrieved positions)
weights = torch.softmax(scores, dim=-1)

# Weighted sum of retrieved information (this is the “fused” retrieved signal)
retrieval_signal = torch.matmul(weights, V)

# Final fused representation: current info + retrieved info
# (Some models add, some concatenate, some use a learned projection)
fused = current_repr + retrieval_signal

# Then the forecasting head reads from `fused` to predict the future.

C. Mixture of Experts (MoE)
Idea: combine two "experts":

  • a retrieval-based predictor (non-parametric, case-based)
  • a base predictor (parametric knowledge)

A "gate" decides which one to trust more at each step.

import torch

# base_pred: forecast from the main model (what it "learned in weights")
# shape: [batch, horizon]
base_pred = torch.randn(8, 24)

# retrieval_pred: forecast suggested by retrieved similar cases
# shape: [batch, horizon]
retrieval_pred = torch.randn(8, 24)

# context_for_gate: summary of the current situation (could be last hidden state)
# shape: [batch, d_model]
context_for_gate = torch.randn(8, 256)

# gate: a number between 0 and 1 saying "how much to trust retrieval"
# (In real models, this is a tiny neural net.)
gate = torch.sigmoid(torch.randn(8, 1))

# Mixture: convex combination
# - if gate ~ 1 -> trust retrieval more
# - if gate ~ 0 -> trust the base model more
final_pred = gate * retrieval_pred + (1 - gate) * base_pred

# In practice:
# - gate might be timestep-dependent: shape [batch, horizon, 1]
# - you might also add training losses to stabilize routing/usage (common in MoE)
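The timestep-dependent gate mentioned in the comments above can be sketched with a tiny (untrained) network that outputs one trust value per forecast step; all tensors are dummies:

```python
import torch

base_pred = torch.randn(8, 24)        # base model forecast
retrieval_pred = torch.randn(8, 24)   # retrieval-based forecast
context = torch.randn(8, 256)         # summary of the current situation

# Tiny gating net: maps the context to one value in (0, 1) per horizon step.
gate_net = torch.nn.Sequential(torch.nn.Linear(256, 24), torch.nn.Sigmoid())
gate = gate_net(context)              # shape [8, 24]

# Per-timestep convex combination of the two experts.
final_pred = gate * retrieval_pred + (1 - gate) * base_pred
print(final_pred.shape)  # torch.Size([8, 24])
```

In a real model the gate network would be trained jointly with the forecaster, often with auxiliary losses to keep routing stable.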

D. Channel Prompting
Idea:
treat the retrieved series as additional input channels/features (especially natural for multivariate time series, where each variable is a "channel").

import torch

# x: multivariate time series input
# shape: [batch, time, channels]
# Example: channels could be [sales, price, promo_flag, temperature, ...]
x = torch.randn(8, 128, 5)

# retrieved_series_aligned: retrieved signal aligned to the same time grid
# Example: average of the top-k similar past windows (or one representative neighbor)
# shape: [batch, time, retrieved_channels]
retrieved_series_aligned = torch.randn(8, 128, 2)

# Channel prompting = append retrieved channels as extra features
# Now the model gets "normal channels + retrieved channels"
x_prompted = torch.cat([x, retrieved_series_aligned], dim=-1)

# In practice you’d likely also include:
# - a mask or confidence score for retrieved channels
# - normalization so retrieved signals are on a comparable scale
# Then feed x_prompted into the forecaster.

Some models even combine several of these methods.

The most common pattern is to retrieve several similar series, combine them using attention so the model can focus on the most relevant parts, and feed the result to the forecaster.
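That common pattern, retrieve top-k, attend over the retrieved series, then hand the fused result to the forecaster, can be sketched end to end; every tensor below is a dummy stand-in for a real encoding:

```python
import torch

current_repr = torch.randn(8, 128, 256)  # current window, encoded
retrieved = torch.randn(8, 3, 64, 256)   # top-3 retrieved windows, encoded

# Flatten the k retrieved windows into one context sequence: [8, 192, 256].
context = retrieved.flatten(1, 2)

# Attend from the current window to the retrieved context
# (scaled dot-product attention).
scores = current_repr @ context.transpose(-1, -2) / 256 ** 0.5
weights = torch.softmax(scores, dim=-1)

# Fuse and hand off to the forecasting head (not shown).
fused = current_repr + weights @ context
print(fused.shape)  # torch.Size([8, 128, 256])
```
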

Wrapping Up

Retrieval-Augmented Forecasting (RAF) lets your model learn from the past in a way that traditional time series modeling cannot.

It works like external memory that helps the model navigate through unfamiliar situations with more confidence.

It is easy to experiment with and can bring significant improvements in forecasting tasks.

Retrieval is no longer academic hype; it is already delivering results in real-world systems.

Thanks for reading!

My name is Sara Nóbrega. I'm an AI engineer specializing in MLOps and deploying machine learning systems to production.


