Stochastic Differential Equations and Temperature: Modelling Weather Data, Part 2

You may wonder why we would want to model temperature as an Ornstein-Uhlenbeck process, and whether this has any practical consequences. Yes, it does. While the OU process is used in physics to model the velocity of particles under friction, in finance it is used for interest rates, exchange-rate fluctuations, and commodity prices. This temperature application comes from a place where I have personal experience: my thesis, "Neural Network Approaches to Weather Derivatives", focused on improving the way these instruments are priced.
The OU process is commonly used in weather derivative pricing. Weather derivatives are financial instruments designed to pay out based on a weather index (temperature, rain, wind speed, snowfall). These instruments let anyone exposed to weather risk manage that exposure: farmers worried about drought, energy companies worried about unseasonal temperatures, or even traders simply speculating on the weather. You can learn a lot about weather derivatives from the book by Antonis K. Alexandridis (with whom I currently work) and Achilleas D. Zapranis.
SDEs appear wherever uncertainty is involved. They underlie the Black-Scholes model and geometric Brownian motion, and sit alongside the Schrödinger equation among the many differential equations used across the sciences. These equations describe how systems evolve when a random component is present. SDEs are especially useful for making predictions under uncertainty, be it population growth, the spread of a disease, or stock prices. Today, we will look at how we can model temperature using the Ornstein-Uhlenbeck process, drawing a parallel with the heat equation, the PDE that describes how heat flows.
In my two previous articles:
The weather data is also available on my GitHub, so we can skip that step here.
Today we will:
- Describe the OU process and how it reverts to the mean over time.
- Go over the mathematical background, stochastic differential equations, and how the OU process relates to the heat equation.
- Use Python to fit the Ornstein-Uhlenbeck (OU) process to Mumbai temperature data.

In the figure above, we see how the volatility changes with the seasons. This highlights an important limitation we must address before the OU process can model temperature over a full year: in its basic form, it does not capture these seasonal changes.
The OU process, mean reversion, and the heat equation
We will use the OU process to model the temperature of a fixed location as it evolves randomly over time. In our setting, the equation is
\[
dT_t = \kappa\,\big(s(t) - T_t\big)\,dt + \sigma(t)\,dW_t
\]
Kappa (κ) is the speed of mean reversion. Roughly speaking, it controls how quickly our temperature returns to its seasonal mean. Mean reversion in this context means that if today is unusually hot or cold relative to the seasonal mean, then over the following days we can expect the temperature to drift back towards that mean.
\[
dT_t = {\color{red}{\kappa}}\,\big(s(t) - T_t\big)\,dt + \sigma(t)\,dW_t
\]
This parameter plays the same role as the thermal diffusivity in the heat equation, which describes how quickly heat spreads through a medium. In the heat equation,
\[
\frac{\partial u(\mathbf{x},t)}{\partial t} = {\color{red}{\kappa}} \, \nabla^2 u(\mathbf{x},t) + q(\mathbf{x},t)
\]
- κ = 0: No mean reversion. The process reduces to a Brownian motion with drift.
- 0 < κ < 1: Weak mean reversion. Shocks persist, and the autocorrelation decays slowly.
- κ > 1: Strong mean reversion. Deviations are corrected quickly, and the process stays tightly clustered around the seasonal mean (illustrated in the simulation sketch below).
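To build intuition, here is a minimal simulation sketch, separate from the fitting pipeline below, that generates OU paths for a few values of κ using a simple Euler-Maruyama discretization. The constant mean mu and volatility sigma used here are illustrative choices, not values taken from the Mumbai data.

import numpy as np
import matplotlib.pyplot as plt

# Euler-Maruyama simulation of dX_t = kappa*(mu - X_t) dt + sigma dW_t
# (illustrative only; mu and sigma are held constant, unlike the seasonal model below)
rng = np.random.default_rng(0)
n_days, dt, mu, sigma, x0 = 365, 1.0, 25.0, 1.5, 30.0
plt.figure(figsize=(10, 4))
for kappa in [0.05, 0.5, 1.5]:
    x = np.empty(n_days)
    x[0] = x0
    shocks = rng.normal(scale=np.sqrt(dt), size=n_days - 1)
    for i in range(1, n_days):
        x[i] = x[i-1] + kappa * (mu - x[i-1]) * dt + sigma * shocks[i-1]
    plt.plot(x, label=f"kappa = {kappa}")
plt.axhline(mu, color="k", lw=1, ls="--", label="mu")
plt.title("OU paths: larger kappa pulls the process back to mu faster")
plt.xlabel("Day"); plt.ylabel("X_t")
plt.legend(); plt.tight_layout(); plt.show()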
Fourier series
We estimate both the seasonal mean s(t) and the seasonal variance σ²(t) with truncated Fourier series:
\[
s(t) = a_0 + a_1 t + \sum_{k=1}^{K}\left[\alpha_k \cos\!\left(\frac{2\pi k t}{365}\right) + \beta_k \sin\!\left(\frac{2\pi k t}{365}\right)\right]
\]
\[
\sigma^2(t) = b_0 + \sum_{k=1}^{L}\left[\gamma_k \cos\!\left(\frac{2\pi k t}{365}\right) + \delta_k \sin\!\left(\frac{2\pi k t}{365}\right)\right]
\]
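As a rough sketch of how these regressors can be built (the actual feature construction happens in an earlier cell of the notebook; the helper name add_harmonics and the period of 365.25 days are my assumptions, while the cos1/sin1, cos2/sin2, ... column names match those used by the code below):

import numpy as np
import pandas as pd

def add_harmonics(df, t_col="t", n_harm=2, period=365.25):
    """Append cos1/sin1, ..., cosK/sinK columns built from a day-index column.
    A sketch of the kind of features implied by MEAN_HARM / VOL_HARM."""
    out = df.copy()
    for k in range(1, n_harm + 1):
        out[f"cos{k}"] = np.cos(2 * np.pi * k * out[t_col] / period)
        out[f"sin{k}"] = np.sin(2 * np.pi * k * out[t_col] / period)
    return out

# Example: seasonal-mean design with a linear trend plus two harmonics
days = pd.DataFrame({"t": np.arange(3650)})
X_mean = add_harmonics(days, n_harm=2)
X_mean["trend"] = X_mean["t"]
print(X_mean.head())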
Fitting the Ornstein-Uhlenbeck process to Mumbai data
We fit our temperature process in three steps:
- Fit the seasonal mean s(t) with a Fourier series (plus a linear trend).
- Model the short-term deviations from that mean, estimating the mean-reversion parameter κ with an AR(1) process.
- Fit the seasonal volatility σ(t) with a Fourier series.
Fitting the seasonal mean
We fit on the first 80% of the data so that, later on, we can use our SDE to forecast temperature out of sample. Below is the configuration, followed by the OLS regression of s(t) on our data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

# --------------------
# Config (parameters and paths for analysis)
# --------------------
CITY = "Mumbai, India" # Name of the city being analyzed (used in labels/plots)
SEED = 42 # Random seed for reproducibility of results
SPLIT_FRAC = 0.80 # Fraction of data to use for training (rest for testing)
MEAN_HARM = 2 # Number of harmonic terms to use for modeling the mean (seasonality)
VOL_HARM = 3 # Number of harmonic terms to use for modeling volatility (seasonality)
LJUNG_LAGS = 10 # Number of lags for Ljung-Box test (check autocorrelation in residuals)
EPS = 1e-12 # Small value to avoid division by zero or log(0) issues
MIN_TEST_N = 8 # Minimum number of test points required to keep a valid test set
# --------------------
# Paths (where input/output files are stored)
# --------------------
# Base directory in Google Drive where climate data and results are stored
BASE_DIR = Path("/content/drive/MyDrive/TDS/Climate")
# Subdirectory specifically for outputs of the Mumbai analysis
OUT_BASE = BASE_DIR / "Benth_Mumbai"
# Subfolder for saving plots generated during analysis
PLOTS_DIR = OUT_BASE / "plots"
# Ensure the output directories exist (create them if they don’t)
OUT_BASE.mkdir(parents=True, exist_ok=True)
PLOTS_DIR.mkdir(parents=True, exist_ok=True)
# Path to the climate data CSV file (input dataset)
CSV_PATH = BASE_DIR / "climate_data.csv"
It is hard to see a clear signal when looking at the residuals of our regression. We might be tempted to conclude that they are white noise, but that would be wrong. There is still structure in this data, namely serial dependence. Before we fit the volatility with another Fourier series, we need to account for this dependence.

# =================================================================
# 1) MEAN MODEL: residuals after fitting seasonal mean + diagnostics
# =================================================================
# Residuals after subtracting fitted seasonal mean (before AR step)
mu_train = mean_fit.predict(train)
x_train = train["DAT"] - mu_train
# Save mean model regression summary to text file
save_model_summary_txt(mean_fit, DIAG_DIR / f"{m_slug}_mean_OLS_summary.txt")
# --- Plot: observed DAT vs fitted seasonal mean (train + test) ---
fig = plt.figure(figsize=(12,5))
plt.plot(train["Date"], train["DAT"], lw=1, alpha=0.8, label="DAT (train)")
plt.plot(train["Date"], mu_train, lw=2, label="μ̂
# Predict and plot fitted mean for test set
mu_test = mean_fit.predict(test[["trend"] + [c for c in train.columns if c.startswith(("cos","sin"))][:2*MEAN_HARM]])
plt.plot(test["Date"], test["DAT"], lw=1, alpha=0.8, label="DAT (test)")
plt.plot(test["Date"], mu_test, lw=2, label="μ̂
plt.title("Mumbai — DAT and seasonal mean fit")
plt.xlabel("Date"); plt.ylabel("Temperature (DAT)")
plt.legend()
fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_mean_fit_timeseries.png", dpi=160); plt.close(fig)
# --- Plot: mean residuals (time series) ---
fig = plt.figure(figsize=(12,4))
plt.plot(train["Date"], x_train, lw=1)
plt.axhline(0, color="k", lw=1)
plt.title("Mumbai — Residuals after mean fit (x_t = DAT - μ̂)")
plt.xlabel("Date"); plt.ylabel("x_t")
fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_mean_residuals_timeseries.png", dpi=160); plt.close(fig)
φ, κ, and the AR(1) process
Next, we fit an AR(1) process. This gives us the speed of mean reversion κ.
The AR(1) process captures the dependence of our residual at time t on its value at time t-1. The exact value of κ depends on the location. In Mumbai, κ = 0.861, which means that temperatures revert towards the seasonal mean fairly quickly.
\[
X_t = c + \phi X_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0,\sigma^2)
\]
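The cell below assumes that ar_fit has already been estimated. As a minimal sketch (my assumption, not the original code), it could be obtained with statsmodels' AutoReg without an intercept, since the seasonal mean has already been removed; the mapping φ = e^(−κΔt), i.e. κ = −ln φ for daily steps, is one common convention for recovering the mean-reversion speed and is also an assumption on my part.

# --- (Sketch) Fit AR(1) on the de-seasonalized series and recover kappa ---
from statsmodels.tsa.ar_model import AutoReg
ar_fit = AutoReg(x_train, lags=1, trend="n").fit()   # no intercept: mean already removed
phi = float(ar_fit.params.iloc[0])                   # AR(1) coefficient
kappa = -np.log(phi)                                 # assumes phi = exp(-kappa * dt) with dt = 1 day
print(f"phi = {phi:.3f}, kappa = {kappa:.3f}")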
# =======================================================
# 2) AR(1) MODEL: fit and residual diagnostics
# =======================================================
# Extract AR(1) parameter φ and save summary
phi = float(ar_fit.params.iloc[0]) if len(ar_fit.params) else np.nan
save_model_summary_txt(ar_fit, DIAG_DIR / f"{m_slug}_ar1_summary.txt")
# --- Scatterplot: x_t vs x_{t-1} with fitted line φ x_{t-1} ---
x_t = x_train.iloc[1:].values
x_tm1 = x_train.iloc[:-1].values
x_pred = phi * x_tm1
fig = plt.figure(figsize=(5.8,5.2))
plt.scatter(x_tm1, x_t, s=10, alpha=0.6, label="Observed")
xline = np.linspace(np.min(x_tm1), np.max(x_tm1), 2)
plt.plot(xline, phi*xline, lw=2, label=f"Fitted: x_t = {phi:.3f} x_(t-1)")
plt.title("Mumbai — AR(1) scatter: x_t vs x_{t-1}")
plt.xlabel("x_{t-1}"); plt.ylabel("x_t"); plt.legend()
fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_ar1_scatter.png", dpi=160); plt.close(fig)
# --- Time series: actual vs fitted AR(1) values ---
fig = plt.figure(figsize=(12,4))
plt.plot(train["Date"].iloc[1:], x_t, lw=1, label="x_t")
plt.plot(train["Date"].iloc[1:], x_pred, lw=2, label=r"$hat{x}_t = phi x_{t-1}$")
plt.axhline(0, color="k", lw=1)
plt.title("Mumbai — AR(1) fitted vs observed deviations")
plt.xlabel("Date"); plt.ylabel("x_t")
plt.legend()
fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_ar1_timeseries_fit.png", dpi=160); plt.close(fig)
# --- Save AR diagnostics (φ + Ljung-Box test p-value) ---
from statsmodels.stats.diagnostic import acorr_ljungbox
try:
    lb = acorr_ljungbox(ar_fit.resid, lags=[LJUNG_LAGS], return_df=True)
    lb_p = float(lb["lb_pvalue"].iloc[0])    # test for remaining autocorrelation
    lb_stat = float(lb["lb_stat"].iloc[0])
except Exception:
    lb_p = np.nan
    lb_stat = np.nan
pd.DataFrame(
    [{"phi": phi, "ljungbox_stat_lag10": lb_stat, "ljungbox_p_lag10": lb_p}]
).to_csv(DIAG_DIR / f"{m_slug}_ar1_diagnostics.csv", index=False)

In the graph above, we can see how well our AR(1) fit tracks the next-day temperature deviation from 1981 to 2016. We can justify the use of an AR(1) process by plotting Xₜ against Xₜ₋₁: the two quantities are clearly linearly related. Seen this way, fitting the AR(1) process is just a simple linear regression. We do not even need a drift / intercept term, because we have already removed the seasonal mean; the intercept is therefore effectively zero.

Now, looking at the residuals of our AR(1) fit, we can start thinking about how to model the volatility of our temperature over time. From the graph below, we can see that the residuals fluctuate in a seemingly seasonal manner: there are spikes each year, corresponding to periods of higher volatility.
# --- Residuals from AR(1) model ---
e_train = ar_fit.resid.astype(float).values
fig = plt.figure(figsize=(12,4))
plt.plot(train["Date"].iloc[1:], e_train, lw=1)
plt.axhline(0, color="k", lw=1)
plt.title("Mumbai — AR(1) residuals ε_t")
plt.xlabel("Date"); plt.ylabel("ε_t")
fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_ar1_residuals_timeseries.png", dpi=160); plt.close(fig)

Fitting the volatility with a Fourier series
Our approach to modeling the volatility is to use a Fourier series, similar to what we did for the seasonal mean. To do this, we first have to transform the residuals εₜ. We work with the squared residuals εₜ², because volatility cannot be negative.
\[
\varepsilon_t = \sigma_t \eta_t \qquad \varepsilon_t^{2} = \sigma_t^{2}\eta_t^{2}
\]
We can think of these residuals as pure random shocks ηₜ, with mean 0 and variance 1, scaled by the volatility σₜ. We want to isolate σₜ. This can be done by taking the logarithm of the squared residuals; importantly, doing so also makes the regression much better behaved.
\[
y_t := \log(\varepsilon_t^2) = \underbrace{\log\sigma_t^2}_{\text{deterministic, seasonal}} + \underbrace{\log\eta_t^2}_{\text{random noise}}
\]
In short, modeling log(εₜ²) helps because it:
- Keeps σ̂ₜ² > 0
- Reduces the influence of outliers, so extreme shocks have much less impact on the fit.
After fitting log(εₜ²), if we want to recover σₜ², we simply exponentiate the fitted values:
\[
\widehat{\sigma}_t^{\,2} \;=\; \exp\!\big(\widehat y_t\big)
\]
# =======================================================
# 3) VOLATILITY REGRESSION: fit log(ε_t^2) on seasonal harmonics
# =======================================================
if vol_fit is not None:
    # Compute log-squared residuals (proxy for variance)
    log_eps2 = np.log(e_train**2 + EPS)
    # Use cosine/sine harmonics as regressors for volatility
    vol_terms = [f"{b}{k}" for k in range(1, VOL_HARM + 1) for b in ("cos", "sin")]
    Xbeta = np.asarray(vol_fit.predict(train.iloc[1:][vol_terms]), dtype=float)
    # Convert fitted log-variance to volatility: sigma = exp(0.5 * log-variance)
    sigma_tr = np.exp(0.5 * Xbeta)
    # --- Time plot: observed vs fitted log-variance ---
    fig = plt.figure(figsize=(12,4))
    plt.plot(train["Date"].iloc[1:], log_eps2, lw=1, label="log(ε_t^2)")
    plt.plot(train["Date"].iloc[1:], Xbeta, lw=2, label="Fitted log-variance")
    plt.title("Mumbai — Volatility regression (log ε_t^2)")
    plt.xlabel("Date"); plt.ylabel("log variance")
    plt.legend()
    fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_volatility_logvar_timeseries.png", dpi=160); plt.close(fig)
    # --- Plot: estimated volatility σ̂ over time ---
    fig = plt.figure(figsize=(12,4))
    plt.plot(train["Date"].iloc[1:], sigma_tr, lw=1.5, label="σ̂ (train)")
    if 'sigma_te' in globals():
        plt.plot(test["Date"], sigma_te, lw=1.5, label="σ̂ (test)")
    plt.title("Mumbai — Conditional volatility σ̂")
    plt.xlabel("Date"); plt.ylabel("σ̂")
    plt.legend()
    fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_volatility_sigma_timeseries.png", dpi=160); plt.close(fig)
    # --- Coefficients of volatility regression (with CI) ---
    vol_coef = coef_ci_frame(vol_fit).sort_values("estimate")
    fig = plt.figure(figsize=(8, max(4, 0.4*len(vol_coef))))
    y = np.arange(len(vol_coef))
    plt.errorbar(vol_coef["estimate"], y, xerr=1.96*vol_coef["stderr"], fmt="o", capsize=3)
    plt.yticks(y, vol_coef["term"])
    plt.axvline(0, color="k", lw=1)
    plt.title("Mumbai — Volatility model coefficients (95% CI)")
    plt.xlabel("Estimate")
    fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_volatility_coefficients.png", dpi=160); plt.close(fig)
    # Save volatility regression summary
    save_model_summary_txt(vol_fit, DIAG_DIR / f"{m_slug}_volatility_summary.txt")
else:
    print("Volatility model not available (too few points or regression failed). Skipping vol plots.")
print("Diagnostics saved to:", DIAG_DIR)

Putting the pieces together, our fitted model is
\[
dT_t = \kappa\,\big(s(t) - T_t\big)\,dt + \sigma(t)\,dW_t
\]
Demo: why the log transform stabilizes the fit
The graphs below show how the logarithm helps the regression behave properly. The squared residuals εₜ² contain extreme values, partly because squaring already-large shocks amplifies them. Without the log transform, these large shocks dominate the regression and bias the resulting fit.




# Fix the seasonal plot from the previous cell (indexing bug) and
# add a *second scenario* with rare large outliers to illustrate instability
# when regressing on skewed eps^2 directly.
# ------------------------------
# 1) Seasonal view (fixed indexing)
# ------------------------------
# Recreate the arrays from the prior simulation cell by re-executing the same seed & setup
np.random.seed(7) # deterministic reproducibility
n_days = 730 # two synthetic years (365 * 2)
t = np.arange(n_days) # day index 0..729
omega = 2 * np.pi / 365.0 # seasonal frequency (one-year period)
# True (data-generating) log-variance is seasonal via sin/cos
a_true, b_true, c_true = 0.2, 0.9, -0.35
log_var_true = a_true + b_true * np.sin(omega * t) + c_true * np.cos(omega * t)
# Convert log-variance to standard deviation: sigma = exp(0.5 * log var)
sigma_true = np.exp(0.5 * log_var_true)
# White noise innovations; variance scaled by sigma_true
z = np.random.normal(size=n_days)
eps = sigma_true * z # heteroskedastic residuals
eps2 = eps**2 # "variance-like" target if regressing raw eps^2
# Design matrix with intercept + annual sin/cos harmonics
X = np.column_stack([np.ones(n_days), np.sin(omega * t), np.cos(omega * t)])
# --- Fit A: OLS on raw eps^2 (sensitive to skew/outliers) ---
beta_A, *_ = np.linalg.lstsq(X, eps2, rcond=None)
var_hat_A = X @ beta_A # fitted variance (can be negative from OLS)
var_hat_A = np.clip(var_hat_A, 1e-8, None) # clip to avoid negatives
sigma_hat_A = np.sqrt(var_hat_A) # convert to sigma
# --- Fit B: OLS on log(eps^2) (stabilizes scale & reduces skew) ---
eps_safe = 1e-12 # small epsilon to avoid log(0)
y_log = np.log(eps2 + eps_safe) # stabilized target
beta_B, *_ = np.linalg.lstsq(X, y_log, rcond=None)
log_var_hat_B = X @ beta_B
sigma_hat_B = np.exp(0.5 * log_var_hat_B)
# Day-of-year index for seasonal averaging across years
doy = t % 365
# Correct ordering for the first year's DOY values
order365 = np.argsort(doy[:365])
def seasonal_mean(x):
    """
    Average the two years day-by-day to get a single seasonal curve.
    Assumes x has length 730 (two years); returns length-365 array.
    """
    return 0.5 * (x[:365] + x[365:730])
# Plot one "synthetic year" view of the seasonal sigma pattern
plt.figure(figsize=(12, 5))
plt.plot(np.sort(doy[:365]), seasonal_mean(sigma_true)[order365], label="True sigma seasonality")
plt.plot(np.sort(doy[:365]), seasonal_mean(sigma_hat_A)[order365], label="Fitted from eps^2 regression", alpha=0.9)
plt.plot(np.sort(doy[:365]), seasonal_mean(sigma_hat_B)[order365], label="Fitted from log(eps^2) regression", alpha=0.9)
plt.title("Seasonal Volatility Pattern: True vs Fitted (one-year view) – Fixed")
plt.xlabel("Day of year")
plt.ylabel("Sigma")
plt.legend()
plt.tight_layout()
plt.show()
# ------------------------------
# 2) OUTLIER SCENARIO to illustrate instability
# ------------------------------
np.random.seed(21) # separate seed for the outlier experiment
def run_scenario(n_days=730, outlier_rate=0.05, outlier_scale=8.0):
    """
    Generate two-year heteroskedastic residuals with occasional huge shocks
    to mimic heavy tails. Compare:
    - Fit A: regress raw eps^2 on sin/cos (can be unstable, negative fits)
    - Fit B: regress log(eps^2) on sin/cos (more stable under heavy tails)
    Return fitted sigmas and error metrics (MAE/MAPE), plus diagnostics.
    """
    t = np.arange(n_days)
    omega = 2 * np.pi / 365.0
    # Same true seasonal log-variance as above
    a_true, b_true, c_true = 0.2, 0.9, -0.35
    log_var_true = a_true + b_true * np.sin(omega * t) + c_true * np.cos(omega * t)
    sigma_true = np.exp(0.5 * log_var_true)
    # Base normal innovations
    z = np.random.normal(size=n_days)
    # Inject rare, huge shocks to create heavy tails in eps^2
    mask = np.random.rand(n_days) < outlier_rate
    z[mask] *= outlier_scale
    # Heteroskedastic residuals and their squares
    eps = sigma_true * z
    eps2 = eps**2
    # Same sin/cos design
    X = np.column_stack([np.ones(n_days), np.sin(omega * t), np.cos(omega * t)])
    # --- Fit A: raw eps^2 on X (OLS) ---
    beta_A, *_ = np.linalg.lstsq(X, eps2, rcond=None)
    var_hat_A_raw = X @ beta_A
    neg_frac = np.mean(var_hat_A_raw < 0.0)          # fraction of negative variance predictions
    var_hat_A = np.clip(var_hat_A_raw, 1e-8, None)   # clip to ensure non-negative variance
    sigma_hat_A = np.sqrt(var_hat_A)
    # --- Fit B: log(eps^2) on X (OLS on log-scale) ---
    y_log = np.log(eps2 + 1e-12)
    beta_B, *_ = np.linalg.lstsq(X, y_log, rcond=None)
    log_var_hat_B = X @ beta_B
    sigma_hat_B = np.exp(0.5 * log_var_hat_B)
    # Error metrics comparing fitted sigmas to the true sigma path
    mae = lambda a, b: np.mean(np.abs(a - b))
    mape = lambda a, b: np.mean(np.abs((a - b) / (a + 1e-12))) * 100
    mae_A = mae(sigma_true, sigma_hat_A)
    mae_B = mae(sigma_true, sigma_hat_B)
    mape_A = mape(sigma_true, sigma_hat_A)
    mape_B = mape(sigma_true, sigma_hat_B)
    return {
        "t": t,
        "sigma_true": sigma_true,
        "sigma_hat_A": sigma_hat_A,
        "sigma_hat_B": sigma_hat_B,
        "eps2": eps2,
        "y_log": y_log,
        "neg_frac": neg_frac,
        "mae_A": mae_A, "mae_B": mae_B,
        "mape_A": mape_A, "mape_B": mape_B
    }
# Run with 5% outliers scaled 10x to make the point obvious
res = run_scenario(outlier_rate=0.05, outlier_scale=10.0)
print("nOUTLIER SCENARIO (5% of days have 10x shocks) — illustrating instability when using eps^2 directly")
print(f" MAE (sigma): raw eps^2 regression = {res['mae_A']:.4f} | log(eps^2) regression = {res['mae_B']:.4f}")
print(f" MAPE (sigma): raw eps^2 regression = {res['mape_A']:.2f}% | log(eps^2) regression = {res['mape_B']:.2f}%")
print(f" Negative variance predictions before clipping (raw fit): {res['neg_frac']:.2%}")
# Visual comparison: true sigma vs two fitted approaches under outliers
plt.figure(figsize=(12, 5))
plt.plot(res["t"], res["sigma_true"], label="True sigma
plt.plot(res["t"], res["sigma_hat_A"], label="Fitted sigma from eps^2 regression", alpha=0.9)
plt.plot(res["t"], res["sigma_hat_B"], label="Fitted sigma from log(eps^2) regression", alpha=0.9)
plt.title("True vs Fitted Volatility with Rare Large Shocks")
plt.xlabel("Day")
plt.ylabel("Sigma")
plt.legend()
plt.tight_layout()
plt.show()
# Show how the targets behave when outliers are present
plt.figure(figsize=(12, 5))
plt.plot(res["t"], res["eps2"], label="eps^2 (now extremely heavy-tailed due to outliers)")
plt.title("eps^2 under outliers: unstable target for regression")
plt.xlabel("Day")
plt.ylabel("eps^2")
plt.legend()
plt.tight_layout()
plt.show()
plt.figure(figsize=(12, 5))
plt.plot(res["t"], res["y_log"], label="log(eps^2): compressed & stabilized")
plt.title("log(eps^2) under outliers: stabilized scale")
plt.xlabel("Day")
plt.ylabel("log(eps^2)")
plt.legend()
plt.tight_layout()
plt.show()
Conclusion
Now that we have estimated all the components of the SDE, we can simulate it to produce forecasts (a short simulation sketch follows the list below). But getting here was not trivial. Our approach relied on two key ideas:
- Seasonality, in both the mean and the volatility, can be captured with a Fourier series.
- The mean-reversion parameter κ can be estimated by fitting an AR(1) process to the de-seasonalized temperatures.
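As a closing sketch of what "using it to predict" could look like, here is a minimal Euler-Maruyama simulation of the fitted SDE over the test window. The function name and the inputs mu_test (fitted seasonal mean), sigma_te (fitted seasonal volatility) and kappa are my assumptions based on the cells above, not code from the original pipeline.

# --- (Sketch) Simulate the fitted SDE forward with Euler-Maruyama (dt = 1 day) ---
def simulate_temperature_paths(T0, mu, sigma, kappa, n_paths=500, seed=SEED):
    """Simulate dT = kappa*(mu_t - T) dt + sigma_t dW for daily arrays mu and sigma."""
    rng = np.random.default_rng(seed)
    n_days = len(mu)
    paths = np.empty((n_paths, n_days))
    paths[:, 0] = T0
    for i in range(1, n_days):
        drift = kappa * (mu[i - 1] - paths[:, i - 1])
        shock = sigma[i - 1] * rng.normal(size=n_paths)
        paths[:, i] = paths[:, i - 1] + drift + shock
    return paths

# Example usage (assuming mu_test, sigma_te and kappa exist from the cells above):
# paths = simulate_temperature_paths(test["DAT"].iloc[0], np.asarray(mu_test),
#                                    np.asarray(sigma_te), kappa)
# forecast = paths.mean(axis=0)   # pointwise mean forecast over the test window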
Whenever I fit a stochastic differential equation, I find it amusing to think about where the word "stochastic" comes from: the Greek stókhos, meaning "aim" or "target", which gave rise to stokházesthai, "to aim at" or "to guess". Aiming at a target must have involved plenty of error and a good deal of guesswork.
References
Alexandridis, A. K., & Zapranis, A. D. (2013). Weather Derivatives: Modeling and Pricing Weather-Related Risk. Springer.
Benth, F. E., & Šaltytė Benth, J. (2007). The volatility of temperature and pricing of weather derivatives. Quantitative Finance, 7(5), 553-561.




