
A Guide to Coding Property-Based Tests Using Hypothesis with Conditional, Differential, and Metamorphic Test Designs

In this tutorial, we explore property-based testing using Hypothesis and build a robust test pipeline that goes beyond standard unit testing. We use conditional testing, differential testing, metamorphic testing, targeted property-based testing, and stateful testing to ensure both the functional correctness and the behavioral guarantees of our systems. Instead of manually enumerating edge cases, we let Hypothesis generate structured inputs, shrink failures into minimal examples, and systematically uncover hidden bugs. We also show how these modern testing techniques can be integrated directly into an everyday coding and research workflow.
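Before diving into the full suite, the core idea can be sketched in a few lines. This is a minimal, standalone illustration (the `absolute` function is a toy, not part of the article's code): Hypothesis generates many inputs, `assume` implements conditional testing by discarding inputs we exclude, and any failing input would be shrunk to a minimal counterexample automatically.

```python
from hypothesis import given, assume, strategies as st


def absolute(x: int) -> int:
    # Toy function under test, for illustration only.
    return x if x >= 0 else -x


@given(st.integers())
def test_absolute_is_non_negative(x):
    assume(x != 0)           # conditional testing: discard inputs we exclude
    assert absolute(x) >= 0  # property that must hold for every generated x


# A @given-wrapped test is a plain callable, so it also runs outside pytest.
test_absolute_is_non_negative()
```

The same decorator-based pattern scales to every test in the suite below; pytest discovers the `test_*` functions and Hypothesis supplies the inputs.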

import sys, textwrap, subprocess, os, re, math
!{sys.executable} -m pip -q install hypothesis pytest


test_code = r'''
import re, math
import pytest
from hypothesis import (
   given, assume, example, settings, note, target,
   HealthCheck, Phase
)
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule, invariant, initialize, precondition


def clamp(x: int, lo: int, hi: int) -> int:
   if x < lo:
       return lo
   if x > hi:
       return hi
   return x


def normalize_whitespace(s: str) -> str:
   return " ".join(s.split())


def is_sorted_non_decreasing(xs):
   return all(xs[i] <= xs[i+1] for i in range(len(xs)-1))


def merge_sorted(a, b):
   i = j = 0
   out = []
   while i < len(a) and j < len(b):
       if a[i] <= b[j]:
           out.append(a[i]); i += 1
       else:
           out.append(b[j]); j += 1
   out.extend(a[i:])
   out.extend(b[j:])
   return out


def merge_sorted_reference(a, b):
   return sorted(list(a) + list(b))

'''

We set up the environment by installing Hypothesis and pytest, and we import all required modules. We then begin building a complete test program by defining core utility functions such as clamp, normalize_whitespace, and merge_sorted. These establish the functional base that our property-based tests will exercise in the following snippets.

test_code += r'''

def safe_parse_int(s: str):
   t = s.strip()
   if re.fullmatch(r"[+-]?\d+", t) is None:
       return (False, "not_an_int")
   if len(t.lstrip("+-")) > 2000:
       return (False, "too_big")
   try:
       return (True, int(t))
   except Exception:
       return (False, "parse_error")


def safe_parse_int_alt(s: str):
   t = s.strip()
   if not t:
       return (False, "not_an_int")
   sign = 1
   if t[0] == "+":
       t = t[1:]
   elif t[0] == "-":
       sign = -1
       t = t[1:]
   if not t or any(ch < "0" or ch > "9" for ch in t):
       return (False, "not_an_int")
   if len(t) > 2000:
       return (False, "too_big")
   val = 0
   for ch in t:
       val = val * 10 + (ord(ch) - 48)
   return (True, sign * val)


bounds = st.tuples(st.integers(-10_000, 10_000), st.integers(-10_000, 10_000)).map(
   lambda t: (t[0], t[1]) if t[0] <= t[1] else (t[1], t[0])
)


@st.composite
def int_like_strings(draw):
   sign = draw(st.sampled_from(["", "+", "-"]))
   digits = draw(st.text(alphabet=st.characters(min_codepoint=48, max_codepoint=57), min_size=1, max_size=300))
   left_ws = draw(st.text(alphabet=[" ", "\t", "\n"], min_size=0, max_size=5))
   right_ws = draw(st.text(alphabet=[" ", "\t", "\n"], min_size=0, max_size=5))
   return f"{left_ws}{sign}{digits}{right_ws}"


sorted_lists = st.lists(st.integers(-10_000, 10_000), min_size=0, max_size=200).map(sorted)

'''

We move beyond primitive types and define custom strategies that generate structured, meaningful inputs. We create composite strategies like int_like_strings to precisely control the input space when validating the parsers. We also build sorted-list generators and bounded tuples that enable targeted and reproducible testing.

test_code += r'''

@settings(max_examples=300, suppress_health_check=[HealthCheck.too_slow])
@given(x=st.integers(-50_000, 50_000), b=bounds)
def test_clamp_within_bounds(x, b):
   lo, hi = b
   y = clamp(x, lo, hi)
   assert lo <= y <= hi


@settings(max_examples=300, suppress_health_check=[HealthCheck.too_slow])
@given(x=st.integers(-50_000, 50_000), b=bounds)
def test_clamp_idempotent(x, b):
   lo, hi = b
   y = clamp(x, lo, hi)
   assert clamp(y, lo, hi) == y


@settings(max_examples=250)
@given(s=st.text())
@example("   a\t\tb \n c  ")
def test_normalize_whitespace_is_idempotent(s):
   t = normalize_whitespace(s)
   assert normalize_whitespace(t) == t
   assert normalize_whitespace(" \n\t " + s + "  \t") == normalize_whitespace(s)


@settings(max_examples=250, suppress_health_check=[HealthCheck.too_slow])
@given(a=sorted_lists, b=sorted_lists)
def test_merge_sorted_matches_reference(a, b):
   out = merge_sorted(a, b)
   ref = merge_sorted_reference(a, b)
   assert out == ref
   assert is_sorted_non_decreasing(out)

'''

We define core property tests that check correctness and robustness across multiple functions. We use Hypothesis decorators to automatically exercise edge cases and verify behavioral guarantees such as bounds preservation and idempotence. We also use differential tests to ensure that our merge implementation agrees with a trusted reference.

test_code += r'''

@settings(max_examples=250, deadline=200, suppress_health_check=[HealthCheck.too_slow])
@given(s=int_like_strings())
def test_two_parsers_agree_on_int_like_strings(s):
   ok1, v1 = safe_parse_int(s)
   ok2, v2 = safe_parse_int_alt(s)
   assert ok1 and ok2
   assert v1 == v2


@settings(max_examples=250)
@given(s=st.text(min_size=0, max_size=200))
def test_safe_parse_int_rejects_non_ints(s):
   t = s.strip()
   m = re.fullmatch(r"[+-]?\d+", t)
   ok, val = safe_parse_int(s)
   if m is None:
       assert ok is False
   else:
       if len(t.lstrip("+-")) > 2000:
           assert ok is False and val == "too_big"
       else:
           assert ok is True and isinstance(val, int)


def variance(xs):
   if len(xs) < 2:
       return 0.0
   mu = sum(xs) / len(xs)
   return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)


@settings(max_examples=250, phases=[Phase.generate, Phase.shrink])
@given(xs=st.lists(st.integers(-1000, 1000), min_size=0, max_size=80))
def test_statistics_sanity(xs):
   target(variance(xs))
   if len(xs) == 0:
       assert variance(xs) == 0.0
   elif len(xs) == 1:
       assert variance(xs) == 0.0
   else:
       v = variance(xs)
       assert v >= 0.0
       k = 7
       assert math.isclose(variance([x + k for x in xs]), v, rel_tol=1e-12, abs_tol=1e-12)

'''

We extend our validation to parser robustness and statistical sanity using targeted testing. We check that two independent integer parsers agree on well-formed inputs and that rejection rules are enforced on invalid strings. We also use a metamorphic test to confirm that variance is invariant under a constant shift of the data.

test_code += r'''

class Bank:
   def __init__(self):
       self.balance = 0
       self.ledger = []


   def deposit(self, amt: int):
       if amt <= 0:
           raise ValueError("deposit must be positive")
       self.balance += amt
       self.ledger.append(("dep", amt))


   def withdraw(self, amt: int):
       if amt <= 0:
           raise ValueError("withdraw must be positive")
       if amt > self.balance:
           raise ValueError("insufficient funds")
       self.balance -= amt
       self.ledger.append(("wd", amt))


   def replay_balance(self):
       bal = 0
       for typ, amt in self.ledger:
           bal += amt if typ == "dep" else -amt
       return bal


class BankMachine(RuleBasedStateMachine):
   def __init__(self):
       super().__init__()
       self.bank = Bank()


   @initialize()
   def init(self):
       assert self.bank.balance == 0
       assert self.bank.replay_balance() == 0


   @rule(amt=st.integers(min_value=1, max_value=10_000))
   def deposit(self, amt):
       self.bank.deposit(amt)


   @precondition(lambda self: self.bank.balance > 0)
   @rule(amt=st.integers(min_value=1, max_value=10_000))
   def withdraw(self, amt):
       assume(amt <= self.bank.balance)
       self.bank.withdraw(amt)


   @invariant()
   def balance_never_negative(self):
       assert self.bank.balance >= 0


   @invariant()
   def ledger_replay_matches_balance(self):
       assert self.bank.replay_balance() == self.bank.balance


TestBankMachine = BankMachine.TestCase
'''


path = "/tmp/test_hypothesis_advanced.py"
with open(path, "w", encoding="utf-8") as f:
   f.write(test_code)


print("Hypothesis version:", __import__("hypothesis").__version__)
print("\nRunning pytest on:", path, "\n")


res = subprocess.run([sys.executable, "-m", "pytest", "-q", path], capture_output=True, text=True)
print(res.stdout)
if res.returncode != 0:
   print(res.stderr)


if res.returncode == 0:
   print("\nAll Hypothesis tests passed.")
elif res.returncode == 5:
   print("\nPytest collected no tests.")
else:
   print("\nSome tests failed.")

We build a stateful model using Hypothesis's rule-based state machine to simulate a bank account. We define rules, preconditions, and invariants to guarantee balance consistency and ledger integrity under random sequences of operations. We then run the entire test suite through pytest, letting Hypothesis search for counterexamples and verify the program's correctness.
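The same stateful pattern can also be exercised outside pytest. Here is a minimal, self-contained sketch (a toy `CounterMachine`, not part of the article's suite) showing how Hypothesis drives random rule sequences and checks an invariant after every step:

```python
from hypothesis import settings, strategies as st
from hypothesis.stateful import (
    RuleBasedStateMachine, rule, invariant, run_state_machine_as_test
)


class CounterMachine(RuleBasedStateMachine):
    # Toy model: a counter that only ever increments, so it can never go negative.
    def __init__(self):
        super().__init__()
        self.count = 0

    @rule(n=st.integers(min_value=1, max_value=10))
    def increment(self, n):
        self.count += n

    @invariant()
    def never_negative(self):
        assert self.count >= 0


# Drive random sequences of rule calls; any invariant violation would be
# shrunk to a minimal failing sequence of steps.
run_state_machine_as_test(CounterMachine, settings=settings(max_examples=10))
```

`run_state_machine_as_test` is the programmatic counterpart of assigning `Machine.TestCase` to a module-level name for pytest to collect, as the bank example above does.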

In conclusion, we have developed a comprehensive property-based testing framework that validates pure functions, parsing logic, mathematical behavior, and stateful systems. We used Hypothesis's shrinking, targeted search, and state-machine testing to move from example-based testing to behavior-driven validation. This lets us reason about correctness at a higher level of abstraction while maintaining strong guarantees around edge cases and system consistency.

