Create a Portrait Mode Effect with Segment Anything Model 2 (SAM2)

Have you ever admired how cell phone cameras isolate the main subject from the background, adding subtle background blur based on depth? This “portrait mode” effect gives photos a professional look by simulating a shallow depth of field similar to DSLR cameras. In this tutorial, we will recreate this effect programmatically using open source computer vision models, such as SAM2 from Meta and MiDaS from Intel ISL.

To build our pipeline, we will use:

  1. Segment Anything Model 2 (SAM2): To segment the subject of interest and separate the foreground from the background.
  2. Depth Estimation Model (MiDaS): To compute a depth map that enables depth-based blurring.
  3. Gaussian blur: To blur the background with varying intensity based on depth.

Step 1: Setting Up the Environment

To get started, install the following dependencies:

pip install matplotlib samv2 pytest opencv-python timm pillow
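
Optionally, verify the installation. A GPU speeds things up, but the pipeline also runs on CPU (the samv2 package is a CPU-compatible implementation); PyTorch is pulled in as a dependency:

import torch

# Confirm whether PyTorch can see a GPU; CPU-only also works, just more slowly
print("CUDA available:", torch.cuda.is_available())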

Step 2: Load the Target Image

Select an image to apply the effect to and load it into Python using the Pillow library.

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

image_path = ".jpg"  # replace with the path to your image
img = Image.open(image_path)
img_array = np.array(img)

# Display the image
plt.imshow(img)
plt.axis("off")
plt.show()

Step 3: Initialize SAM2

To run the model, download a pre-trained checkpoint. SAM2 offers four variants that trade off accuracy and inference speed: tiny, small, base_plus, and large. In this tutorial, we will use tiny for fast inference.

Download the model checkpoint from: https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_.pt

Fill in the blank in the filename with the variant you want (for example, sam2_hiera_tiny.pt for the tiny model), as sketched below.
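
For example, the tiny checkpoint can be fetched directly from Python. This is a minimal sketch using urllib from the standard library, assuming the URL pattern above with the tiny variant filled in:

import urllib.request

# Download the tiny SAM2 checkpoint to the working directory
url = "https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt"
urllib.request.urlretrieve(url, "sam2_hiera_tiny.pt")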

from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.utils.misc import variant_to_config_mapping
from sam2.utils.visualization import show_masks

model = build_sam2(
    variant_to_config_mapping["tiny"],
    "sam2_hiera_tiny.pt",
)
image_predictor = SAM2ImagePredictor(model)

Step 4: Feed the Image into SAM2 and Select the Subject

Set the image in SAM2 and provide point prompts that lie on the subject you want to segment. SAM2 predicts binary masks that separate the subject from the background.

image_predictor.set_image(img_array)
input_point = np.array([[2500, 1200], [2500, 1500], [2500, 2000]])
input_label = np.array([1, 1, 1])

masks, scores, logits = image_predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    box=None,
    multimask_output=True,
)
output_mask = show_masks(img_array, masks, scores)
sorted_ind = np.argsort(scores)[::-1]  # mask indices sorted by score, best first
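
Note that the point prompts are (x, y) pixel coordinates chosen for this particular image; adjust them for your own photo. A quick way to check that they actually land on the subject, reusing the variables defined above:

# Overlay the point prompts on the image to confirm they sit on the subject
plt.imshow(img)
plt.scatter(input_point[:, 0], input_point[:, 1], c="red", marker="*", s=200)
plt.axis("off")
plt.show()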

Step 5: Run the Depth Estimation Model

To estimate depth, we use MiDaS by Intel ISL. As with SAM2, you can choose among variants depending on the accuracy and speed you need. Be careful: the predicted depth map is inverted, meaning that larger values correspond to closer objects. We will invert it in the next step so it reads more intuitively.

import torch
import torchvision.transforms as transforms

model_type = "DPT_Large"  # MiDaS v3 - Large (highest accuracy)

# Load MiDaS model
model = torch.hub.load("intel-isl/MiDaS", model_type)
model.eval()

# Load and preprocess image
transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform
input_batch = transform(img_array)

# Perform depth estimation
with torch.no_grad():
    prediction = model(input_batch)
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img_array.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

prediction = prediction.cpu().numpy()

# Visualize the depth map
plt.imshow(prediction, cmap="plasma")
plt.colorbar(label="Relative Depth")
plt.title("Depth Map Visualization")
plt.show()
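
Since larger values currently mean closer, a quick sanity check is to invert the map, mirroring what the blur step will do, and confirm that the background now carries the highest values:

# Invert so that larger values mean farther away (matches the blur logic in Step 6)
inverted = np.max(prediction) - prediction
plt.imshow(inverted, cmap="plasma")
plt.colorbar(label="Inverted Relative Depth (larger = farther)")
plt.show()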

Step 6: Apply Gaussian Depth-Based Blurring

Here we implement depth-based blurring using an iterative Gaussian blur. Instead of applying one large kernel, we apply a small kernel repeatedly, with more passes for pixels that have higher (inverted) depth values, i.e., pixels farther from the camera.

import cv2

def apply_depth_based_blur_iterative(image, depth_map, base_kernel_size=7, max_repeats=10):
    # Gaussian kernels must have odd dimensions
    if base_kernel_size % 2 == 0:
        base_kernel_size += 1

    # Invert the MiDaS output so that larger values mean farther away
    depth_map = np.max(depth_map) - depth_map

    # Normalize depth to integer blur levels in [0, max_repeats]
    depth_normalized = cv2.normalize(depth_map, None, 0, max_repeats, cv2.NORM_MINMAX).astype(np.uint8)

    blurred_image = image.copy()

    for repeat in range(1, max_repeats + 1):
        # Re-blur every pixel at this level or deeper, so a pixel at level k
        # accumulates k passes of the small kernel
        mask = depth_normalized >= repeat
        if np.any(mask):
            blurred_temp = cv2.GaussianBlur(blurred_image, (base_kernel_size, base_kernel_size), 0)
            for c in range(image.shape[2]):
                blurred_image[..., c][mask] = blurred_temp[..., c][mask]

    return blurred_image

blurred_image = apply_depth_based_blur_iterative(img_array, prediction, base_kernel_size=35, max_repeats=20)

# Visualize the result
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")

plt.subplot(1, 2, 2)
plt.imshow(blurred_image)
plt.title("Depth-based Blurred Image")
plt.axis("off")
plt.show()

Step 7: Combine the Foreground with the Blurred Background

Finally, use the SAM2 mask to extract the sharp foreground and composite it over the blurred background.

def combine_foreground_background(foreground, background, mask):
    if mask.ndim == 2:
        mask = np.expand_dims(mask, axis=-1)
    return np.where(mask, foreground, background)

mask = masks[sorted_ind[0]].astype(np.uint8)  # best-scoring mask
mask = cv2.resize(mask, (img_array.shape[1], img_array.shape[0]))  # match image resolution
foreground = img_array
background = blurred_image

combined_image = combine_foreground_background(foreground, background, mask)

plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")

plt.subplot(1, 2, 2)
plt.imshow(combined_image)
plt.title("Final Portrait Mode Effect")
plt.axis("off")
plt.show()
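
To keep the result, you can save it with Pillow, which we imported in Step 2:

# Save the final composite to disk
Image.fromarray(combined_image.astype(np.uint8)).save("portrait_mode.png")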

Conclusion

With just a few open source tools, we recreated the portrait mode effect programmatically. This technique can be extended to photo editing applications, camera effect simulation, or other creative projects.

Future Enhancements:

  1. Use edge detection algorithms to refine the subject's edges.
  2. Experiment with the kernel size to improve the blurring effect (see the sketch after this list).
  3. Build a user interface for uploading images and selecting subjects dynamically.
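
As a starting point for the second item, here is a small sketch that sweeps a few kernel sizes, reusing the apply_depth_based_blur_iterative function and the arrays defined earlier:

# Compare a few base kernel sizes side by side
kernel_sizes = [15, 35, 55]
plt.figure(figsize=(20, 7))
for i, k in enumerate(kernel_sizes, start=1):
    result = apply_depth_based_blur_iterative(img_array, prediction, base_kernel_size=k, max_repeats=20)
    plt.subplot(1, len(kernel_sizes), i)
    plt.imshow(result)
    plt.title(f"base_kernel_size={k}")
    plt.axis("off")
plt.show()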

Resources:

  1. Segment Anything Model 2 (SAM2) by Meta
  2. A CPU-compatible implementation of SAM 2
  3. The MiDaS depth estimation model


Vineet Kumar is a consulting intern at MarktechPost. He is currently pursuing his BS at the Indian Institute of Technology (IIT), Kanpur. He is a machine learning enthusiast. He is interested in research and recent developments in Deep Learning, Computer Vision, and related fields.
