Create a Portrait Mode Effect with Segment Anything Model 2 (SAM2)

Have you ever admired how smartphone cameras isolate the main subject from the background, adding a subtle blur based on depth? This "portrait mode" effect gives photos a professional look by simulating the shallow depth of field of DSLR cameras. In this tutorial, we will recreate this effect programmatically using open-source computer vision models: SAM2 from Meta and MiDaS from Intel ISL.
To build our pipeline, we will use:
- Segment Anything Model 2 (SAM2): Segments the subject of interest and separates the foreground from the background.
- Depth estimation model (MiDaS): Computes a depth map that enables depth-based blurring.
- Gaussian blur: Blurs the background with varying intensity based on depth.
Step 1: Set Up the Environment
To get started, install the following dependencies:
pip install matplotlib samv2 pytest opencv-python timm pillow
Step 2: Load the Target Image
Select an image to apply the effect to and load it into Python using the Pillow library.
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
image_path = ".jpg"  # replace with the path to your image
img = Image.open(image_path).convert("RGB")  # ensure a 3-channel RGB image
img_array = np.array(img)
# Display the image
plt.imshow(img)
plt.axis("off")
plt.show()
Step 3: Initialize SAM2
To run the model, download a pre-trained checkpoint. SAM2 offers four variants that trade off accuracy against inference speed: tiny, small, base_plus, and large. In this tutorial, we will use the tiny variant for fast inference.
Download the model checkpoint from: https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_
Complete the URL with the variant name and file extension; for the tiny variant used here, the checkpoint file is sam2_hiera_tiny.pt.
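For example, you can fetch the checkpoint directly from Python (a minimal sketch; wget or curl work just as well):
import urllib.request

# Download the tiny-variant checkpoint into the working directory
checkpoint_url = "https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt"
urllib.request.urlretrieve(checkpoint_url, "sam2_hiera_tiny.pt")
With the checkpoint in place, build the model and create an image predictor: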
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.utils.misc import variant_to_config_mapping
from sam2.utils.visualization import show_masks
model = build_sam2(
    variant_to_config_mapping["tiny"],
    "sam2_hiera_tiny.pt",
)
image_predictor = SAM2ImagePredictor(model)
Step 4: Feed the Image into SAM2 and Select the Subject
Set the image in the predictor and provide point prompts that lie on the subject you want to segment; the coordinates below are specific to the example image, so adjust them for your own. SAM2 then predicts binary masks of the subject.
image_predictor.set_image(img_array)
input_point = np.array([[2500, 1200], [2500, 1500], [2500, 2000]])  # points on the subject
input_label = np.array([1, 1, 1])  # 1 marks each point as foreground
masks, scores, logits = image_predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    box=None,
    multimask_output=True,
)
output_mask = show_masks(img_array, masks, scores)
sorted_ind = np.argsort(scores)[::-1]  # mask indices sorted by confidence, best first
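Since show_masks displays all candidate masks, it can help to inspect just the highest-scoring one, which we reuse in Step 7. A minimal matplotlib overlay (an illustrative sketch, not part of the samv2 API) looks like this:
# Overlay the best mask on the image as a quick sanity check
best_mask = masks[sorted_ind[0]]
plt.imshow(img_array)
plt.imshow(best_mask, alpha=0.5)  # semi-transparent mask overlay
plt.title(f"Best mask (score: {scores[sorted_ind[0]]:.3f})")
plt.axis("off")
plt.show()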
Step 5: Run the Depth Estimation Model
To estimate depth, we use MiDaS from Intel ISL. As with SAM2, you can choose among variants that trade accuracy for speed. Be careful: the predicted depth map is inverted, meaning larger values correspond to closer objects. We will invert it in the next step so that larger values mean farther away.
import torch
model_type = "DPT_Large" # MiDaS v3 - Large (highest accuracy)
# Load MiDaS model
model = torch.hub.load("intel-isl/MiDaS", model_type)
model.eval()
# Load and preprocess image
transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform
input_batch = transform(img_array)
# Perform depth estimation
with torch.no_grad():
    prediction = model(input_batch)
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img_array.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()
prediction = prediction.cpu().numpy()
# Visualize the depth map
plt.imshow(prediction, cmap="plasma")
plt.colorbar(label="Relative Depth")
plt.title("Depth Map Visualization")
plt.show()
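If a GPU is available, MiDaS inference can be sped up by moving the model and input to CUDA before running the prediction (a minimal sketch; the rest of the pipeline is unchanged):
# Optional: run MiDaS on GPU when available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
input_batch = input_batch.to(device)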
Step 6: Apply Depth-Based Gaussian Blurring
Here we implement depth-based blurring using an iterative Gaussian blurring method. Instead of applying one large kernel, we apply a small kernel repeatedly, so pixels with higher (farther) depth values accumulate more blur.
import cv2

def apply_depth_based_blur_iterative(image, depth_map, base_kernel_size=7, max_repeats=10):
    # GaussianBlur requires an odd kernel size
    if base_kernel_size % 2 == 0:
        base_kernel_size += 1
    # Invert the depth map so that larger values mean farther away
    depth_map = np.max(depth_map) - depth_map
    # Normalize depth to integer blur levels in [0, max_repeats]
    depth_normalized = cv2.normalize(depth_map, None, 0, max_repeats, cv2.NORM_MINMAX).astype(np.uint8)
    blurred_image = image.copy()
    for repeat in range(1, max_repeats + 1):
        # Blur every pixel at this level or deeper, so farther pixels
        # accumulate more passes of the small kernel
        mask = (depth_normalized >= repeat)
        if np.any(mask):
            blurred_temp = cv2.GaussianBlur(blurred_image, (base_kernel_size, base_kernel_size), 0)
            for c in range(image.shape[2]):
                blurred_image[..., c][mask] = blurred_temp[..., c][mask]
    return blurred_image
blurred_image = apply_depth_based_blur_iterative(img_array, prediction, base_kernel_size=35, max_repeats=20)  # strong settings for a high-resolution image; tune for yours
# Visualize the result
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(blurred_image)
plt.title("Depth-based Blurred Image")
plt.axis("off")
plt.show()
Step 7: Combine the Foreground with the Background
Finally, use the SAM2 mask to extract the sharp foreground and composite it over the blurred background.
def combine_foreground_background(foreground, background, mask):
    # Add a channel axis so the mask broadcasts over the RGB channels
    if mask.ndim == 2:
        mask = np.expand_dims(mask, axis=-1)
    return np.where(mask, foreground, background)

mask = masks[sorted_ind[0]].astype(np.uint8)  # highest-scoring mask from Step 4
mask = cv2.resize(mask, (img_array.shape[1], img_array.shape[0]))  # match the image resolution
foreground = img_array
background = blurred_image
combined_image = combine_foreground_background(foreground, background, mask)
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(combined_image)
plt.title("Final Portrait Mode Effect")
plt.axis("off")
plt.show()
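To keep the result, you can write the composite back to disk with Pillow; the output filename below is just an example:
# Save the final portrait-mode image (example filename)
Image.fromarray(combined_image.astype(np.uint8)).save("portrait_mode_result.jpg")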
Conclusion
With just a few open-source tools, we recreated the portrait mode effect programmatically. This technique can be extended to photo editing applications, camera-effect simulation, or creative projects.
Future Enhancements:
- Use edge detection or mask feathering to refine the subject's edges (see the sketch after this list).
- Experiment with the kernel size to improve the blurring effect.
- Build a user interface for uploading images and selecting subjects dynamically.
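As a starting point for the edge-refinement idea, one simple option (an illustrative sketch, not the only approach) is to feather the binary mask with a Gaussian blur and alpha-blend the foreground over the background; feather_ksize is a hypothetical tunable and must be odd:
def combine_with_feathered_mask(foreground, background, mask, feather_ksize=21):
    # Blur the binary mask so the subject-to-background transition
    # fades gradually instead of cutting off sharply
    alpha = cv2.GaussianBlur(mask.astype(np.float32), (feather_ksize, feather_ksize), 0)
    alpha = alpha[..., np.newaxis]  # broadcast over the color channels
    blended = alpha * foreground + (1 - alpha) * background
    return blended.astype(np.uint8)

feathered_image = combine_with_feathered_mask(img_array, blurred_image, mask)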
Resources:
- Segment Anything Model 2 (SAM2) from Meta
- A CPU-compatible implementation of SAM2 (samv2)
- The MiDaS depth estimation model from Intel ISL
Vineet Kumar is a consulting intern at MarktechPost. He is currently pursuing his BS at the Indian Institute of Technology (IIT), Kanpur. He is a machine learning enthusiast. He is interested in research and recent developments in Deep Learning, Computer Vision, and related fields.