
A Code Implementation of Monocular Depth Estimation Using the Intel MiDaS Open Source Model in Google Colab with PyTorch and OpenCV

Monocular depth estimation involves predicting scene depth from a single RGB image – a fundamental task in computer vision with broad applications, including augmented reality, robotics, and 3D scene understanding. In this tutorial, we use Intel's MiDaS (Monocular Depth Estimation via a Multi-Scale Vision Transformer), a state-of-the-art model designed for high-quality depth prediction from a single photo. Leveraging Google Colab as the compute platform, along with PyTorch, OpenCV, and Matplotlib, this tutorial lets you upload your own photo and visualize the corresponding depth map with ease.

!pip install -q timm opencv-python matplotlib

First, we install the required Python libraries: timm for model support, opencv-python for image handling, and matplotlib for visualizing the depth maps.

!git clone https://github.com/isl-org/MiDaS.git
%cd MiDaS

Next, we clone the official Intel MiDaS repository from GitHub and navigate into its directory to access the model code and transformation utilities.

import torch
import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from torchvision.transforms import Compose
from google.colab import files


from midas.dpt_depth import DPTDepthModel
from midas.transforms import Resize, NormalizeImage, PrepareForNet
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

We import all the required libraries along with the MiDaS modules needed to load the model, preprocess images, handle uploads, and visualize depth predictions. We then set the computation device to GPU (CUDA) if available; otherwise it falls back to the CPU, ensuring compatibility across environments.

model = torch.hub.load("intel-isl/MiDaS", "DPT_Large", pretrained=True, force_reload=True)
model = model.to(device)
model.eval()

Here, we download the pretrained MiDaS DPT_Large model from Intel's torch.hub, move it to the selected device (CPU or GPU), and put it in evaluation mode for inference.

transform = Compose([
    Resize(384, 384, resize_target=None, keep_aspect_ratio=True, ensure_multiple_of=32, resize_method="upper_bound"),
    NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    PrepareForNet()
])

This defines MiDaS's preprocessing pipeline, which resizes the input image, normalizes its pixel values, and formats it correctly for the model.
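To make the effect of this pipeline concrete, here is a toy NumPy stand-in (not the MiDaS library code) that mimics what the Compose above produces: a float32 channels-first array whose spatial dimensions are multiples of 32, normalized with the same ImageNet mean and standard deviation. The function name `preprocess_sketch` and its nearest-neighbour resize are illustrative assumptions; the real pipeline uses cv2 interpolation internally.

```python
import numpy as np

# ImageNet statistics, matching the NormalizeImage arguments above.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_sketch(img_rgb_uint8, target=384):
    """Toy stand-in for the MiDaS Compose pipeline (illustration only)."""
    img = img_rgb_uint8.astype(np.float32) / 255.0   # scale to [0, 1]
    h, w = img.shape[:2]
    scale = target / max(h, w)                       # keep aspect ratio
    new_h = int(np.ceil(h * scale / 32) * 32)        # ensure multiple of 32
    new_w = int(np.ceil(w * scale / 32) * 32)
    # crude nearest-neighbour resize by index sampling (cv2.resize in practice)
    ys = np.linspace(0, h - 1, new_h).astype(int)
    xs = np.linspace(0, w - 1, new_w).astype(int)
    img = img[ys][:, xs]
    img = (img - MEAN) / STD                         # ImageNet normalization
    return np.transpose(img, (2, 0, 1)).astype(np.float32)  # HWC -> CHW

demo = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)
out = preprocess_sketch(demo)
print(out.shape)  # (3, 288, 384): channels-first, dims divisible by 32
```

The key takeaway is the output contract: the network receives a normalized CHW tensor whose height and width are divisible by 32, which is why `ensure_multiple_of=32` appears in the real transform.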

uploaded = files.upload()
for filename in uploaded:
    img = cv2.imread(filename)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    break

We let the user upload an image to Colab, read it with OpenCV, and convert it from BGR to RGB format for accurate color handling.
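The BGR-to-RGB step matters because OpenCV stores channels in BGR order. A pure-NumPy illustration (for 3-channel images, reversing the last axis is equivalent to the `cv2.cvtColor` call above):

```python
import numpy as np

# A pure-blue pixel as OpenCV would load it: BGR order, so B = 255.
bgr_pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the channel axis swaps BGR -> RGB.
rgb = bgr_pixel[..., ::-1]
print(rgb[0, 0].tolist())  # [0, 0, 255]: blue now sits in the B slot of RGB
```

Skipping this conversion does not crash anything, but the model then sees color-swapped input and Matplotlib renders the photo with blue and red exchanged.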

img_input = transform({"image": img / 255.0})["image"]  # MiDaS transforms expect RGB floats in [0, 1]
input_tensor = torch.from_numpy(img_input).unsqueeze(0).to(device)


with torch.no_grad():
    prediction = model(input_tensor)
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()


depth_map = prediction.cpu().numpy()

Next, we apply the preprocessing transform to the uploaded image, convert it to a tensor, run depth prediction with the MiDaS model, resize the output to match the original image dimensions, and extract the final depth map as a NumPy array.
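Note that MiDaS predicts relative inverse depth (larger values mean closer surfaces), so the raw values are only meaningful relative to one another. A common next step, sketched below with a hypothetical helper `normalize_depth`, is min-max normalizing the map to [0, 1] before display or saving:

```python
import numpy as np

def normalize_depth(depth):
    """Min-max normalize a raw depth map to [0, 1] for display or saving."""
    d_min, d_max = depth.min(), depth.max()
    if d_max - d_min < 1e-8:            # guard against a constant map
        return np.zeros_like(depth)
    return (depth - d_min) / (d_max - d_min)

demo = np.array([[2.0, 4.0], [6.0, 10.0]])  # stand-in raw predictions
norm = normalize_depth(demo)
print(norm)  # [[0.   0.25] [0.5  1.  ]]
```

This is purely for visualization; recovering metric depth from MiDaS output requires an additional scale and shift calibration that is out of scope here.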

plt.figure(figsize=(10, 5))


plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")


plt.subplot(1, 2, 2)
plt.imshow(depth_map, cmap='inferno')
plt.title("Depth Map")
plt.axis("off")


plt.tight_layout()
plt.show()

Finally, we build a side-by-side visualization of the original image and its corresponding depth map using Matplotlib. The depth map is displayed with the 'inferno' colormap for better contrast.
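If you want to export the colored depth map rather than just display it, the same 'inferno' colormap can be applied directly to a normalized array. A minimal sketch (the variable names are illustrative; `depth_norm` stands in for your normalized depth map):

```python
import numpy as np
from matplotlib import colormaps

depth_norm = np.linspace(0.0, 1.0, 16).reshape(4, 4)  # stand-in normalized depth
rgba = colormaps['inferno'](depth_norm)               # RGBA floats in [0, 1]
heatmap = (rgba[..., :3] * 255).astype(np.uint8)      # drop alpha, scale to uint8
print(heatmap.shape)  # (4, 4, 3)
```

The resulting uint8 RGB array can be written out with `plt.imsave` or, after reversing the channel order back to BGR, with `cv2.imwrite`.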

In conclusion, by completing this tutorial we have successfully deployed Intel's MiDaS model on Google Colab to perform monocular depth estimation from a single RGB image. Using the pretrained DPT model, OpenCV for preprocessing, and Matplotlib for visualization, we built a robust pipeline that produces high-quality depth maps with minimal setup. This implementation is a solid foundation for further exploration, including video depth estimation, real-time applications, and integration into AR/VR systems.




