
Integrating custom dependencies into Amazon SageMaker Canvas workflows

When building machine learning (ML) workflows in Amazon SageMaker Canvas, organizations may need to handle external dependencies required for their specific use cases. Although SageMaker Canvas provides powerful no-code and low-code capabilities, some projects may require specialized dependencies and libraries that aren't natively supported in SageMaker Canvas. This post provides an example of how to incorporate code that relies on external dependencies into your SageMaker Canvas workflows.

Amazon SageMaker Canvas is a low-code no-code (LCNC) platform that guides users through every stage of the ML journey, from initial data preparation to final model deployment. Without writing a single line of code, users can explore datasets, transform data, build models, and generate predictions.

SageMaker Canvas provides comprehensive data wrangling capabilities to help you prepare your data, including:

  • More than 300 built-in transformation steps
  • Feature engineering capabilities
  • Common data cleaning and preparation operations
  • A custom code editor supporting Python, PySpark, and SparkSQL

In this post, we demonstrate how to import custom dependencies from Amazon Simple Storage Service (Amazon S3) into an Amazon SageMaker Data Wrangler flow in SageMaker Canvas. Using this approach, you can run custom scripts that rely on modules not natively supported by SageMaker Canvas.

Solution overview

To demonstrate loading custom scripts and dependencies from Amazon S3 into SageMaker Canvas, we walk through the following solution.

The solution follows three high-level steps:

  1. Upload custom scripts and dependencies to Amazon S3
  2. Use SageMaker Data Wrangler in SageMaker Canvas to transform your data using the uploaded code
  3. Train and export a model

The following diagram illustrates the solution architecture.

In this example, we work with two sample datasets available in SageMaker Canvas that contain shipping data for computer screens. By joining these datasets, we create a comprehensive dataset that captures various shipping attributes and delivery outcomes. Our goal is to build a predictive model that can determine whether future shipments will arrive on time, based on historical patterns and characteristics.

Prerequisites

As a prerequisite, you need access to Amazon S3 and Amazon SageMaker AI. If you do not already have a SageMaker AI domain configured in your account, you also need permissions to create a SageMaker AI domain.

Create a data flow

To create the data flow, follow these steps:

  1. On the Amazon SageMaker AI console, in the navigation pane, under Applications and IDEs, select Canvas, as shown in the following screenshot. You may need to create a SageMaker domain if you haven't already done so.
  2. After your domain is created, select Open Canvas.

SageMaker Canvas home page

  3. In Canvas, select the Datasets tab and select canvas-sample-shipping-logs.csv, as shown in the following screenshot. After a preview of the data appears, choose + Create a data flow.

Data flow creation

The initial data flow opens with one data source and one data type.

  4. At the top right of the screen, select Add data → tabular. Choose Canvas Datasets as the source and select canvas-sample-product-descriptions.csv.
  5. Choose Next, as shown in the following screenshot, and then choose Import.

Data selection

  6. After both datasets have been imported, select the plus sign next to the dataset node. From the drop-down menu, select Combine data. From the next drop-down menu, select Join.

Join datasets

  7. To perform an inner join on the ProductId column, in the right-hand menu, under Join type, select Inner join. Under Join keys, select ProductId, as shown in the following screenshot. Conceptually, this join is equivalent to the pandas sketch after these steps.

Join datasets

  8. After the datasets are joined, select the plus sign. In the drop-down menu, select + Add transform. A preview of the data will open.
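
The following is a minimal pandas sketch of what the inner join produces. It is a conceptual illustration rather than the code Canvas runs, and it assumes local copies of the two sample CSV files:

import pandas as pd

# Conceptual equivalent of the Canvas inner join (assumes local copies
# of the two SageMaker Canvas sample datasets)
shipping = pd.read_csv("canvas-sample-shipping-logs.csv")
products = pd.read_csv("canvas-sample-product-descriptions.csv")

# Keep only the rows whose ProductId appears in both datasets
combined = shipping.merge(products, on="ProductId", how="inner")
print(combined.shape)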

The data contains the columns XShippingDistance (long) and YShippingDistance (long). For our purposes, we want to use a custom function that calculates the total shipping distance from the X and Y values and then drops the original columns. In this example, we calculate the total distance using a function that relies on the mpmath library.

  9. To call the custom function, choose + Add transform. In the drop-down menu, select Custom transform. Change the editor to Python (Pandas) and try to run the following function from the Python editor:
from mpmath import sqrt  # Import sqrt from mpmath

def calculate_total_distance(df, x_col="XShippingDistance", y_col="YShippingDistance", new_col="TotalDistance"):

    # Use mpmath's sqrt to calculate the total distance for each row
    df[new_col] = df.apply(lambda row: float(sqrt(row[x_col]**2 + row[y_col]**2)), axis=1)
    
    # Drop the original x and y columns
    df = df.drop(columns=[x_col, y_col])
    
    return df

df = calculate_total_distance(df)

Running the function produces the following error: ModuleNotFoundError: No module named 'mpmath', as shown in the following screenshot.

Module not found error

This error occurs because mpmath isn't a module natively supported by SageMaker Canvas. To use a function that relies on this module, we need to approach the custom function differently.

Zip the script and dependencies

For a function to rely on a module not supported natively in Canvas, the custom script must be zipped together with that module (or modules). For this example, we used our local integrated development environment (IDE) to create a script.py that relies on the mpmath library.

The script.py file contains two functions: one that is compatible with the Python (Pandas) runtime (the calculate_total_distance function), and one that is compatible with the Python (PySpark) runtime (the udf_total_distance function).

def calculate_total_distance(df, x_col="XShippingDistance", y_col="YShippingDistance", new_col="TotalDistance"):
    from mpmath import sqrt  # Import sqrt from mpmath

    # Use mpmath's sqrt to calculate the total distance for each row
    df[new_col] = df.apply(lambda row: float(sqrt(row[x_col]**2 + row[y_col]**2)), axis=1)

    # Drop the original x and y columns
    df = df.drop(columns=[x_col, y_col])

    return df

def udf_total_distance(df, x_col="XShippingDistance", y_col="YShippingDistance", new_col="TotalDistance"):
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import FloatType

    spark = SparkSession.builder \
        .master("local") \
        .appName("DistanceCalculation") \
        .getOrCreate()

    def calculate_distance(x, y):
        import sys

        # Add the vendored mpmath package to the Python path
        mpmath_path = "/tmp/maths"
        if mpmath_path not in sys.path:
            sys.path.insert(0, mpmath_path)

        from mpmath import sqrt
        return float(sqrt(x**2 + y**2))

    # Register and apply UDF
    distance_udf = udf(calculate_distance, FloatType())
    df = df.withColumn(new_col, distance_udf(df[x_col], df[y_col]))
    df = df.drop(x_col, y_col)

    return df
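
Before zipping, you can optionally sanity-check the Pandas variant locally. The following is a hypothetical smoke test, assuming mpmath is installed in your local environment and script.py is in the working directory:

import pandas as pd

from script import calculate_total_distance

# Hypothetical smoke test with two rows of made-up shipping distances
df = pd.DataFrame({"XShippingDistance": [3.0, 6.0], "YShippingDistance": [4.0, 8.0]})
print(calculate_total_distance(df))
# Expect a TotalDistance column of 5.0 and 10.0, with the x/y columns dropped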

To make sure the script can run in Canvas, install mpmath into the same directory as script.py by running pip install mpmath (pip's --target flag can be used to point the installation at that directory).

Run zip -r my_project.zip . to create a .zip file containing the script and the mpmath installation. The current directory now contains the .zip file, our Python script, and the dependencies our script relies on, as shown in the following screenshot.

Directory with zip file
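
If you prefer to script the packaging step rather than call zip from the shell, a standard-library sketch like the following produces the same bundle (it assumes mpmath was installed into the current directory as described above):

import zipfile
from pathlib import Path

# Bundle script.py and the vendored mpmath package into my_project.zip
with zipfile.ZipFile("my_project.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in Path(".").rglob("*"):
        if path.name != "my_project.zip":  # don't zip the archive into itself
            zf.write(path, path.as_posix())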

Upload to Amazon S3

After creating the .zip file, upload it to an Amazon S3 bucket.

Upload the zip file to S3

After the .zip file has been uploaded to Amazon S3, it is accessible from SageMaker Canvas.
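
If you would rather upload programmatically than through the console, a minimal boto3 sketch follows; the bucket name and key are assumptions chosen to match the values used in the custom transform later in this post:

import boto3

# Minimal upload sketch; bucket and key are assumptions matching the
# variables used in the custom transform code below
s3_client = boto3.client("s3")
s3_client.upload_file(
    Filename="my_project.zip",
    Bucket="canvasdatabuckett",
    Key="functions/my_project.zip",
)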

Run the custom text

Back in the data flow in SageMaker Canvas, replace the previous custom transform code with the following code and select Update.

import zipfile
import boto3
import sys
from pathlib import Path
import shutil
import importlib.util


def load_script_and_dependencies(bucket_name, zip_key, extract_to):
    """
    Downloads a zip file from S3, unzips it, and ensures dependencies are available.

    Args:
        bucket_name (str): Name of the S3 bucket.
        zip_key (str): Key for the .zip file in the bucket.
        extract_to (str): Directory to extract files to.

    Returns:
        str: Path to the extracted folder containing the script and dependencies.
    """
    
    s3_client = boto3.client("s3")
    
    # Local path for the zip file
    zip_local_path="/tmp/dependencies.zip"
    
    # Download the .zip file from S3
    s3_client.download_file(bucket_name, zip_key, zip_local_path)
    print(f"Downloaded zip file from S3: {zip_key}")

    # Unzip the file
    try:
        with zipfile.ZipFile(zip_local_path, 'r') as zip_ref:
            zip_ref.extractall(extract_to)
            print(f"Extracted files to {extract_to}")
    except Exception as e:
        raise RuntimeError(f"Failed to extract zip file: {e}")

    # Add the extracted folder to Python path
    if extract_to not in sys.path:
        sys.path.insert(0, extract_to)
          
    return extract_to
    


def call_function_from_script(script_path, function_name, df):
    """
    Dynamically loads a function from a Python script using importlib.
    """
    try:
        # Get the script name from the path
        module_name = script_path.split('/')[-1].replace('.py', '')
        
        # Load the module specification
        spec = importlib.util.spec_from_file_location(module_name, script_path)
        if spec is None:
            raise ImportError(f"Could not load specification for module {module_name}")
            
        # Create the module
        module = importlib.util.module_from_spec(spec)
        sys.modules[module_name] = module
        
        # Execute the module
        spec.loader.exec_module(module)
        
        # Get the function from the module
        if not hasattr(module, function_name):
            raise AttributeError(f"Function '{function_name}' not found in the script.")
            
        loaded_function = getattr(module, function_name)

        # Clean up: remove module from sys.modules after execution
        del sys.modules[module_name]
        
        # Call the function
        return loaded_function(df)
        
    except Exception as e:
        raise RuntimeError(f"Error loading or executing function: {e}")


bucket_name="canvasdatabuckett"  # S3 bucket name
zip_key = 'functions/my_project.zip'  # S3 path to the zip file with our custom dependency
script_name="script.py"  # Name of the script in the zip file
function_name="calculate_total_distance"  # Name of the function to call from our script
extract_to = '/tmp/maths'  # Local path for our custom script and dependencies

# Step 1: Load the script and dependencies
extracted_path = load_script_and_dependencies(bucket_name, zip_key, extract_to)

# Step 2: Call the function from the script
script_path = f"{extracted_path}/{script_name}"
df = call_function_from_script(script_path, function_name, df)

This code downloads the .zip file from Amazon S3, extracts it, and adds the required dependencies to the local environment so that they are available to the function at execution time. Because mpmath is now available locally, you can call a function that relies on this external library.
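
As a quick, hypothetical sanity check of the same mechanism, you can verify that the vendored copy of mpmath is importable once its extraction path is on sys.path:

import sys

# Make the extracted dependencies importable (same path the transform uses)
if "/tmp/maths" not in sys.path:
    sys.path.insert(0, "/tmp/maths")

from mpmath import sqrt

# 3-4-5 triangle: the total distance should be 5.0
print(float(sqrt(3**2 + 4**2)))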

The preceding code runs using the Python (Pandas) runtime and the calculate_total_distance function. To use the Python (PySpark) runtime instead, update the function_name variable to call udf_total_distance.

Complete the data flow

As a final step, drop the redundant columns before training the model. Follow these steps:

  1. On the SageMaker Canvas console, select + Add transform. From the drop-down menu, select Manage columns.
  2. Under Transform, select Drop column. Under Columns to drop, add ProductId_0, ProductId_1, and OrderID, as shown in the following screenshot. The equivalent Pandas one-liner appears after these steps.

Columns to drop
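
For reference, the same transform can be written in the custom code editor with a single Pandas call (a sketch, assuming the joined dataset is in df and the column names above):

# Drop the duplicated join keys and the order identifier
df = df.drop(columns=["ProductId_0", "ProductId_1", "OrderID"])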

The final dataset should contain 13 columns. The complete data flow is pictured in the following image.

A complete data flow

Train the model

To train the model, follow these steps:

  1. At the top right of the page, select Create a model and name your dataset and model.
  2. Select Predictive analysis as the problem type and OnTimeDelivery as the target column, as shown in the screenshot below.

Model creation page

When building the model, you can choose between a Quick build and a Standard build. The Quick build prioritizes speed over accuracy and produces a trained model in less than 20 minutes. The Standard build prioritizes accuracy over latency, but the model takes longer to train.

Results

After the model build is complete, you can view the model's accuracy, along with metrics such as F1 score, precision, and recall. Using the Standard build, the model achieved 94.5% accuracy.

Model metrics page

After model training is complete, there are four ways you can use your model:

  1. Deploy the model directly from SageMaker Canvas to an endpoint (a sketch of invoking such an endpoint follows this list)
  2. Register the model in the SageMaker model registry
  3. Export your model to a Jupyter notebook
  4. Send your model to Amazon QuickSight for use in dashboard visualizations
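
For example, if you deploy the model to a SageMaker endpoint, invoking it might look like the following sketch; the endpoint name and CSV payload are hypothetical, and the actual payload must match your model's input schema:

import boto3

# Hypothetical endpoint name; replace with the name of the endpoint
# you deployed from SageMaker Canvas
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="canvas-shipping-model",
    ContentType="text/csv",
    Body="example,feature,values",  # one CSV row matching the model's schema
)
print(response["Body"].read().decode("utf-8"))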

Clean up

To manage costs and prevent additional workspace charges, select Log out to sign out of SageMaker Canvas when you are finished using the application, as shown in the following screenshot. You can also configure SageMaker Canvas to shut down automatically when idle.

If you created an S3 bucket specifically for this example, you may also want to empty the bucket and delete it.

Log out of SageMaker Canvas

Summary

In this post, we showed how you can upload custom dependencies to Amazon S3 and integrate them into SageMaker Canvas workflows. By walking through a practical example of implementing a custom distance calculation with the mpmath library, we demonstrated how to:

  1. Package custom code and dependencies into a .zip file
  2. Store and access those dependencies from Amazon S3
  3. Implement custom data transformations in SageMaker Data Wrangler
  4. Train a predictive model using the transformed data

This approach means that data scientists and analysts can extend SageMaker Canvas capabilities beyond its more than 300 built-in functions.

To try custom transforms yourself, refer to the Amazon SageMaker Canvas documentation and sign in to SageMaker Canvas today. For more insights into how you can optimize your SageMaker Canvas implementation, we recommend exploring these related posts:


About the author

Nadhya Polanco is an Associate Solutions Architect at AWS based in Brussels, Belgium. In this role, she supports organizations looking to incorporate AI and machine learning into their workloads. In her free time, Nadhya enjoys indulging in her passion for coffee and exploring new destinations.
