3 Hyperparameter Tuning Techniques That Go Beyond Grid Search


# Introduction

When building machine learning models of moderate to high complexity, there is a wide range of model parameters that are not learned from the data but must instead be set by us beforehand: these are known as hyperparameters. Models such as random forest ensembles and neural networks have a variety of hyperparameters to adjust, each of which can take on one of many different values. As a result, the number of possible ways to configure even a small set of hyperparameters is enormous. This poses a problem: identifying the optimal configuration – i.e., the one that yields the best model performance – can be like trying to find a needle in a haystack, or worse, in the ocean.

This article builds on a previous guide about the art of hyperparameter tuning, and takes a hands-on approach to demonstrate intermediate to advanced hyperparameter tuning techniques in practice.

Specifically, you will learn how to use these three hyperparameter tuning methods:

  • Random search
  • Bayesian optimization
  • Successive halving

# Perform Initial Setup

Before starting, we'll import the required libraries and dependencies. We will be using NumPy, scikit-learn, and Optuna. If you get a "ModuleNotFoundError" for any of these, install the missing library with pip first (e.g., pip install optuna):

import numpy as np
import time
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
import optuna
import warnings
warnings.filterwarnings('ignore')

We will also load the dataset used in all three examples: a lightweight version of the Modified National Institute of Standards and Technology (MNIST) dataset, containing low-resolution images of handwritten digits for classification.

print("=" * 70)
print("LOADING MNIST DATASET FOR IMAGE CLASSIFICATION")
print("=" * 70)

# Load digits dataset (lightweight version of MNIST: 8x8 images, 1797 samples)
digits = load_digits()
X, y = digits.data, digits.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(f"Training instances: {X_train.shape[0]}")
print(f"Test instances: {X_test.shape[0]}")
print(f"Features: {X_train.shape[1]}")
print(f"Classes: {len(np.unique(y))}")
print()

Next, we define the hyperparameter search space; that is, we specify which hyperparameters we want to tune and the range of values to test for each.

print("=" * 70)
print("HYPERPARAMETER SEARCH SPACE")
print("=" * 70)

# Typical hyperparameters to explore in a random forest ensemble
param_space = {
    'n_estimators': (10, 200),      # Number of trees
    'max_depth': (5, 50),            # Maximum tree depth
    'min_samples_split': (2, 20),   # Min samples to split node
    'min_samples_leaf': (1, 10),    # Min samples in leaf node
    'max_features': (0.1, 1.0)      # Fraction of features to consider
}

print("Search space:")
for param, bounds in param_space.items():
    print(f"  {param}: {bounds}")
print()

As a final preparation step, we define a reusable function. It encapsulates the process of training and evaluating a random forest ensemble model under given hyperparameter settings, using cross-validation (CV) with classification accuracy to measure model quality. Note that this function may be called a large number of times by each of the three strategies we will use – once per combination of hyperparameter values tried.

def evaluate_model(params, X_train, y_train, cv=3):
    # Instantiate a random forest model with given hyperparameters
    model = RandomForestClassifier(
        n_estimators=int(params['n_estimators']),
        max_depth=int(params['max_depth']),
        min_samples_split=int(params['min_samples_split']),
        min_samples_leaf=int(params['min_samples_leaf']),
        max_features=float(params['max_features']),
        random_state=42,
        n_jobs=-1  # Use all CPU cores for speed
    )
    
    # Use CV to measure performance
    # This gives us a more robust estimate than a single train/val split
    scores = cross_val_score(model, X_train, y_train, cv=cv, 
                             scoring='accuracy', n_jobs=-1)
    # Return the average cross-validation accuracy
    return np.mean(scores)
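
For instance, a single evaluation with an arbitrary configuration (the values below are chosen purely for illustration) looks like this:

# Hypothetical configuration, for illustration only
example_params = {
    'n_estimators': 100,
    'max_depth': 20,
    'min_samples_split': 2,
    'min_samples_leaf': 1,
    'max_features': 0.5
}
print(f"CV accuracy: {evaluate_model(example_params, X_train, y_train):.4f}")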

Now we are ready to try three strategies!

# Using Random Search

As its name suggests, random search samples hyperparameter combinations at random from the search space, rather than exhaustively trying every possible combination in a predefined grid, as grid search does. All trials are independent: no knowledge is carried over from one trial to the next. Even so, it is remarkably effective in practice, often finding high-quality solutions faster than grid search.

Here is how random search can be implemented and applied to tune our random forest classifier on the MNIST data:

def randomized_search(n_trials=30):
    start_time = time.time() # Optional: used to measure execution time
    results = []
    
    print(f"nRunning {n_trials} random trials...")
    
    for i in range(n_trials):
        # RANDOM SAMPLING: hyperparameters are sampled independently using numpy's random number generation
        params = {
            'n_estimators': np.random.randint(param_space['n_estimators'][0], 
                param_space['n_estimators'][1]),
            'max_depth': np.random.randint(param_space['max_depth'][0], 
                param_space['max_depth'][1]),
            'min_samples_split': np.random.randint(param_space['min_samples_split'][0], 
                param_space['min_samples_split'][1]),
            'min_samples_leaf': np.random.randint(param_space['min_samples_leaf'][0], 
                param_space['min_samples_leaf'][1]),
            'max_features': np.random.uniform(param_space['max_features'][0], 
                param_space['max_features'][1])
        }
        
        # Evaluate a randomly defined configuration
        score = evaluate_model(params, X_train, y_train)
        results.append({'params': params, 'score': score})
        
        # Provide a progress update every 10 trials, for informative purposes
        if (i + 1) % 10 == 0:
            best_so_far = max(results, key=lambda x: x['score'])
            print(f"  Trial {i+1}/{n_trials}: Best score so far = {best_so_far['score']:.4f}")
    
    # Measure total time taken
    elapsed_time = time.time() - start_time
    
    # Identify best configuration found
    best_result = max(results, key=lambda x: x['score'])
    
    print(f"n✓ Completed in {elapsed_time:.2f} seconds")
    print(f"Best validation accuracy: {best_result['score']:.4f}")
    print(f"Best parameters: {best_result['params']}")
    
    return best_result, results

# Call the method to perform randomized search over 30 trials
random_best, random_results = randomized_search(n_trials=30)

The code is commented throughout for ease of understanding. The results obtained will look similar to the following:

Running 30 random trials...
  Trial 10/30: Best score so far = 0.9617
  Trial 20/30: Best score so far = 0.9617
  Trial 30/30: Best score so far = 0.9617

✓ Completed in 64.59 seconds
Best validation accuracy: 0.9617
Best parameters: {'n_estimators': 195, 'max_depth': 16, 'min_samples_split': 8, 'min_samples_leaf': 2, 'max_features': 0.28306570555707966}

Note the time the hyperparameter search process took to run, and the best validation accuracy achieved. In this case, the first 10 trials were already enough to find the best configuration.
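
For completeness, scikit-learn ships a built-in equivalent of this strategy, RandomizedSearchCV. Here is a minimal sketch, assuming SciPy is available for the sampling distributions (the variable names below are our own):

from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform

# Distributions mirroring param_space above; uniform(0.1, 0.9) spans [0.1, 1.0]
param_distributions = {
    'n_estimators': randint(10, 200),
    'max_depth': randint(5, 50),
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 10),
    'max_features': uniform(0.1, 0.9)
}

random_cv = RandomizedSearchCV(
    RandomForestClassifier(random_state=42, n_jobs=-1),
    param_distributions=param_distributions,
    n_iter=30, cv=3, scoring='accuracy', random_state=42, n_jobs=-1
)
random_cv.fit(X_train, y_train)
print(f"Best CV accuracy: {random_cv.best_score_:.4f}")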

# Using Bayesian Optimization

This method uses an auxiliary or surrogate model – typically a probabilistic model based on Gaussian processes or tree-based structures – to predict which hyperparameter settings are most promising to try next. Trials are not independent; each trial "learns" from previous ones. Additionally, the approach balances exploration (trying new areas of the search space) against exploitation (refining promising areas). In short, it is a smarter approach than grid and random search.

The Optuna library provides a popular implementation of Bayesian optimization for hyperparameter tuning using the Tree-structured Parzen Estimator (TPE). It divides past trials into "good" and "bad" groups, models a probability distribution over each, and samples new candidates from promising regions.
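
To make the intuition concrete, here is a heavily simplified sketch of the TPE selection step for a single hyperparameter. This is illustrative only, not Optuna's actual internals, which use proper kernel density estimators and handle multiple dimensions jointly:

# Simplified TPE idea (illustrative; assumes at least a few past trials):
# split past trials into "good" and "bad" by score, then prefer candidate
# values that are likely under the good group and unlikely under the bad one.
def tpe_pick(trial_values, trial_scores, candidates, gamma=0.25):
    order = np.argsort(trial_scores)[::-1]           # best trials first
    n_good = max(1, int(gamma * len(trial_values)))
    good = np.array(trial_values)[order[:n_good]]
    bad = np.array(trial_values)[order[n_good:]]

    def density(x, samples, bandwidth=0.1):
        # Crude Gaussian kernel density estimate
        return np.mean(np.exp(-0.5 * ((x - samples) / bandwidth) ** 2))

    # Pick the candidate maximizing the ratio l(x) / g(x)
    ratios = [density(x, good) / (density(x, bad) + 1e-12) for x in candidates]
    return candidates[int(np.argmax(ratios))]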

The whole process can be done as follows:

def bayesian_optimization(n_trials=30):
    """
    Implementation of Bayesian optimization using Optuna library.
    """
    start_time = time.time()
    
    def objective(trial):
        """
        Optuna objective function: given a trial, returns a score.
        """
        # Optuna can suggest values based on past performance
        params = {
            'n_estimators': trial.suggest_int('n_estimators', 
                param_space['n_estimators'][0],
                param_space['n_estimators'][1]),
            'max_depth': trial.suggest_int('max_depth',
                param_space['max_depth'][0],
                param_space['max_depth'][1]),
            'min_samples_split': trial.suggest_int('min_samples_split',
                param_space['min_samples_split'][0],
                param_space['min_samples_split'][1]),
            'min_samples_leaf': trial.suggest_int('min_samples_leaf',
                param_space['min_samples_leaf'][0],
                param_space['min_samples_leaf'][1]),
            'max_features': trial.suggest_float('max_features',
                param_space['max_features'][0],
                param_space['max_features'][1])
        }
        
        # Evaluate and return score (maximizing by default in Optuna)
        return evaluate_model(params, X_train, y_train)
    
    # The create_study() function is used in Optuna to manage and run
    # the overall optimization process
    print(f"nRunning {n_trials} Bayesian optimization trials...")
    
    study = optuna.create_study(
        direction='maximize',  # We want to maximize accuracy
        sampler=optuna.samplers.TPESampler(seed=42)  # Bayesian algorithm
    )
    
    # Perform optimization process with progress callback
    def callback(study, trial):
        if trial.number % 10 == 9:
            print(f"  Trial {trial.number + 1}/{n_trials}: Best score = {study.best_value:.4f}")
    
    study.optimize(objective, n_trials=n_trials, callbacks=[callback], show_progress_bar=False)
    
    elapsed_time = time.time() - start_time
    
    print(f"n✓ Completed in {elapsed_time:.2f} seconds")
    print(f"Best validation accuracy: {study.best_value:.4f}")
    print(f"Best parameters: {study.best_params}")
    
    return study.best_params, study.best_value, study

bayesian_best_params, bayesian_best_score, bayesian_study = bayesian_optimization(n_trials=30)

Output (summary):

✓ Completed in 62.66 seconds
Best validation accuracy: 0.9673
Best parameters: {'n_estimators': 150, 'max_depth': 33, 'min_samples_split': 2, 'min_samples_leaf': 1, 'max_features': 0.19145126698170384}
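
Optuna also lets you inspect the full trial history, which is handy for diagnosing the search. For example, assuming pandas is installed, study.trials_dataframe() returns one row per trial:

# Inspect the trial history (hyperparameter columns are prefixed 'params_')
df = bayesian_study.trials_dataframe()
print(df[['number', 'value', 'params_n_estimators', 'params_max_depth']].head())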

# Using Successive Halving

The last of the three methods, successive halving, balances the size of the candidate pool against the computational resources allocated to each configuration. It starts with a large list of configurations but limited resources (e.g., a fraction of the training data) for each one, gradually eliminating poor performers and allocating more resources to the promising configurations – similar to a real-world tournament where only the strongest competitors "survive".

The implementation below performs successive halving by gradually increasing the size of the training set used in each round.

def successive_halving(n_initial=32, min_resource=0.25, max_resource=1.0):
    
    start_time = time.time()
    
    # Step 1: Defining initial hyperparameter configurations at random
    print(f"nGenerating {n_initial} initial random configurations...")
    configs = []
    for _ in range(n_initial):
        config = {
            'n_estimators': np.random.randint(param_space['n_estimators'][0], 
                param_space['n_estimators'][1]),
            'max_depth': np.random.randint(param_space['max_depth'][0], 
                param_space['max_depth'][1]),
            'min_samples_split': np.random.randint(param_space['min_samples_split'][0], 
                param_space['min_samples_split'][1]),
            'min_samples_leaf': np.random.randint(param_space['min_samples_leaf'][0], 
                param_space['min_samples_leaf'][1]),
            'max_features': np.random.uniform(param_space['max_features'][0], 
                param_space['max_features'][1])
        }
        configs.append(config)
    
    # Step 2: apply tournament-like successive rounds of elimination
    current_configs = configs
    current_resource = min_resource
    round_num = 1
    
    while len(current_configs) > 1 and current_resource <= max_resource:
        # Determine amount of training instances to use in the current round
        n_samples = int(len(X_train) * current_resource)
        print(f"n--- Round {round_num}: Evaluating {len(current_configs)} configs ---")
        print(f"    Using {current_resource*100:.0f}% of training data ({n_samples} samples)")
        
        # Subsample training instances
        indices = np.random.choice(len(X_train), size=n_samples, replace=False)
        X_subset = X_train[indices]
        y_subset = y_train[indices]
        
        # Evaluate all current configs with the current resources
        scores = []
        for i, config in enumerate(current_configs):
            score = evaluate_model(config, X_subset, y_subset, cv=2)  # Use cv=2 (minimum)
            scores.append(score)
            
            if (i + 1) % 10 == 0 or (i + 1) == len(current_configs):
                print(f"    Evaluated {i+1}/{len(current_configs)} configs...")
        
        # Elimination policy: keep top-performing half only
        n_keep = max(1, len(current_configs) // 2)
        sorted_indices = np.argsort(scores)[::-1]  # Descending order
        current_configs = [current_configs[i] for i in sorted_indices[:n_keep]]
        
        best_score = scores[sorted_indices[0]]
        print(f"    → Keeping top {n_keep} configs. Best score: {best_score:.4f}")
        
        # Update resources, doubling them for the next round
        current_resource = min(current_resource * 2, max_resource)
        round_num += 1
    
    # Final evaluation of best config found, given full training set
    best_config = current_configs[0]
    final_score = evaluate_model(best_config, X_train, y_train, cv=3)
    
    elapsed_time = time.time() - start_time
    
    print(f"n✓ Completed in {elapsed_time:.2f} seconds")
    print(f"Best validation accuracy: {final_score:.4f}")
    print(f"Best parameters: {best_config}")
    
    return best_config, final_score

halving_best, halving_score = successive_halving(n_initial=32, min_resource=0.25, max_resource=1.0)

The final result obtained may look like this:

✓ Completed in 56.18 seconds
Best validation accuracy: 0.9645
Best parameters: {'n_estimators': 158, 'max_depth': 39, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': 0.2269785516325355}
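
Incidentally, scikit-learn provides a built-in (still experimental) implementation of this strategy, HalvingRandomSearchCV. Here is a minimal sketch, reusing the param_distributions dictionary from the RandomizedSearchCV example earlier:

from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingRandomSearchCV

halving_cv = HalvingRandomSearchCV(
    RandomForestClassifier(random_state=42, n_jobs=-1),
    param_distributions=param_distributions,
    factor=2,               # keep the top half of candidates each round
    resource='n_samples',   # grow the training-set size between rounds
    random_state=42
)
halving_cv.fit(X_train, y_train)
print(f"Best CV accuracy: {halving_cv.best_score_:.4f}")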

# Comparing Final Results

In summary, all three methods found strong configurations, with validation accuracy between 96% and 97%, and Bayesian optimization achieved the best result by a small margin. The differences are more evident in efficiency: successive halving produced results fastest, in just over 56 seconds, compared to the 62-64 seconds taken by the other two techniques.
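
As a final sanity check, which the comparison above does not include, you can refit the winning configuration on the full training set and score it on the held-out test set. A quick sketch using the Bayesian optimization winner (any of the three would work the same way):

# Refit the best configuration found and evaluate it on the unseen test set
best_model = RandomForestClassifier(**bayesian_best_params, random_state=42, n_jobs=-1)
best_model.fit(X_train, y_train)
print(f"Test accuracy: {best_model.score(X_test, y_test):.4f}")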

Iván Palomares Carrascosa is a leader, author, speaker, and consultant in AI, machine learning, deep learning and LLMs. He trains and guides others in using AI in the real world.
