
Hyperparameter Tuning — Grid Search, Random Search, Bayesian Optimisation

Finding the best model configuration — automatically and efficiently.


Definition

Hyperparameters are model configuration choices that are NOT learned from training data (unlike weights/parameters) — they must be set before training. Examples: learning rate, number of trees in a Random Forest, C in an SVM, number of layers in a neural network. Hyperparameter tuning finds a good combination using: Grid Search (exhaustive), Random Search (random samples — often more efficient), Bayesian Optimisation (smart sequential search — best sample efficiency), or modern frameworks such as Optuna and Ray Tune. Careful hyperparameter tuning often yields a meaningful improvement over default settings, though the size of the gain depends heavily on the model and dataset.

Parameters vs hyperparameters

Model parameter: an internal variable optimised during training. Learned from data? Yes, via gradient descent. Examples: neural network weights, linear regression coefficients (β).

Hyperparameter: a configuration choice set before training. Learned from data? No; it must be tuned manually or automatically. Examples: learning rate, n_estimators, max_depth, dropout rate, C in SVM.
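The distinction shows up directly in scikit-learn's API: hyperparameters are constructor arguments readable via get_params() before fitting, while parameters (coefficients) only exist after fit() has learned them. A minimal sketch, using LogisticRegression purely as an illustration:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, random_state=42)

model = LogisticRegression(C=0.5, max_iter=1000)  # C is a hyperparameter, set before training
print(model.get_params()['C'])                    # readable without any training data

model.fit(X, y)                                   # training learns the parameters
print(model.coef_.shape)                          # (1, 5): learned weights, one per feature
print(model.intercept_.shape)                     # (1,):  learned bias term
```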

Grid Search, Random Search and Bayesian Optimisation

Grid Search, Random Search, and Optuna Bayesian optimisation

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, cross_val_score
from sklearn.datasets import make_classification
from scipy.stats import randint, uniform
import numpy as np

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# ── METHOD 1: Grid Search (exhaustive — tries every combination) ──
param_grid = {
    'n_estimators': [50, 100, 200],        # 3 values
    'max_depth':    [5, 10, 20, None],      # 4 values
    'min_samples_split': [2, 5, 10],        # 3 values
    'max_features': ['sqrt', 'log2'],       # 2 values
}
# Total combinations: 3×4×3×2 = 72 models × 5-fold CV = 360 model fits!
grid_search = GridSearchCV(RandomForestClassifier(random_state=42),
    param_grid, cv=5, scoring='accuracy', n_jobs=-1, verbose=1)
grid_search.fit(X, y)
print(f"Grid Search best: {grid_search.best_params_}")
print(f"Grid Search score: {grid_search.best_score_:.3f}")

# ── METHOD 2: Random Search (random samples — often better efficiency) ──
param_dist = {
    'n_estimators':      randint(50, 500),       # Integer range 50–499
    'max_depth':         [5, 10, 15, 20, None],
    'min_samples_split': randint(2, 20),
    'max_features':      uniform(0.1, 0.9),       # Fraction of features
    'min_samples_leaf':  randint(1, 10),
}
# n_iter=50: tries 50 random combinations — covers more hyperparameter space
random_search = RandomizedSearchCV(RandomForestClassifier(random_state=42),
    param_dist, n_iter=50, cv=5, scoring='accuracy', n_jobs=-1, random_state=42)
random_search.fit(X, y)
print(f"\nRandom Search best: {random_search.best_params_}")
print(f"Random Search score: {random_search.best_score_:.3f}")

# ── METHOD 3: Bayesian Optimisation with Optuna (most efficient) ──
try:
    import optuna
    optuna.logging.set_verbosity(optuna.logging.WARNING)

    def objective(trial):
        params = {
            'n_estimators':      trial.suggest_int('n_estimators', 50, 500),
            'max_depth':         trial.suggest_int('max_depth', 3, 30),
            'min_samples_split': trial.suggest_int('min_samples_split', 2, 20),
            'max_features':      trial.suggest_float('max_features', 0.1, 1.0),
            'min_samples_leaf':  trial.suggest_int('min_samples_leaf', 1, 10),
        }
        model = RandomForestClassifier(**params, random_state=42)
        return cross_val_score(model, X, y, cv=3, scoring='accuracy').mean()

    study = optuna.create_study(direction='maximize')
    study.optimize(objective, n_trials=50, timeout=60)

    print(f"\nOptuna best params: {study.best_params}")
    print(f"Optuna best score:  {study.best_value:.3f}")
    print(f"Total trials: {len(study.trials)}")

except ImportError:
    print("Optuna not installed; run: pip install optuna")

# ── METHOD 4: Halving Grid Search (resource-efficient) ──
from sklearn.experimental import enable_halving_search_cv  # noqa: F401 (must precede the import below)
from sklearn.model_selection import HalvingGridSearchCV
# Starts with many configs using small data, progressively eliminates poor ones
halving = HalvingGridSearchCV(RandomForestClassifier(random_state=42),
    param_grid, cv=3, factor=3, n_jobs=-1)
halving.fit(X, y)
print(f"\nHalving Grid Search best: {halving.best_params_}")
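Whichever method wins, every fitted sklearn search exposes a cv_results_ dict that is worth inspecting to see which hyperparameters actually matter. A small self-contained sketch (the miniature small_grid here is illustrative, not the grid above):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Tiny illustrative grid: 2 × 2 = 4 combinations
small_grid = {'n_estimators': [50, 100], 'max_depth': [5, None]}
grid_search = GridSearchCV(RandomForestClassifier(random_state=42),
                           small_grid, cv=3, scoring='accuracy')
grid_search.fit(X, y)

# One row per combination: parameters plus mean/std of CV scores
results = pd.DataFrame(grid_search.cv_results_)
cols = ['param_n_estimators', 'param_max_depth', 'mean_test_score', 'std_test_score']
print(results[cols].sort_values('mean_test_score', ascending=False))
```

Sorting by mean_test_score while watching std_test_score helps spot configurations that win only by luck on one fold.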

AutoML — automatic machine learning

AutoML automates the full ML pipeline: feature engineering, algorithm selection, hyperparameter tuning, and ensemble construction. Major frameworks: Auto-sklearn (Bayesian optimisation over sklearn models), TPOT (genetic programming to evolve pipelines), H2O AutoML (production-grade), Optuna (flexible Bayesian HPO), NAS (Neural Architecture Search) for deep learning.

AutoML with auto-sklearn and H2O

# Auto-sklearn: automated sklearn pipeline + hyperparameter search
# pip install auto-sklearn (Linux/Mac)
try:
    import autosklearn.classification as automl
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    automl_clf = automl.AutoSklearnClassifier(
        time_left_for_this_task=120,    # 2 minutes total
        per_run_time_limit=30,          # Max 30 seconds per model
        n_jobs=-1,
        ensemble_size=50
    )
    automl_clf.fit(X_train, y_train)
    print(f"AutoSklearn score: {automl_clf.score(X_test, y_test):.3f}")
    print(automl_clf.leaderboard())     # Shows all tried models ranked
except ImportError:
    print("auto-sklearn not available — try H2O or TPOT")

# H2O AutoML (Python client; requires a Java runtime for the H2O backend)
# import h2o; from h2o.automl import H2OAutoML
# h2o.init()
# aml = H2OAutoML(max_models=20, max_runtime_secs=120, seed=42)
# aml.train(x=features, y=target, training_frame=train_h2o)
# print(aml.leaderboard)   # Top models by cross-validated AUC

# Key AutoML capabilities:
# - Algorithm selection (try RF, XGBoost, LightGBM, neural nets, etc.)
# - Hyperparameter optimisation (Bayesian or random search)
# - Feature preprocessing (scaling, encoding, imputation)
# - Ensemble construction (stacking top N models)
# - Model interpretability (SHAP values, feature importance)

# When to use AutoML vs manual tuning:
# AutoML: quick baseline, time-constrained, non-ML-expert user
# Manual: understand model behaviour, domain constraints, interpretability required

Practice questions

  1. Grid Search has parameter grid with 5×4×3 = 60 combinations and uses 5-fold CV. How many model fits are performed? (Answer: 60 × 5 = 300 model fits. Each combination is evaluated on 5 different train/test splits. This is why Grid Search is expensive for large hyperparameter spaces.)
  2. Why does Random Search often outperform Grid Search with the same number of evaluations? (Answer: Random Search explores a larger hyperparameter space. In a 2D grid where one dimension is important and one is irrelevant, Random Search tries n different values of the important dimension while Grid Search may repeat the same values of the important dimension for every value of the irrelevant one.)
  3. What is Bayesian Optimisation and why is it more efficient than random search? (Answer: BO builds a probabilistic surrogate model (Gaussian Process) of the objective function and uses an acquisition function to choose the next hyperparameter configuration to try — balancing exploration (uncertain regions) and exploitation (known good regions). Uses information from previous evaluations to make intelligent next choices.)
  4. Learning rate is a hyperparameter, but the neural network weights are parameters. What is the distinction? (Answer: Parameters are learned by optimisation (gradient descent updates weights to minimise loss). Hyperparameters control the learning process and must be set before training — they are not directly optimised by gradient descent on the training objective.)
  5. You trained a model with the BEST hyperparameters from Grid Search on the validation set. Now you evaluate on the test set and performance drops. Why? (Answer: Overfitting to the validation set — the hyperparameters were optimised specifically for the validation data. Use nested cross-validation (outer loop for test evaluation, inner loop for hyperparameter tuning) to get an unbiased estimate of tuned model performance.)
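The fix described in question 5 above, nested cross-validation, can be sketched as follows (the search space here is a toy example, not a recommendation):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Inner loop: hyperparameter tuning via 3-fold grid search
inner = GridSearchCV(RandomForestClassifier(random_state=42),
                     {'max_depth': [3, 5, None]}, cv=3, scoring='accuracy')

# Outer loop: 5-fold CV over the *whole tuning procedure*, so each outer
# test fold is never seen by the inner search — an unbiased estimate
outer_scores = cross_val_score(inner, X, y, cv=5, scoring='accuracy')
print(f"Nested CV accuracy: {outer_scores.mean():.3f} ± {outer_scores.std():.3f}")
```

The outer score estimates how the full "tune then train" pipeline generalises; reporting the inner search's best_score_ instead is exactly the validation-set overfitting the question describes.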

On LumiChats

LumiChats can generate complete Optuna hyperparameter optimisation code for any sklearn or PyTorch model, suggest hyperparameter search spaces based on the algorithm, and explain why certain hyperparameters matter more than others for your specific problem.
