Machine Learning with Python

Improving Performance of ML Models (Contd.)

Performance Improvement with Algorithm Tuning

As we know, ML models are parameterized so that their behavior can be adjusted for a specific problem. Algorithm tuning means finding the combination of these parameters that gives the best model performance. This process is sometimes called hyperparameter optimization: the parameters of the algorithm itself are called hyperparameters, while the coefficients found by the ML algorithm during training are called parameters.
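To make the distinction concrete, here is a minimal sketch using scikit-learn's Ridge on a small synthetic dataset (generated with make_regression, an assumption standing in for real data). The value alpha is a hyperparameter that we choose before training, while coef_ and intercept_ are parameters that the algorithm learns in fit().

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic regression data (an assumption, used only for illustration)
X, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)

# alpha is a hyperparameter: we set it before training
model = Ridge(alpha=1.0)

# coef_ and intercept_ are parameters: the algorithm learns them from the data
model.fit(X, y)

print("hyperparameter alpha:", model.alpha)
print("learned coefficients:", model.coef_)
print("learned intercept:", model.intercept_)
```

Algorithm tuning searches over values such as alpha; the coefficients are then re-learned for each candidate value.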

Here, we will discuss some methods for algorithm parameter tuning provided by the Python Scikit-learn library.

Grid Search Parameter Tuning

Grid search is a parameter tuning approach that methodically builds and evaluates a model for every possible combination of the algorithm parameters specified in a grid. In other words, it performs an exhaustive search over all the combinations in the grid.
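The following sketch shows how a grid expands into combinations, using scikit-learn's ParameterGrid helper. The two-hyperparameter grid here (alpha and fit_intercept for Ridge) is a hypothetical example, not the grid used in the recipe below. With 5-fold cross-validation (the default in recent scikit-learn versions), GridSearchCV would fit 6 × 5 = 30 models for this grid.

```python
from sklearn.model_selection import ParameterGrid

# A hypothetical grid with two Ridge hyperparameters
param_grid = {"alpha": [1.0, 0.1, 0.01], "fit_intercept": [True, False]}

# ParameterGrid expands the grid into every combination (3 x 2 = 6)
candidates = list(ParameterGrid(param_grid))
for params in candidates:
    print(params)

print(len(candidates))  # 6
```

Because the number of combinations is the product of the list lengths, adding a parameter or more values multiplies the cost of a grid search.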

Example

In the following Python recipe, we perform a grid search using the GridSearchCV class of sklearn to evaluate various alpha values for the Ridge Regression algorithm on the Pima Indians diabetes dataset.

First, import the required packages as follows −

import numpy
from pandas import read_csv
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

Now, load the Pima diabetes dataset as we did in previous examples −

path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names=headernames)
array = data.values
X = array[:,0:8]
Y = array[:,8]

Next, define the grid of alpha values to evaluate as follows −

alphas = numpy.array([1, 0.1, 0.01, 0.001, 0.0001, 0])
param_grid = dict(alpha=alphas)

Now, apply grid search to the model −

model = Ridge()
grid = GridSearchCV(estimator=model, param_grid=param_grid)
grid.fit(X, Y)

Print the results with the following lines −

print(grid.best_score_)
print(grid.best_estimator_.alpha)

Output

0.2796175593129722
1.0

The output gives the best mean cross-validated score (for Ridge, the default score is the coefficient of determination, R²) and the parameter values from the grid that achieved it. In this case, the best alpha value is 1.0.
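The best score is not a single fit's score: GridSearchCV stores one mean score over the cross-validation folds per candidate in cv_results_, and best_score_ is the highest of those means. The sketch below shows this with synthetic data from make_regression standing in for the CSV file (an assumption, so its numbers will differ from the output above); cv=5 is set explicitly for clarity.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the Pima data (an assumption for illustration)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=1)

alphas = np.array([1, 0.1, 0.01, 0.001, 0.0001, 0])
grid = GridSearchCV(estimator=Ridge(), param_grid={"alpha": alphas}, cv=5)
grid.fit(X, y)

# best_params_ holds the winning combination as a dict
print(grid.best_params_)

# One mean cross-validated score per candidate alpha
scores = grid.cv_results_["mean_test_score"]
for alpha, score in zip(grid.cv_results_["param_alpha"], scores):
    print(alpha, round(score, 4))

# best_score_ is the highest of these mean scores
print(grid.best_score_ == scores.max())
```

Inspecting cv_results_ in this way shows how sensitive the model is to alpha, not just which value won.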

Random Search Parameter Tuning

Random search is a parameter tuning approach that samples the algorithm parameters from a random distribution for a fixed number of iterations. Unlike grid search, it does not try every combination; the number of candidate settings it evaluates is fixed in advance.
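The sampling step can be seen on its own with scikit-learn's ParameterSampler, the helper that RandomizedSearchCV uses to draw its candidates. In this sketch, scipy's uniform() (default loc=0, scale=1) covers the interval from 0 to 1, and n_iter=5 is an arbitrary small value chosen for illustration. Fixing random_state makes the draws reproducible.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# uniform() with default loc=0, scale=1 covers the interval [0, 1]
param_distributions = {"alpha": uniform()}

# Draw a fixed number of candidate settings (n_iter)
sampler = ParameterSampler(param_distributions, n_iter=5, random_state=7)
candidates = list(sampler)
for params in candidates:
    print(params)

# The same random_state yields the same candidates again
again = list(ParameterSampler(param_distributions, n_iter=5, random_state=7))
print(candidates == again)
```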

Example

In the following Python recipe, we perform a random search using the RandomizedSearchCV class of sklearn to evaluate different alpha values between 0 and 1 for the Ridge Regression algorithm on the Pima Indians diabetes dataset.

First, import the required packages as follows −

import numpy
from pandas import read_csv
from scipy.stats import uniform
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

Now, load the Pima diabetes dataset as we did in previous examples −

path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names=headernames)
array = data.values
X = array[:,0:8]
Y = array[:,8]

Next, define a uniform distribution of alpha values and run the random search on the Ridge regression algorithm as follows −

param_grid = {'alpha': uniform()}
model = Ridge()
random_search = RandomizedSearchCV(
    estimator=model, param_distributions=param_grid, n_iter=50, random_state=7)
random_search.fit(X, Y)

Print the results with the following lines −

print(random_search.best_score_)
print(random_search.best_estimator_.alpha)

Output

0.27961712703051084
0.9779895119966027

The output gives a best score almost identical to that of the grid search, here with a sampled alpha value of about 0.978.
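A key difference from grid search is that the cost is set by n_iter, not by how finely the parameter space is divided. The sketch below runs the same kind of search on synthetic data from make_regression (an assumption standing in for the CSV file, so its scores will differ from the output above) and confirms that exactly n_iter candidates were evaluated; with cv=5, that means 50 × 5 = 250 fits.

```python
from scipy.stats import uniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the Pima data (an assumption for illustration)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=1)

random_search = RandomizedSearchCV(
    estimator=Ridge(),
    param_distributions={"alpha": uniform()},  # alpha drawn from [0, 1]
    n_iter=50,       # number of sampled candidates
    cv=5,            # folds per candidate
    random_state=7,  # reproducible sampling
)
random_search.fit(X, y)

# Exactly n_iter candidates were evaluated
print(len(random_search.cv_results_["params"]))
print(random_search.best_params_)
print(random_search.best_score_)
```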
