Machine learning (ML) models are parameterized so that their behavior can be adjusted for a specific problem. Algorithm tuning means finding the combination of these settings that gives the ML model its best performance. This process is sometimes called hyperparameter optimization. The settings of the algorithm itself are called hyperparameters, while the coefficients that the ML algorithm learns from the data are called parameters.
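To make the distinction concrete, here is a minimal sketch on small synthetic data. The data is an assumption made for illustration; the recipes below use the Pima CSV file instead. The value of alpha is a hyperparameter that we choose and pass to Ridge, while coef_ and intercept_ are parameters that the algorithm learns when fit is called.

```python
# Synthetic data is an assumption for this sketch (not the Pima dataset).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.rand(100, 3)                                   # 100 samples, 3 features
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.randn(100)

# alpha is a hyperparameter: we set it before training.
model = Ridge(alpha=0.5)
model.fit(X, y)

# coef_ and intercept_ are parameters: the algorithm learns them from the data.
print("hyperparameter alpha:", model.get_params()["alpha"])
print("learned coefficients:", model.coef_)
print("learned intercept:", model.intercept_)
```

Tuning changes alpha; fitting changes coef_ and intercept_.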
Here, we are going to discuss two methods for algorithm parameter tuning provided by Python Scikit-learn: grid search and random search.
Grid Search Parameter Tuning
Grid search is a parameter tuning approach. It methodically builds and evaluates a model for every possible combination of the algorithm parameters specified in a grid. In other words, it performs an exhaustive search over that grid, which is where its name comes from.
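The exhaustive loop that grid search performs can be sketched by hand with scikit-learn's ParameterGrid and cross_val_score. This is an illustration of the idea, not GridSearchCV's actual implementation; the synthetic data from make_regression, the two-parameter grid and the choice of 5 folds are all assumptions made for the sketch.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import ParameterGrid, cross_val_score

# Synthetic stand-in data (an assumption; not the Pima dataset).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=7)

# A grid with two hyperparameters expands to 3 x 2 = 6 combinations.
param_grid = {"alpha": [1.0, 0.1, 0.01], "fit_intercept": [True, False]}

best_score, best_params = -np.inf, None
for params in ParameterGrid(param_grid):
    # Build and evaluate one model per combination using 5-fold cross-validation.
    score = cross_val_score(Ridge(**params), X, y, cv=5).mean()
    print(params, round(score, 4))
    if score > best_score:
        best_score, best_params = score, params

print("best:", best_params, best_score)
```

The number of fits grows multiplicatively with each hyperparameter added to the grid, which is the main cost of this approach.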
Example
In the following Python recipe, we perform a grid search with the GridSearchCV class of sklearn to evaluate various alpha values for the Ridge Regression algorithm on the Pima Indians diabetes dataset.
First, import the required packages as follows −
import numpy
from pandas import read_csv
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
Now, we need to load the Pima diabetes dataset as we did in previous examples −
path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names=headernames)
array = data.values
X = array[:,0:8]
Y = array[:,8]
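The slicing at the end of the loading code splits the table into eight input columns and one target column. The sketch below shows this on two illustrative rows read from an in-memory string, so it runs without the CSV file; the use of StringIO and the row values are assumptions for the demonstration only.

```python
from io import StringIO
from pandas import read_csv

# Two illustrative rows in the Pima column layout, read from memory instead of disk.
csv_text = "6,148,72,35,0,33.6,0.627,50,1\n1,85,66,29,0,26.6,0.351,31,0\n"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(StringIO(csv_text), names=headernames)

array = data.values   # NumPy array; mixed int/float columns become float64
X = array[:, 0:8]     # columns 0 to 7: the eight input features
Y = array[:, 8]       # column 8: the 'class' target
print(X.shape, Y)
```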
Next, define the grid of alpha values to evaluate −
alphas = numpy.array([1,0.1,0.01,0.001,0.0001,0])
param_grid = dict(alpha=alphas)
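Each key in param_grid has to be a parameter name that the estimator accepts, because the search applies every candidate to a fresh copy of the estimator, roughly as sketched below with clone and set_params. This mirrors the idea rather than reproducing sklearn's internal code, and the variable names are hypothetical.

```python
from sklearn.base import clone
from sklearn.linear_model import Ridge

model = Ridge()
# Keys of param_grid must match names returned by get_params().
print("alpha" in model.get_params())   # True

# Sketch: one candidate from the grid applied to an unfitted copy of the model.
candidate = {"alpha": 0.01}
trial = clone(model).set_params(**candidate)
print(trial.alpha, model.alpha)         # 0.01 1.0 (the original model is unchanged)
```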
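Now, apply grid search to our model. GridSearchCV scores each candidate with cross-validation and, by default, refits the best one on the whole dataset −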
model = Ridge()
grid = GridSearchCV(estimator=model, param_grid=param_grid)
grid.fit(X, Y)
Print the results with the following lines −
print(grid.best_score_)
print(grid.best_estimator_.alpha)
Output
0.2796175593129722
1.0
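The above output gives us the best mean cross-validated score and the parameter value in the grid that achieved it. Because no scoring argument is passed, GridSearchCV uses the estimator's own score method, which for a regressor such as Ridge is R². The best alpha value in this case is 1.0.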
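The best score is only one summary of the search. The cv_results_ attribute of GridSearchCV records the score of every candidate, which shows how sensitive the model is to alpha. The sketch below uses synthetic data from make_regression (an assumption, so its numbers will not match the output above) and an explicit cv=5.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the Pima data (an assumption for this sketch).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=7)

param_grid = {"alpha": [1, 0.1, 0.01, 0.001, 0.0001, 0]}
grid = GridSearchCV(Ridge(), param_grid=param_grid, cv=5)
grid.fit(X, y)

# One row per alpha: mean R^2 across the folds and its rank (1 = best).
results = pd.DataFrame(grid.cv_results_)[["param_alpha", "mean_test_score", "rank_test_score"]]
print(results.sort_values("rank_test_score"))
print("best:", grid.best_params_, grid.best_score_)
```

best_score_ is simply the highest mean_test_score in this table.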
Random Search Parameter Tuning
Random search is another parameter tuning approach. Instead of trying every combination, it samples the algorithm parameters from specified random distributions for a fixed number of iterations.
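The sampling step can be seen on its own with scikit-learn's ParameterSampler, which draws candidate settings from the given distributions. Here scipy's uniform() with its default loc=0 and scale=1 is the continuous uniform distribution on [0, 1]; the choice of 5 draws and random_state=7 is arbitrary for this sketch.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# alpha is drawn from the continuous uniform distribution on [0, 1].
param_distributions = {"alpha": uniform()}

# Draw 5 candidate settings; a fixed random_state makes the draw repeatable.
samples = list(ParameterSampler(param_distributions, n_iter=5, random_state=7))
for params in samples:
    print(params)
```

Unlike a grid, the candidates are not fixed in advance, and the budget (n_iter) is independent of how many parameters are being tuned.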
Example
In the following Python recipe, we perform a random search with the RandomizedSearchCV class of sklearn to evaluate different alpha values between 0 and 1 for the Ridge Regression algorithm on the Pima Indians diabetes dataset.
First, import the required packages as follows −
import numpy
from pandas import read_csv
from scipy.stats import uniform
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV
Now, we need to load the Pima diabetes dataset as we did in previous examples −
path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names=headernames)
array = data.values
X = array[:,0:8]
Y = array[:,8]
Next, define the distribution that alpha is sampled from and run the random search on the Ridge regression model −
param_grid = {'alpha': uniform()}
model = Ridge()
random_search = RandomizedSearchCV(
   estimator=model, param_distributions=param_grid, n_iter=50,
   random_state=7)
random_search.fit(X, Y)
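A self-contained version of the same random search can be run without the CSV file by substituting synthetic data from make_regression. The data and the explicit cv=5 are assumptions for this sketch, so the scores will differ from the document's output. The cv_results_ check confirms that 50 candidates were tried and that every sampled alpha lies in [0, 1].

```python
import numpy as np
from scipy.stats import uniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the Pima data (an assumption for this sketch).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=7)

random_search = RandomizedSearchCV(
    estimator=Ridge(),
    param_distributions={"alpha": uniform()},  # alpha ~ uniform on [0, 1]
    n_iter=50,          # number of sampled candidates to fit
    cv=5,
    random_state=7,     # makes the sampled alphas reproducible
)
random_search.fit(X, y)

sampled = np.array(random_search.cv_results_["param_alpha"], dtype=float)
print("candidates tried:", len(sampled))
print("best alpha:", random_search.best_params_["alpha"])
print("best mean CV score:", random_search.best_score_)
```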
Print the results with the following lines −
print(random_search.best_score_)
print(random_search.best_estimator_.alpha)
Output
0.27961712703051084
0.9779895119966027
The above output shows a best score almost identical to the one found by grid search (about 0.2796 in both cases), reached with a sampled alpha of about 0.978.