Whenever a machine learning algorithm is applied to a dataset, its performance is judged by how well it generalizes, i.e., how it behaves on new, never-before-seen data. If the learning algorithm's performance is unsatisfactory, or there is room for improvement, certain parameters of the algorithm need to be tuned. These parameters are known as 'hyperparameters', and the process of varying them to improve the learning algorithm's performance is known as 'hyperparameter tuning'.
These hyperparameters are not learned directly through training. Their values are fixed before training begins. They control aspects such as the learning rate, i.e., how quickly the model learns, how complex the model is, and so on.
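For example, here is a minimal sketch, using scikit-learn's SGDClassifier (which we tune later in this post), of how hyperparameter values are fixed before training starts; the specific values are illustrative only:

from sklearn.linear_model import SGDClassifier

# Hyperparameter values are chosen before training; they are not
# learned from the data. (Illustrative values only.)
model = SGDClassifier(
    alpha=0.001,              # regularization strength
    learning_rate='optimal',  # schedule for how quickly the model learns
    max_iter=1000,            # maximum number of passes over the training data
)

# Training only begins after the hyperparameters are set:
# model.fit(X_train, y_train)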
Every learning algorithm can have a wide variety of hyperparameters. Selecting the right set of hyperparameters to achieve good performance is an important aspect of machine learning.
In this post, we will look at the following two hyperparameter tuning strategies:

RandomizedSearch
GridSearch
Before jumping into how these two strategies work, note that we will demonstrate hyperparameter tuning on a stochastic gradient descent classifier and on a logistic regression model.
RandomizedSearch samples hyperparameter combinations at random, instead of exhaustively trying every combination (as GridSearch does). This reduces the processing time of the search. The scikit-learn module has a RandomizedSearchCV class that can be used to implement random search. The candidate hyperparameter values are defined before the search begins.
A parameter called 'n_iter' specifies the number of combinations that are randomly tried. If 'n_iter' is too small, finding the best combination is unlikely; if it is too large, the processing time increases. It is important to choose a balanced value for 'n_iter':
import pandas as pd

# Load the training and test data
train = pd.read_csv("C:\\Users\\Vishal\\Desktop\\train.csv")
test = pd.read_csv("C:\\Users\\Vishal\\Desktop\\test_1.csv")

X_train = train.drop(['id', 'target'], axis=1)
y_train = train['target']
X_test = test.drop(['id'], axis=1)

# Candidate values for each hyperparameter of SGDClassifier
loss = ['hinge', 'log', 'modified_huber', 'squared_hinge', 'perceptron']
penalty = ['l1', 'l2', 'elasticnet']
alpha = [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000]
learning_rate = ['constant', 'optimal', 'invscaling', 'adaptive']
class_weight = [{1: 0.5, 0: 0.5}, {1: 0.4, 0: 0.6}, {1: 0.6, 0: 0.4}, {1: 0.7, 0: 0.3}]
eta0 = [1, 10, 100]

param_distributions = dict(loss=loss, penalty=penalty, alpha=alpha,
                           learning_rate=learning_rate,
                           class_weight=class_weight, eta0=eta0)

from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV

sgd = SGDClassifier(loss="hinge", penalty="l2", max_iter=5)

# Randomly try n_iter combinations from the space defined above
random_search = RandomizedSearchCV(estimator=sgd,
                                   param_distributions=param_distributions,
                                   scoring='roc_auc', verbose=1,
                                   n_jobs=-1, n_iter=1000)
random_result = random_search.fit(X_train, y_train)

print('Best Score: ', random_result.best_score_)
print('Best Params: ', random_result.best_params_)
Output:
Best Score: 0.7981584905660377
Best Params: {'penalty': 'elasticnet', 'loss': 'log', 'learning_rate': 'optimal', 'eta0': 1, 'class_weight': {1: 0.5, 0: 0.5}, 'alpha': 0.1}
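Once the search finishes, the fitted search object can be used for prediction. Here is a minimal sketch, assuming the variables from the code above; by default, RandomizedSearchCV refits the best combination on the full training set:

# 'best_estimator_' is the model refit with the best-found combination
best_sgd = random_result.best_estimator_
predictions = best_sgd.predict(X_test)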
GridSearch is considered the traditional way of performing hyperparameter optimization. It exhaustively searches every combination of the supplied hyperparameter values until the entire grid has been evaluated. The scikit-learn module has a GridSearchCV class that can be used to implement this approach.
As before, our set of parameters is defined before we search over it:
import pandas as pd

# Load the training and test data
train = pd.read_csv("path to train.csv")
test = pd.read_csv("path to test_1.csv")

X_train = train.drop(['id', 'target'], axis=1)
y_train = train['target']
X_test = test.drop(['id'], axis=1)

from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

# Candidate values for each hyperparameter of LogisticRegression
penalty = ['l1', 'l2']
C = [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000]
class_weight = [{1: 0.5, 0: 0.5}, {1: 0.4, 0: 0.6}, {1: 0.6, 0: 0.4}, {1: 0.7, 0: 0.3}]
solver = ['liblinear', 'saga']

param_grid = dict(penalty=penalty, C=C, class_weight=class_weight, solver=solver)

logistic = LogisticRegression()

# Exhaustively evaluate every combination in the grid
grid = GridSearchCV(estimator=logistic, param_grid=param_grid,
                    scoring='roc_auc', verbose=1, n_jobs=-1)
grid_result = grid.fit(X_train, y_train)

print('Best Score: ', grid_result.best_score_)
print('Best Params: ', grid_result.best_params_)
Output:
Best Score: 0.7860030747728861
Best Params: {'C': 1, 'class_weight': {1: 0.7, 0: 0.3}, 'penalty': 'l1', 'solver': 'liblinear'}
The advantage of grid search is that it is guaranteed to find the best combination among the parameter values supplied to it.
The disadvantage is that it is computationally expensive and time-consuming, especially when the input dataset or the parameter grid is large. This can be mitigated with RandomizedSearch.
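To see the difference concretely, here is a minimal sketch that counts the model fits each strategy performs over the parameter ranges defined above (the factor of 5 assumes scikit-learn's default 5-fold cross-validation):

# GridSearch evaluates every combination in the logistic regression grid:
grid_size = len(penalty) * len(C) * len(class_weight) * len(solver)
print(grid_size)        # 2 * 8 * 4 * 2 = 128 combinations
print(grid_size * 5)    # 640 model fits with 5-fold cross-validation

# RandomizedSearch caps the work at n_iter combinations, no matter how
# large the full space is:
n_iter = 100            # illustrative value
print(n_iter * 5)       # 500 model fits with 5-fold cross-validation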
In this post, we covered the usage and significance of hyperparameter tuning, along with two important strategies used to tune hyperparameters.