Today, give the Techtonique web app a try: a tool designed to help you make informed, data-driven decisions using Mathematics, Statistics, Machine Learning, and Data Visualization. Here is a tutorial with audio, video, code, and slides: https://moudiki2.gumroad.com/l/nrhgb
Due to the way it mixes several (relatively) simple concepts, Bayesian optimization (BO) is one of the most elegant mathematical tools I’ve encountered so far. In this post, I introduce GPopt, a tool for BO that I implemented in Python (no technical docs yet, but coming soon). The examples of GPopt’s use showcased here are based on Gaussian Processes (GPs) and Expected Improvement (EI): what does that mean?
GPs are Bayesian statistical/machine learning (ML) models that define a distribution over functions, in particular over black-box functions. Let f be the black-box, expensive-to-evaluate function whose minimum is sought. A GP is first fitted (in a supervised learning fashion) to a small set of points at which f has been evaluated; this small set of points is the initial design. Then, thanks to its probabilistic nature, the GP exploits its knowledge of the points at which f has already been evaluated in order to explore new points: potential minima, chosen by maximizing an EI criterion. It is an exploration-versus-exploitation question, as in reinforcement learning!
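To make the EI criterion concrete, here is a minimal sketch of its textbook closed form for minimization (this is the standard expression, not necessarily GPopt's exact internal implementation; the function name and the numerical guard are mine):

import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, y_min):
    # mu, sigma: GP posterior mean and standard deviation at candidate points
    # y_min: lowest objective value observed so far
    # EI(x) = (y_min - mu)*Phi(z) + sigma*phi(z), with z = (y_min - mu)/sigma
    sigma = np.maximum(sigma, 1e-12)  # guard against a zero predictive variance
    z = (y_min - mu) / sigma
    return (y_min - mu) * norm.cdf(z) + sigma * norm.pdf(z)

The next point to evaluate is the candidate maximizing this quantity: either a point with a low predicted mean (exploitation) or a point with a high predictive uncertainty (exploration).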
For more details on Bayesian optimization applied to hyperparameters’ calibration in ML, you can read Chapter 6 of this document. In this post, the Branin (2D) and Hartmann (3D) functions are used as examples of objective functions f, and a Matérn 5/2 kernel is used as the GP’s covariance.
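For reference, here is a minimal sketch of the Matérn 5/2 covariance between two points (the length scale and variance parameters are illustrative defaults, not GPopt's internal settings):

import numpy as np

def matern_52(x, y, lengthscale=1.0, variance=1.0):
    # Matérn 5/2 kernel: variance * (1 + sqrt(5)*r + 5*r^2/3) * exp(-sqrt(5)*r),
    # where r is the scaled Euclidean distance between x and y
    r = np.linalg.norm(np.asarray(x) - np.asarray(y)) / lengthscale
    return variance * (1.0 + np.sqrt(5.0) * r + 5.0 * r**2 / 3.0) * np.exp(-np.sqrt(5.0) * r)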
Installing and importing the packages:
!pip install GPopt
!pip install matplotlib==3.1.3
import GPopt as gp
import time
import sys
import numpy as np
import matplotlib.pyplot as plt
from os import chdir
from scipy.optimize import minimize
The objective function to be minimized:
# Branin function (2D)
def branin(x):
    x1 = x[0]
    x2 = x[1]
    term1 = (x2 - (5.1 * x1**2) / (4 * np.pi**2) + (5 * x1) / np.pi - 6)**2
    term2 = 10 * (1 - 1 / (8 * np.pi)) * np.cos(x1)
    return (term1 + term2 + 10)
# A local minimum for verification
print("\n")
res = minimize(branin, x0=[0, 0], method='Nelder-Mead', tol=1e-6)
print(res.x)
print(branin(res.x))
[3.14159271 2.27500017]
0.397887357729795
First, I demonstrate the convergence of the optimizer towards a minimum (and show that the optimizer can pick up the procedure where it left off):
# n_init is the number of points in the initial design
# n_iter is the number of iterations of the optimizer
gp_opt = gp.GPOpt(objective_func=branin,
                  lower_bound=np.array([-5, 0]),
                  upper_bound=np.array([10, 15]),
                  n_init=10, n_iter=10)
gp_opt.optimize(verbose=1)
Plotting the changes in expected improvement as a function of the number of iterations.
print(gp_opt.x_min) # current minimum (location)
print(gp_opt.y_min) # current minimum (objective value)
plt.plot(np.diff(gp_opt.max_ei))
[9.31152344 1.68457031]
0.9445903336427559
Adding more iterations to the optimizer:
gp_opt.optimize(verbose=1, n_more_iter=10)
...Done.
Optimization loop...
10/10 [██████████████████████████████] - 2s 186ms/step
(array([3.22692871, 2.63122559]), 0.6107733232129569)
Plotting the changes in expected improvement as a function of the number of iterations (again).
print(gp_opt.x_min) # current minimum (location)
print(gp_opt.y_min) # current minimum (objective value)
plt.plot(np.diff(gp_opt.max_ei))
[3.22692871 2.63122559]
0.6107733232129569
Adding more iterations to the optimizer (again):
gp_opt.optimize(verbose=1, n_more_iter=80)
Plotting the changes in expected improvement as a function of the number of iterations (again).
print(gp_opt.x_min) # current minimum (location)
print(gp_opt.y_min) # current minimum (objective value)
plt.plot(np.diff(gp_opt.max_ei))
[9.44061279 2.48199463]
0.3991320518189241
The three previous graphs suggest that the optimizer could be stopped earlier, once the algorithm is no longer improving on previous points’ results:
# early stopping
# abs_tol is the parameter that controls early stopping
gp_opt2 = gp.GPOpt(objective_func=branin,
                   lower_bound=np.array([-5, 0]),
                   upper_bound=np.array([10, 15]),
                   n_init=10, n_iter=190)
gp_opt2.optimize(verbose=2, abs_tol=1e-4)
print("\n")
We can observe that only 58 iterations are necessary when abs_tol = 1e-4:
print(gp_opt2.n_iter)
print(gp_opt2.x_min)
print(gp_opt2.y_min)
58
[9.44061279 2.48199463]
0.3991320518189241
Illustrating convergence:
plt.plot(gp_opt2.max_ei)
We notice that, in this example, GPopt falls into a local minimum, but one very close to the minimum previously found with the gradient-free optimizer (Nelder-Mead). The opposite situation can occur too:
# Hartmann function (3D), defined on [0, 1]^3
def hartmann3(xx):
    alpha = np.array([1.0, 1.2, 3.0, 3.2])
    A = np.array([3.0, 10, 30,
                  0.1, 10, 35,
                  3.0, 10, 30,
                  0.1, 10, 35]).reshape(4, 3)
    P = 1e-4 * np.array([3689, 1170, 2673,
                         4699, 4387, 7470,
                         1091, 8732, 5547,
                         381, 5743, 8828]).reshape(4, 3)
    xxmat = np.tile(xx, 4).reshape(4, 3)
    inner = np.sum(A * (xxmat - P)**2, axis=1)
    outer = np.sum(alpha * np.exp(-inner))
    return -outer
# Nelder-Mead from a single starting point fails here (it gets stuck in a
# local minimum), but may work with multiple restarts (see the sketch below)
print("\n")
res = minimize(hartmann3, x0=[0, 0, 0], method='Nelder-Mead', tol=1e-6)
print(res.x)
print(hartmann3(res.x))
[0.36872308 0.11756145 0.26757363]
-1.00081686355956
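As a sanity check on the “multiple restarts” remark above, here is a minimal sketch of Nelder-Mead with random restarts (the number of restarts and the seed are arbitrary choices, not part of the original experiment):

# Nelder-Mead with random restarts over [0, 1]^3
rng = np.random.default_rng(123)
best = None
for _ in range(20):
    x0 = rng.uniform(0, 1, size=3)  # random starting point
    res_restart = minimize(hartmann3, x0=x0, method='Nelder-Mead', tol=1e-6)
    if best is None or res_restart.fun < best.fun:
        best = res_restart
print(best.x)
print(best.fun)  # should get closer to the global minimum (about -3.8628)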
# Hartmann 3D
gp_opt4 = gp.GPOpt(objective_func=hartmann3,
                   lower_bound=np.repeat(0, 3),
                   upper_bound=np.repeat(1, 3),
                   n_init=20, n_iter=280)
gp_opt4.optimize(verbose=2, abs_tol=1e-4)
print(gp_opt4.n_iter)
print(gp_opt4.x_min)
print(gp_opt4.y_min)
print("\n")
51
[0.07220459 0.55792236 0.85662842]
-3.8600590626769904
The question is: in the case of BO applied to ML cross-validation, do we really want to find the global minimum of the objective function?
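To fix ideas, here is a minimal sketch of what BO applied to ML cross-validation could look like with GPopt. The dataset, the model, the hyperparameter bounds and the use of scikit-learn below are illustrative assumptions on my part, not part of this post’s experiments:

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

def cv_objective(params):
    # params[0]: number of trees, params[1]: maximum depth (both rounded to integers)
    model = RandomForestRegressor(n_estimators=int(params[0]),
                                  max_depth=int(params[1]),
                                  random_state=123)
    # minimize the negative cross-validated R^2
    return -cross_val_score(model, X, y, cv=5).mean()

gp_opt_cv = gp.GPOpt(objective_func=cv_objective,
                     lower_bound=np.array([10, 2]),
                     upper_bound=np.array([250, 20]),
                     n_init=10, n_iter=50)
gp_opt_cv.optimize(verbose=1, abs_tol=1e-4)
print(gp_opt_cv.x_min)
print(gp_opt_cv.y_min)

In such a setting, a hyperparameter configuration whose cross-validated score is close enough to the best one found may be perfectly acceptable, which is one more argument in favor of early stopping.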