Table 1 Experimental setting for each method

From: Effective hyperparameter optimization using Nelder-Mead method in deep learning

Random search: Perform 600 random evaluations.
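As a concrete illustration of this baseline (not the authors' code), a minimal sketch assuming a placeholder objective `f` and search dimension `d`, with all variables in [0,1]:

```python
# Hypothetical sketch: 600 uniform random evaluations over [0,1]^d,
# keeping the best point. `d` and `f` are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 5                                   # number of hyperparameters (assumed)
f = lambda x: np.sum((x - 0.3) ** 2)    # placeholder objective

X = rng.random((600, d))                # 600 random evaluations
best = min(X, key=f)
print("best value:", f(best))
```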

Bayesian optimization: Initialize the observation data with the first 100 evaluations of the random search, then run the optimization for 500 evaluations. The kernel is the ARD Matérn 5/2 and the acquisition function is expected improvement (EI) [8, 10].
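A minimal sketch of this setting, assuming a placeholder objective and a random candidate pool for maximizing EI (the table does not specify how the acquisition is optimized):

```python
# Hedged sketch: GP surrogate with an ARD Matern 5/2 kernel and EI,
# warm-started from 100 random evaluations. `d`, `f`, and the candidate
# pool are illustrative assumptions, not the authors' setup.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
d = 5
f = lambda x: np.sum((x - 0.3) ** 2)

X = rng.random((100, d))                 # first 100 random-search points
y = np.array([f(x) for x in X])

kernel = Matern(length_scale=np.ones(d), nu=2.5)   # vector length scale = ARD
for _ in range(500):                     # 500 further evaluations
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
    cand = rng.random((1000, d))         # random EI-maximization pool (assumed)
    mu, sd = gp.predict(cand, return_std=True)
    z = (y.min() - mu) / np.maximum(sd, 1e-12)
    ei = (y.min() - mu) * norm.cdf(z) + sd * norm.pdf(z)   # EI (minimization)
    x_next = cand[np.argmax(ei)]
    X, y = np.vstack([X, x_next]), np.append(y, f(x_next))
print("best:", y.min())
```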

CMA-ES: Perform 600 evaluations, i.e., 20 generations of 30 individuals each. \(\langle \mathbf{x} \rangle_{w}^{(0)} = 0.5\), \(\sigma^{(0)} = 0.2\). All variables are scaled to [0,1] [10].
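A minimal sketch of this setting with the pycma package; the objective and dimension are placeholders:

```python
# Hedged sketch: 20 generations x 30 individuals = 600 evaluations,
# initial mean 0.5 in every coordinate, initial step size 0.2, bounds [0,1].
# `d` and `f` are illustrative assumptions.
import numpy as np
import cma

d = 5
f = lambda x: float(np.sum((np.asarray(x) - 0.3) ** 2))

es = cma.CMAEvolutionStrategy(
    0.5 * np.ones(d),                   # <x>_w^(0) = 0.5
    0.2,                                # sigma^(0) = 0.2
    {"popsize": 30, "bounds": [0, 1], "maxiter": 20, "verbose": -9},
)
while not es.stop():
    xs = es.ask()                       # sample one generation
    es.tell(xs, [f(x) for x in xs])     # rank and update the distribution
print("best:", es.result.fbest)
```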

Coordinate-search method: Initialize \(x_{0}\) as the best point of the first 100 random-search evaluations, then perform optimization for up to 500 evaluations. \(\alpha = 0.5\). All variables are scaled to [0,1].
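A minimal sketch of a coordinate (compass) search under this setting; the step-halving rule on a failed sweep and the clipping to [0,1] are assumptions about details the table does not spell out:

```python
# Hedged sketch: poll +/- alpha along each coordinate, accept improvements,
# halve alpha when a full sweep fails, stop after 500 evaluations.
# `d`, `f`, and the halving rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 5
f = lambda x: np.sum((x - 0.3) ** 2)

X0 = rng.random((100, d))
y0 = np.array([f(x) for x in X0])
x, fx = X0[np.argmin(y0)].copy(), y0.min()   # best of 100 random points

alpha, evals = 0.5, 0
while evals < 500:
    improved = False
    for i in range(d):
        for step in (alpha, -alpha):
            if evals >= 500:
                break
            cand = x.copy()
            cand[i] = np.clip(cand[i] + step, 0.0, 1.0)
            fc, evals = f(cand), evals + 1
            if fc < fx:
                x, fx, improved = cand, fc, True
    if not improved:
        alpha /= 2                            # shrink the step (assumed rule)
print("best:", fx)
```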

Nelder-Mead method: Generate an initial simplex randomly, then perform optimization for up to 600 evaluations (including initialization). \(\gamma^{s} = \frac{1}{2}\), \(\delta^{ic} = -\frac{1}{2}\), \(\delta^{oc} = \frac{1}{2}\), \(\delta^{r} = 1\), and \(\delta^{e} = 2\).
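These are the standard Nelder-Mead coefficients (reflection 1, expansion 2, outside/inside contraction ±1/2, shrink 1/2), which SciPy's non-adaptive implementation also uses, so the setting can be sketched as follows (dimension and objective are placeholders):

```python
# Hedged sketch: random initial simplex in [0,1]^d, at most 600 evaluations.
# SciPy's non-adaptive Nelder-Mead uses the standard coefficients matching
# the table. `d` and `f` are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
d = 5
f = lambda x: np.sum((x - 0.3) ** 2)

simplex = rng.random((d + 1, d))        # random initial simplex
res = minimize(
    f, simplex[0], method="Nelder-Mead",
    options={"initial_simplex": simplex, "maxfev": 600, "adaptive": False},
)
print("best:", res.fun)
```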