How to Configure Algorithms

This page describes neural architecture search (NAS) algorithms that Katib supports and how to configure them.

NAS Algorithms

Katib currently supports several search algorithms for NAS:

Efficient Neural Architecture Search (ENAS)

The algorithm name in Katib is enas.

The ENAS example, enas-gpu.yaml, attempts to show all possible operations. Because of the large search space, the example is not likely to generate a good result.

Katib supports the following algorithm settings for ENAS:

| Setting Name | Type | Default value | Description |
| --- | --- | --- | --- |
| controller_hidden_size | int | 64 | RL controller LSTM hidden size. Value must be >= 1. |
| controller_temperature | float | 5.0 | RL controller temperature for the sampling logits. Value must be > 0. Set value to "None" to disable it in the controller. |
| controller_tanh_const | float | 2.25 | RL controller tanh constant to prevent premature convergence. Value must be > 0. Set value to "None" to disable it in the controller. |
| controller_entropy_weight | float | 1e-5 | RL controller weight for the entropy applied to the reward. Value must be > 0. Set value to "None" to disable it in the controller. |
| controller_baseline_decay | float | 0.999 | RL controller baseline factor. Value must be > 0 and <= 1. |
| controller_learning_rate | float | 5e-5 | RL controller learning rate for the Adam optimizer. Value must be > 0 and <= 1. |
| controller_skip_target | float | 0.4 | RL controller probability, which represents the prior belief of a skip connection being formed. Value must be > 0 and <= 1. |
| controller_skip_weight | float | 0.8 | RL controller weight of the skip penalty loss. Value must be > 0. Set value to "None" to disable it in the controller. |
| controller_train_steps | int | 50 | Number of RL controller training steps after each candidate runs. Value must be >= 1. |
| controller_log_every_steps | int | 10 | Number of RL controller training steps before logging it. Value must be >= 1. |
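
As a rough sketch of how these settings are supplied, the snippet below shows the `spec.algorithm` block of an Experiment. The experiment name and the chosen values are illustrative only, and any setting you omit keeps the default listed above; see enas-gpu.yaml for a complete spec.

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: enas-example   # illustrative name
spec:
  algorithm:
    algorithmName: enas
    algorithmSettings:
      # Override a few defaults; values are passed as strings.
      - name: controller_hidden_size
        value: "64"
      - name: controller_temperature
        value: "5.0"
      - name: controller_train_steps
        value: "50"
  # Objective, nasConfig, trial template, and other fields are omitted here;
  # see the enas-gpu.yaml example for a complete Experiment.
```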

Differentiable Architecture Search (DARTS)

The algorithm name in Katib is darts.

The DARTS example is darts-gpu.yaml.

Katib supports the following algorithm settings for DARTS:

| Setting Name | Type | Default value | Description |
| --- | --- | --- | --- |
| num_epochs | int | 50 | Number of epochs to train the model. |
| w_lr | float | 0.025 | Initial learning rate for training model weights. The learning rate is annealed down to w_lr_min following a cosine schedule without restarts. |
| w_lr_min | float | 0.001 | Minimum learning rate for training model weights. |
| w_momentum | float | 0.9 | Momentum for training model weights. |
| w_weight_decay | float | 3e-4 | Training model weight decay. |
| w_grad_clip | float | 5.0 | Max norm value for clipping the gradient norm of training model weights. |
| alpha_lr | float | 3e-4 | Initial learning rate for alphas weights. |
| alpha_weight_decay | float | 1e-3 | Alphas weight decay. |
| batch_size | int | 128 | Batch size for the dataset. |
| num_workers | int | 4 | Number of subprocesses to download the dataset. |
| init_channels | int | 16 | Initial number of channels. |
| print_step | int | 50 | Number of training or validation steps before logging it. |
| num_nodes | int | 4 | Number of DARTS nodes. |
| stem_multiplier | int | 3 | Multiplier for initial channels. It is used in the first stem cell. |
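
A minimal sketch of how the DARTS settings could be supplied is shown below. The experiment name and values are placeholders, and settings not listed fall back to the defaults above; see darts-gpu.yaml for a complete spec.

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: darts-example   # illustrative name
spec:
  algorithm:
    algorithmName: darts
    algorithmSettings:
      # Values are passed as strings; unspecified settings use their defaults.
      - name: num_epochs
        value: "50"
      - name: w_lr
        value: "0.025"
      - name: batch_size
        value: "128"
  # Objective, nasConfig, trial template, and other fields are omitted here;
  # see the darts-gpu.yaml example for a complete Experiment.
```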
