model.deeplearn.optimizer package
Submodules
model.deeplearn.optimizer.centralized_adadelta module
- class model.deeplearn.optimizer.centralized_adadelta.CentralizedAdadelta(learning_rate=0.001, rho=0.95, epsilon=1e-07, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='adadelta', **kwargs)
Bases: Adadelta
Stochastic gradient descent with dimension-wise adaptive learning rate. The optimizer behaves exactly like keras.optimizers.Adadelta but its gradients are centered.
- update_step(gradient, variable, learning_rate)
Modify the gradients of the backbone optimizer by centering them before applying them to fit the model’s parameters. See CentralizedAdam.center_gradients().
- Returns:
None; the parameters are updated with the centered gradients instead of the original ones.
model.deeplearn.optimizer.centralized_adagrad module
- class model.deeplearn.optimizer.centralized_adagrad.CentralizedAdagrad(learning_rate=0.001, initial_accumulator_value=0.1, epsilon=1e-07, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='adagrad', **kwargs)
Bases: Adagrad
Stochastic gradient descent with parameter-specific learning rates. The optimizer behaves exactly like keras.optimizers.Adagrad but its gradients are centered.
- update_step(gradient, variable, learning_rate)
Modify the gradients of the backbone optimizer by centering them before applying them to fit the model’s parameters. See CentralizedAdam.center_gradients().
- Returns:
None; the parameters are updated with the centered gradients instead of the original ones.
model.deeplearn.optimizer.centralized_adam module
- class model.deeplearn.optimizer.centralized_adam.CentralizedAdam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='adam', **kwargs)
Bases: Adam
Adam optimizer with centralized gradients. The optimizer behaves exactly like keras.optimizers.Adam but its gradients are centered.
- update_step(gradient, variable, learning_rate)
Modify the gradients of the backbone optimizer by centering them before applying them to fit the model’s parameters. See CentralizedAdam.center_gradients().
- Returns:
None; the parameters are updated with the centered gradients instead of the original ones.
- static center_gradients(nablas)
Center the given gradients by subtracting, for each index along the last axis/dimension, the mean over all remaining axes. For example, assume a tensor of gradients \(\mathcal{X} \in \mathbb{R}^{m \times n \times p \times q}\). First compute the mean vector for the last axis/dimension \(\pmb{\mu} \in \mathbb{R}^{q}\) such that \(\mu_l = (mnp)^{-1} \sum_{i}\sum_{j}\sum_{k}{x_{ijkl}}\). The centered gradient is then given by the broadcast subtraction \(\mathcal{X}' = \mathcal{X} \ominus \pmb{\mu}\).
- Parameters:
nablas – The original gradients encoded as an \(n\)-dimensional tensor \(\mathcal{X} \in \mathbb{R}^{D_1 \times \ldots \times D_n}\).
- Returns:
The centered gradients encoded as a tensor with the same dimensionality as the input one.
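As a sketch of the centering described above (in NumPy rather than the package's actual Keras tensor implementation; the function name mirrors the documented method but is illustrative):

```python
import numpy as np

def center_gradients(nablas):
    """Subtract, for each index of the last axis, the mean over all
    remaining axes (broadcast subtraction of the mean vector mu).
    Assumes tensors with at least two axes; for rank-1 tensors this
    degenerates to zero, so such parameters are typically skipped."""
    leading_axes = tuple(range(nablas.ndim - 1))
    mu = nablas.mean(axis=leading_axes, keepdims=True)  # shape (1, ..., 1, q)
    return nablas - mu

# A 4D tensor X in R^{m x n x p x q}
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3, 4, 5))
Xc = center_gradients(X)
# Each of the q last-axis slices of the centered tensor now has zero mean
print(np.allclose(Xc.mean(axis=(0, 1, 2)), 0.0))  # → True
```

The `keepdims=True` is what makes the subtraction a broadcast \(\mathcal{X} \ominus \pmb{\mu}\): the mean vector keeps singleton leading axes so it aligns with the last axis of the input.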
model.deeplearn.optimizer.centralized_adamax module
- class model.deeplearn.optimizer.centralized_adamax.CentralizedAdamax(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='adamax', **kwargs)
Bases: Adamax
Adam optimizer using the infinity norm. The optimizer behaves exactly like keras.optimizers.Adamax but its gradients are centered.
- update_step(gradient, variable, learning_rate)
Modify the gradients of the backbone optimizer by centering them before applying them to fit the model’s parameters. See CentralizedAdam.center_gradients().
- Returns:
None; the parameters are updated with the centered gradients instead of the original ones.
model.deeplearn.optimizer.centralized_adamw module
- class model.deeplearn.optimizer.centralized_adamw.CentralizedAdamW(learning_rate=0.001, weight_decay=0.004, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='adamw', **kwargs)
Bases: AdamW
Adam optimizer with decoupled weight decay and centralized gradients. The optimizer behaves exactly like keras.optimizers.AdamW but its gradients are centered.
- update_step(gradient, variable, learning_rate)
Modify the gradients of the backbone optimizer by centering them before applying them to fit the model’s parameters. See CentralizedAdam.center_gradients().
- Returns:
None; the parameters are updated with the centered gradients instead of the original ones.
model.deeplearn.optimizer.centralized_ftrl module
- class model.deeplearn.optimizer.centralized_ftrl.CentralizedFTRL(learning_rate=0.001, learning_rate_power=-0.5, initial_accumulator_value=0.1, l1_regularization_strength=0.0, l2_regularization_strength=0.0, l2_shrinkage_regularization_strength=0.0, beta=0.0, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='ftrl', **kwargs)
Bases: Ftrl
Follow-the-regularized-leader (FTRL) optimizer, which behaves exactly like keras.optimizers.Ftrl but with centered gradients.
- update_step(gradient, variable, learning_rate)
Modify the gradients of the backbone optimizer by centering them before applying them to fit the model’s parameters. See CentralizedAdam.center_gradients().
- Returns:
None; the parameters are updated with the centered gradients instead of the original ones.
model.deeplearn.optimizer.centralized_lamb module
- class model.deeplearn.optimizer.centralized_lamb.CentralizedLamb(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='lamb', **kwargs)
Bases: Lamb
Stochastic gradient descent with layer-wise adaptive moments to tune the parameter-wise learning rate. The optimizer behaves exactly like keras.optimizers.Lamb but its gradients are centered.
- update_step(gradient, variable, learning_rate)
Modify the gradients of the backbone optimizer by centering them before applying them to fit the model’s parameters. See CentralizedAdam.center_gradients().
- Returns:
None; the parameters are updated with the centered gradients instead of the original ones.
model.deeplearn.optimizer.centralized_lion module
- class model.deeplearn.optimizer.centralized_lion.CentralizedLion(learning_rate=0.001, beta_1=0.9, beta_2=0.99, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='lion', **kwargs)
Bases: Lion
Stochastic gradient descent using the sign operator to govern the magnitude of the update. The optimizer behaves exactly like keras.optimizers.Lion but its gradients are centered.
- update_step(gradient, variable, learning_rate)
Modify the gradients of the backbone optimizer by centering them before applying them to fit the model’s parameters. See CentralizedAdam.center_gradients().
- Returns:
None; the parameters are updated with the centered gradients instead of the original ones.
model.deeplearn.optimizer.centralized_nadam module
- class model.deeplearn.optimizer.centralized_nadam.CentralizedNadam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='nadam', **kwargs)
Bases: Nadam
Adam optimizer with Nesterov momentum and centralized gradients. The optimizer behaves exactly like keras.optimizers.Nadam but its gradients are centered.
- update_step(gradient, variable, learning_rate)
Modify the gradients of the backbone optimizer by centering them before applying them to fit the model’s parameters. See CentralizedAdam.center_gradients().
- Returns:
None; the parameters are updated with the centered gradients instead of the original ones.
model.deeplearn.optimizer.centralized_rmsprop module
- class model.deeplearn.optimizer.centralized_rmsprop.CentralizedRMSProp(learning_rate=0.001, rho=0.9, momentum=0.0, epsilon=1e-07, centered=False, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='rmsprop', **kwargs)
Bases: RMSprop
Stochastic gradient descent optimizer with plain momentum that maintains a moving average of the squared gradients and normalizes the gradient by the square root of that average. The optimizer behaves exactly like keras.optimizers.RMSprop but its gradients are centered.
- update_step(gradient, variable, learning_rate)
Modify the gradients of the backbone optimizer by centering them before applying them to fit the model’s parameters. See CentralizedAdam.center_gradients().
- Returns:
None; the parameters are updated with the centered gradients instead of the original ones.
model.deeplearn.optimizer.centralized_sgd module
- class model.deeplearn.optimizer.centralized_sgd.CentralizedSGD(learning_rate=0.01, momentum=0.0, nesterov=False, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='SGD', **kwargs)
Bases: SGD
Stochastic gradient descent optimizer with centralized gradients. The optimizer behaves exactly like keras.optimizers.SGD but its gradients are centered.
- update_step(gradient, variable, learning_rate)
Modify the gradients of the backbone optimizer by centering them before applying them to fit the model’s parameters. See CentralizedAdam.center_gradients().
- Returns:
None; the parameters are updated with the centered gradients instead of the original ones.
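To make the shared update_step pattern concrete, here is a sketch, in plain NumPy rather than Keras, of how a centralized step differs from a plain one: the gradient is centered first, then the backbone optimizer's usual rule is applied. The function names are illustrative, not the package's API.

```python
import numpy as np

def center(grad):
    # Subtract the per-last-axis mean, as in CentralizedAdam.center_gradients()
    mu = grad.mean(axis=tuple(range(grad.ndim - 1)), keepdims=True)
    return grad - mu

def centralized_sgd_step(variable, gradient, learning_rate=0.01):
    # Backbone SGD rule applied to the centered gradient (in-place update)
    variable -= learning_rate * center(gradient)
    return variable

w = np.ones((4, 3))
g = np.arange(12, dtype=float).reshape(4, 3)
centralized_sgd_step(w, g, learning_rate=0.1)
# The centered gradient sums to zero along the leading axes, so the
# per-column mean of the parameters is preserved by the update
print(np.allclose(w.mean(axis=0), 1.0))  # → True
```

The same wrapping applies to every optimizer in this package: only the gradient fed to the backbone's update rule changes, not the rule itself.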
Module contents
- author:
Alberto M. Esmoris Pena
The optimizer package contains the logic to handle custom optimizers for deep learning models.