model.deeplearn.optimizer package

Submodules

model.deeplearn.optimizer.centralized_adadelta module

class model.deeplearn.optimizer.centralized_adadelta.CentralizedAdadelta(learning_rate=0.001, rho=0.95, epsilon=1e-07, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='adadelta', **kwargs)

Bases: Adadelta

Stochastic gradient descent with dimension-wise adaptive learning rate. The optimizer behaves exactly like keras.optimizers.Adadelta but its gradients are centered.

update_step(gradient, variable, learning_rate)

Center the gradients of the backbone optimizer before they are applied to update the model's parameters.

See CentralizedAdam.center_gradients().

Returns:

None. The parameters are updated with the centered gradients instead of the original ones.

model.deeplearn.optimizer.centralized_adagrad module

class model.deeplearn.optimizer.centralized_adagrad.CentralizedAdagrad(learning_rate=0.001, initial_accumulator_value=0.1, epsilon=1e-07, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='adagrad', **kwargs)

Bases: Adagrad

Stochastic gradient descent with parameter-specific learning rates. The optimizer behaves exactly like keras.optimizers.Adagrad but its gradients are centered.

update_step(gradient, variable, learning_rate)

Center the gradients of the backbone optimizer before they are applied to update the model's parameters.

See CentralizedAdam.center_gradients().

Returns:

None. The parameters are updated with the centered gradients instead of the original ones.

model.deeplearn.optimizer.centralized_adam module

class model.deeplearn.optimizer.centralized_adam.CentralizedAdam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='adam', **kwargs)

Bases: Adam

ADAM optimizer with centralized gradients. The optimizer behaves exactly like keras.optimizers.Adam but its gradients are centered.

update_step(gradient, variable, learning_rate)

Center the gradients of the backbone optimizer before they are applied to update the model's parameters.

See CentralizedAdam.center_gradients().

Returns:

None. The parameters are updated with the centered gradients instead of the original ones.

static center_gradients(nablas)

Center the given gradients by subtracting, for each index of the last axis/dimension, the mean taken over the remaining axes. For example, assume a tensor of gradients \(\mathcal{X} \in \mathbb{R}^{m \times n \times p \times q}\). First compute the mean vector for the last axis/dimension \(\pmb{\mu} \in \mathbb{R}^{q}\) such that \(\mu_l = (mnp)^{-1} \sum_{i}\sum_{j}\sum_{k}{x_{ijkl}}\). The centered gradient is then given by the broadcast subtraction \(\mathcal{X}' = \mathcal{X} \ominus \pmb{\mu}\).

Parameters:

nablas – The original gradients encoded as an \(n\)-dimensional tensor \(\mathcal{X} \in \mathbb{R}^{D_1 \times \ldots \times D_n}\).

Returns:

The centered gradients encoded as a tensor with the same dimensionality as the input one.
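A minimal NumPy sketch of this operation (an illustration of the formula above, not the package's actual implementation, which operates on the backbone framework's tensors):

```python
import numpy as np

def center_gradients(nablas: np.ndarray) -> np.ndarray:
    """Subtract, for each index of the last axis, the mean taken over
    all remaining axes (the broadcast subtraction X - mu)."""
    leading_axes = tuple(range(nablas.ndim - 1))
    mu = nablas.mean(axis=leading_axes)  # mean vector, shape (D_n,)
    return nablas - mu                   # mu broadcasts over leading axes

# Example: a (2, 3, 4, 5) gradient tensor
X = np.random.default_rng(0).normal(size=(2, 3, 4, 5))
Xc = center_gradients(X)
# After centering, each last-axis component has zero mean over the leading axes
print(np.allclose(Xc.mean(axis=(0, 1, 2)), 0.0))  # True
```

The output tensor has the same shape as the input, matching the return contract above.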

model.deeplearn.optimizer.centralized_adamax module

class model.deeplearn.optimizer.centralized_adamax.CentralizedAdamax(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='adamax', **kwargs)

Bases: Adamax

ADAM optimizer using the infinity norm. The optimizer behaves exactly like keras.optimizers.Adamax but its gradients are centered.

update_step(gradient, variable, learning_rate)

Center the gradients of the backbone optimizer before they are applied to update the model's parameters.

See CentralizedAdam.center_gradients().

Returns:

None. The parameters are updated with the centered gradients instead of the original ones.

model.deeplearn.optimizer.centralized_adamw module

class model.deeplearn.optimizer.centralized_adamw.CentralizedAdamW(learning_rate=0.001, weight_decay=0.004, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='adamw', **kwargs)

Bases: AdamW

ADAM optimizer with extra weight decay and centralized gradients. The optimizer behaves exactly like keras.optimizers.AdamW but its gradients are centered.

update_step(gradient, variable, learning_rate)

Center the gradients of the backbone optimizer before they are applied to update the model's parameters.

See CentralizedAdam.center_gradients().

Returns:

None. The parameters are updated with the centered gradients instead of the original ones.

model.deeplearn.optimizer.centralized_ftrl module

class model.deeplearn.optimizer.centralized_ftrl.CentralizedFTRL(learning_rate=0.001, learning_rate_power=-0.5, initial_accumulator_value=0.1, l1_regularization_strength=0.0, l2_regularization_strength=0.0, l2_shrinkage_regularization_strength=0.0, beta=0.0, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='ftrl', **kwargs)

Bases: Ftrl

Follow-the-regularized-leader (FTRL) optimizer. The optimizer behaves exactly like keras.optimizers.Ftrl but its gradients are centered.

update_step(gradient, variable, learning_rate)

Center the gradients of the backbone optimizer before they are applied to update the model's parameters.

See CentralizedAdam.center_gradients().

Returns:

None. The parameters are updated with the centered gradients instead of the original ones.

model.deeplearn.optimizer.centralized_lamb module

class model.deeplearn.optimizer.centralized_lamb.CentralizedLamb(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='lamb', **kwargs)

Bases: Lamb

Stochastic gradient descent with layer-wise adaptive moments to tune the parameter-wise learning rate. The optimizer behaves exactly like keras.optimizers.Lamb but its gradients are centered.

update_step(gradient, variable, learning_rate)

Center the gradients of the backbone optimizer before they are applied to update the model's parameters.

See CentralizedAdam.center_gradients().

Returns:

None. The parameters are updated with the centered gradients instead of the original ones.

model.deeplearn.optimizer.centralized_lion module

class model.deeplearn.optimizer.centralized_lion.CentralizedLion(learning_rate=0.001, beta_1=0.9, beta_2=0.99, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='lion', **kwargs)

Bases: Lion

Stochastic gradient descent using the sign operator to govern the magnitude of the update. The optimizer behaves exactly like keras.optimizers.Lion but its gradients are centered.

update_step(gradient, variable, learning_rate)

Center the gradients of the backbone optimizer before they are applied to update the model's parameters.

See CentralizedAdam.center_gradients().

Returns:

None. The parameters are updated with the centered gradients instead of the original ones.
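Lion's sign-based update can be sketched as follows (a simplified NumPy illustration of the plain Lion step, without weight decay and without the gradient centering this class adds; all names are illustrative):

```python
import numpy as np

def lion_step(w, m, g, lr=1e-3, beta_1=0.9, beta_2=0.99):
    """One Lion update: the step direction is the sign of an
    interpolation between the momentum and the gradient, so every
    coordinate moves by exactly +/- lr."""
    update = np.sign(beta_1 * m + (1.0 - beta_1) * g)
    w = w - lr * update
    m = beta_2 * m + (1.0 - beta_2) * g  # exponential moving average of g
    return w, m

w = np.array([0.5, -0.5])
m = np.zeros_like(w)
g = np.array([0.2, -0.1])
w, m = lion_step(w, m, g)
print(w)  # each coordinate moved by exactly lr = 1e-3
```

Because only the sign of the interpolated gradient is used, the magnitude of every coordinate's update is governed by the learning rate alone.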

model.deeplearn.optimizer.centralized_nadam module

class model.deeplearn.optimizer.centralized_nadam.CentralizedNadam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='nadam', **kwargs)

Bases: Nadam

ADAM optimizer with Nesterov momentum and centralized gradients. The optimizer behaves exactly like keras.optimizers.Nadam but its gradients are centered.

update_step(gradient, variable, learning_rate)

Center the gradients of the backbone optimizer before they are applied to update the model's parameters.

See CentralizedAdam.center_gradients().

Returns:

None. The parameters are updated with the centered gradients instead of the original ones.

model.deeplearn.optimizer.centralized_rmsprop module

class model.deeplearn.optimizer.centralized_rmsprop.CentralizedRMSProp(learning_rate=0.001, rho=0.9, momentum=0.0, epsilon=1e-07, centered=False, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='rmsprop', **kwargs)

Bases: RMSprop

Stochastic gradient descent with plain momentum that maintains a moving average of the squared gradients and normalizes each gradient by the square root of that average. The optimizer behaves exactly like keras.optimizers.RMSprop but its gradients are centered.

update_step(gradient, variable, learning_rate)

Center the gradients of the backbone optimizer before they are applied to update the model's parameters.

See CentralizedAdam.center_gradients().

Returns:

None. The parameters are updated with the centered gradients instead of the original ones.
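The moving-average normalization described above can be sketched as follows (a simplified NumPy illustration of the plain RMSprop step, without momentum and without the gradient centering this class adds):

```python
import numpy as np

def rmsprop_step(w, v, g, lr=1e-3, rho=0.9, epsilon=1e-7):
    """One RMSprop update: keep an exponential moving average of the
    squared gradients and scale the gradient by the square root of
    that average."""
    v = rho * v + (1.0 - rho) * g ** 2          # moving average of g^2
    w = w - lr * g / (np.sqrt(v) + epsilon)     # normalized update
    return w, v

w = np.array([1.0, -1.0])
v = np.zeros_like(w)
g = np.array([0.5, -2.0])
w, v = rmsprop_step(w, v, g)
print(w)
```

Note how the normalization equalizes the effective step size across coordinates with very different gradient magnitudes.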

model.deeplearn.optimizer.centralized_sgd module

class model.deeplearn.optimizer.centralized_sgd.CentralizedSGD(learning_rate=0.01, momentum=0.0, nesterov=False, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='SGD', **kwargs)

Bases: SGD

Stochastic gradient descent optimizer with centralized gradients. The optimizer behaves exactly like keras.optimizers.SGD but its gradients are centered.

update_step(gradient, variable, learning_rate)

Center the gradients of the backbone optimizer before they are applied to update the model's parameters.

See CentralizedAdam.center_gradients().

Returns:

None. The parameters are updated with the centered gradients instead of the original ones.
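The override pattern shared by all the classes in this package is center-then-delegate: update_step first centers the incoming gradient and then hands it to the backbone optimizer's own update_step. A minimal self-contained sketch with a toy NumPy backbone (the real classes subclass the corresponding Keras optimizers and operate on framework variables):

```python
import numpy as np

class ToySGD:
    """Stand-in for a Keras backbone optimizer."""
    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate

    def update_step(self, gradient, variable):
        variable -= self.learning_rate * gradient

class CentralizedToySGD(ToySGD):
    """Center the gradient, then delegate to the backbone update."""
    @staticmethod
    def center_gradients(nablas):
        mu = nablas.mean(axis=tuple(range(nablas.ndim - 1)))
        return nablas - mu

    def update_step(self, gradient, variable):
        super().update_step(self.center_gradients(gradient), variable)

w = np.ones((3, 4))
g = np.full((3, 4), 2.0)  # constant gradient: centering zeroes it out
CentralizedToySGD(learning_rate=0.1).update_step(g, w)
print(np.allclose(w, 1.0))  # True: the centered gradient is all zeros
```

The same subclassing pattern works for any of the backbones listed above, since only update_step needs to be overridden.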

Module contents

author:

Alberto M. Esmoris Pena

The optimizer package contains the logic to handle custom optimizers for deep learning models.