src.model.deeplearn.optimizer.centralized_adam

Classes

CentralizedAdam([learning_rate, beta_1, ...])

Adam optimizer with centralized gradients.

class src.model.deeplearn.optimizer.centralized_adam.CentralizedAdam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False, weight_decay=None, clipnorm=None, clipvalue=None, global_clipnorm=None, use_ema=False, ema_momentum=0.99, ema_overwrite_frequency=None, loss_scale_factor=None, gradient_accumulation_steps=None, name='adam', **kwargs)

Adam optimizer with centralized gradients. The optimizer behaves like keras.optimizers.Adam, except that each gradient is centered before the Adam update is applied.

update_step(gradient, variable, learning_rate)

Center the gradients before handing them to the backbone Adam update, so that the model's parameters are fitted with the centered gradients.

See CentralizedAdam.center_gradients().

Returns: Nothing. The parameters are updated with the centered gradients instead of the original ones.
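A minimal NumPy sketch of what one such update might look like for a single variable. It assumes the Adam update is modelled on the form used by keras.optimizers.Adam (bias correction folded into the step size, epsilon added after the square root) and ignores amsgrad, weight decay, clipping, and EMA. The function name centered_adam_step and its explicit m, v, and step arguments are hypothetical; a Keras optimizer keeps this state internally. Leaving gradients with fewer than two axes unchanged is also an assumption of the sketch.

```python
import numpy as np


def centered_adam_step(variable, gradient, m, v, step, learning_rate=0.001,
                       beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Hypothetical sketch: one Adam update driven by the centered gradient.

    `step` is the 1-based iteration count; returns (variable, m, v).
    """
    g = np.asarray(gradient, dtype=float)
    if g.ndim >= 2:
        # Center: subtract the mean over all axes except the last one.
        g = g - g.mean(axis=tuple(range(g.ndim - 1)), keepdims=True)
    # Otherwise (assumption) 0-D/1-D gradients are used unchanged.
    # Exponential moving averages of the gradient and of its square.
    m = m + (g - m) * (1.0 - beta_1)
    v = v + (g * g - v) * (1.0 - beta_2)
    # Bias correction folded into the step size; epsilon added after the sqrt.
    alpha = learning_rate * np.sqrt(1.0 - beta_2 ** step) / (1.0 - beta_1 ** step)
    variable = variable - alpha * m / (np.sqrt(v) + epsilon)
    return variable, m, v


w = np.zeros((2, 3))
grad = np.array([[1.0, 2.0, 3.0], [3.0, 6.0, 9.0]])  # centered: [[-1,-2,-3],[1,2,3]]
w, m, v = centered_adam_step(w, grad, np.zeros_like(w), np.zeros_like(w), step=1)
```

After this first step each entry of w moves by roughly learning_rate against the sign of its centered gradient, the usual first-step behaviour of Adam; only the gradient fed into the moment estimates differs from plain Adam.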

static center_gradients(nablas)

Center the given gradients by subtracting, for each index along the last axis/dimension, the mean taken over all the other axes. For example, assume a tensor of gradients \(\mathcal{X} \in \mathbb{R}^{m \times n \times p \times q}\). First compute the mean vector for the last axis/dimension \(\pmb{\mu} \in \mathbb{R}^{q}\) such that \(\mu_l = (mnp)^{-1} \sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{p}{x_{ijkl}}\) for \(l = 1, \ldots, q\). Then the centered gradient is given by the broadcast subtraction \(\mathcal{X}' = \mathcal{X} \ominus \pmb{\mu}\), i.e., \(x'_{ijkl} = x_{ijkl} - \mu_l\).
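A short check, using the same symbols, of why this is called centering: for every index \(l\) of the last axis, the centered entries sum (and therefore average) to zero over the first three axes.

```latex
\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{p} x'_{ijkl}
  = \sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{p} \left( x_{ijkl} - \mu_l \right)
  = mnp\,\mu_l - mnp\,\mu_l
  = 0, \qquad l = 1, \ldots, q.
```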

Parameters: nablas – The original gradients encoded as an \(\eta\)-dimensional tensor \(\mathcal{X} \in \mathbb{R}^{D_1 \times \cdots \times D_\eta}\).
Returns: The centered gradients encoded as a tensor with the same shape as the input one.
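A minimal NumPy sketch of the centering rule, meant as a stand-in for the static method rather than its actual implementation (which presumably operates on backend tensors). Returning scalars and vectors unchanged is an assumption of the sketch, since the description above does not say how tensors with fewer than two axes are handled.

```python
import numpy as np


def center_gradients(nablas):
    """Hypothetical NumPy sketch of the centering rule described above.

    For a tensor with eta >= 2 axes, subtract the mean taken over the first
    eta - 1 axes, separately for each index of the last axis. The result is
    broadcast back to the input shape.
    """
    nablas = np.asarray(nablas, dtype=float)
    if nablas.ndim < 2:
        # Assumption: scalars and vectors (e.g. bias gradients) are returned
        # unchanged; the documentation does not cover this case.
        return nablas
    axes = tuple(range(nablas.ndim - 1))  # every axis except the last one
    mu = nablas.mean(axis=axes, keepdims=True)  # shape (1, ..., 1, D_eta)
    return nablas - mu  # broadcast subtraction


# Example matching the 4-D case above with m, n, p, q = 2, 3, 4, 5.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3, 4, 5))
X_centered = center_gradients(X)
```

Reducing with keepdims=True keeps the mean as a (1, 1, 1, q) array, so the subtraction broadcasts along the last axis exactly as the \(\ominus\) operator describes, and the output keeps the input shape.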