src.model.deeplearn.layer.hourglass_layer

Classes

HourglassLayer(*args, **kwargs)

class src.model.deeplearn.layer.hourglass_layer.HourglassLayer(*args, **kwargs)
Author:

Alberto M. Esmoris Pena

An hourglass layer consists of two unbiased MLPs, i.e., MLPs without bias terms. The first one uses the weights \(\pmb{W_1} \in \mathbb{R}^{D_{\mathrm{in}} \times D_{h}}\) to transform the features \(\pmb{F} \in \mathbb{R}^{m \times D_{\mathrm{in}}}\) of \(m \in \mathbb{Z}_{>0}\) points into a reduced feature space of \(D_{h} \in \mathbb{Z}_{>0}\) features. Then, the second unbiased MLP uses the weights \(\pmb{W_2} \in \mathbb{R}^{D_h \times D_{\mathrm{out}}}\) to transform the features from the reduced space into an output feature space of \(D_{\mathrm{out}} \in \mathbb{Z}_{>0}\) features.

The hourglass layer is a map \(\mathcal{H} : \mathbb{R}^{m \times D_{\mathrm{in}}} \to \mathbb{R}^{m \times D_{\mathrm{out}}}\) that can be mathematically summarized as:

\[\pmb{Y} = \sigma_2(\sigma_1(\pmb{F} \pmb{W_1})\pmb{W_2}) \in \mathbb{R}^{m \times D_{\mathrm{out}}}\]

Where each \(\sigma_i\), \(i \in \{1, 2\}\), is an activation function such as ReLU.
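The map above can be sketched in a few lines of NumPy. This is only an illustration of the formula, not the layer's actual implementation: the function names `relu` and `hourglass_forward`, the random weights, and the chosen dimensions are all assumptions made for the example.

```python
import numpy as np


def relu(x):
    """Element-wise ReLU activation."""
    return np.maximum(x, 0.0)


def hourglass_forward(F, W1, W2, sigma1=relu, sigma2=relu):
    """Compute Y = sigma2(sigma1(F W1) W2) for the features of m points."""
    H = sigma1(F @ W1)     # reduced features, shape (m, D_h)
    return sigma2(H @ W2)  # output features, shape (m, D_out)


# Hypothetical sizes: m points, D_in input, D_h hidden, D_out output features.
rng = np.random.default_rng(0)
m, D_in, D_h, D_out = 5, 8, 3, 6
F = rng.normal(size=(m, D_in))
W1 = rng.normal(size=(D_in, D_h))   # no bias term: the MLPs are unbiased
W2 = rng.normal(size=(D_h, D_out))

Y = hourglass_forward(F, W1, W2)
print(Y.shape)  # (5, 6)
```

Because \(D_{h} < D_{\mathrm{in}}\) in this example, the intermediate matrix `H` is the narrow waist of the hourglass.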

A regularization strategy is necessary to mitigate the potential information loss when mapping a high-dimensional feature space to a low-dimensional one. This problem can be addressed by adding an extra term to the original loss function \(\mathcal{L}\) to obtain a new loss function \(\mathcal{L}' = \mathcal{L} + \beta \mathcal{L}_{h}\), where the hyperparameter \(\beta \in \mathbb{R}\) controls the contribution of the hourglass regularization term to the final loss and:

\[\mathcal{L}_{h} = \left\lVert{ \dfrac{ \pmb{W_1}^{\intercal}\pmb{W_1} }{ \lVert\pmb{W_1}\rVert_2^2 } - \pmb{I} }\right\rVert_{F}\]

Where \(\lVert\cdot\rVert_2\) is the spectral norm of a matrix and \(\lVert\cdot\rVert_F\) is the Frobenius norm.
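As a rough illustration of the regularization term, the NumPy sketch below evaluates \(\mathcal{L}_{h}\) for two small matrices. The function names are hypothetical and the code does not reproduce the layer's internals. It relies on `np.linalg.norm` with `ord=2` (largest singular value) and `ord="fro"` (Frobenius norm).

```python
import numpy as np


def hourglass_regularization(W1):
    """L_h = || W1^T W1 / ||W1||_2^2 - I ||_F (sketch, not the layer's code)."""
    gram = W1.T @ W1                              # shape (D_h, D_h)
    spectral_sq = np.linalg.norm(W1, ord=2) ** 2  # squared largest singular value
    identity = np.eye(W1.shape[1])
    return np.linalg.norm(gram / spectral_sq - identity, ord="fro")


def regularized_loss(base_loss, W1, beta=0.1):
    """L' = L + beta * L_h."""
    return base_loss + beta * hourglass_regularization(W1)


# Orthonormal columns: the normalized Gram matrix is the identity, so L_h ~ 0.
Q, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(8, 3)))
loss_orthonormal = hourglass_regularization(Q)

# Two identical columns: the reduced space has collapsed to one direction.
W_dup = np.array([[1.0, 1.0], [0.0, 0.0]])
loss_duplicate = hourglass_regularization(W_dup)  # 1.0 up to rounding

print(loss_orthonormal, loss_duplicate)
```

The term vanishes when the columns of \(\pmb{W_1}\) are orthogonal with equal norm, and grows as the columns become redundant.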

Further information about the hourglass layer can be read in the SFL-NET paper (https://doi.org/10.1109/TGRS.2023.3313876).

Variables:
  • Dh (int) – The requested internal dimensionality, i.e., \(D_{h}\).

  • Dout (int) – The requested output dimensionality, i.e., \(D_{\mathrm{out}}\).

  • activation (str) – The activation function \(\sigma_1\).

  • activation2 (str) – The activation function \(\sigma_2\).

  • regularize (bool) – Whether to apply \(+ \beta \mathcal{L}_h\) to the loss function or not.

  • beta (float) – The loss factor \(\beta\).

  • spectral_strategy (str) – The type of spectral strategy to be used. It can be either “unsafe” (might break during training), “safe” (will not break during training, but is about twice as slow), or “approx” (as fast as “unsafe” but less prone to break during training).

  • built_W1 (bool) – Whether the first matrix of weights \(\pmb{W_1} \in \mathbb{R}^{D_{\mathrm{in}} \times D_{h}}\) is built or not. Initially it is false, but it will be updated once the layer is built.

  • built_W2 (bool) – Whether the second matrix of weights \(\pmb{W_2} \in \mathbb{R}^{D_h \times D_{\mathrm{out}}}\) is built or not. Initially it is false, but it will be updated once the layer is built.

  • W1_initializer – The initializer for the first matrix of weights \(\pmb{W_1} \in \mathbb{R}^{D_{\mathrm{in}} \times D_{h}}\).

  • W1_regularizer – The regularizer for the first matrix of weights \(\pmb{W_1} \in \mathbb{R}^{D_{\mathrm{in}} \times D_{h}}\).

  • W1_constraint – The constraint for the first matrix of weights \(\pmb{W_1} \in \mathbb{R}^{D_{\mathrm{in}} \times D_{h}}\).

  • W2_initializer – The initializer for the second matrix of weights \(\pmb{W_2} \in \mathbb{R}^{D_h \times D_{\mathrm{out}}}\).

  • W2_regularizer – The regularizer for the second matrix of weights \(\pmb{W_2} \in \mathbb{R}^{D_h \times D_{\mathrm{out}}}\).

  • W2_constraint – The constraint for the second matrix of weights \(\pmb{W_2} \in \mathbb{R}^{D_h \times D_{\mathrm{out}}}\).

__init__(Dh, Dout, activation='relu', activation2='relu', regularize=True, spectral_strategy='approx', beta=0.1, built_W1=False, built_W2=False, W1_initializer=None, W1_regularizer=None, W1_constraint=None, W2_initializer=None, W2_regularizer=None, W2_constraint=None, sigma=None, sigma2=None, **kwargs)

See Layer and layer.Layer.__init__().

build(dim_in)

Build the \(\pmb{W_1} \in \mathbb{R}^{D_{\mathrm{in}} \times D_h}\) matrix representing the weights of the first MLP and the \(\pmb{W_2} \in \mathbb{R}^{D_h \times D_{\mathrm{out}}}\) matrix representing the weights of the second MLP.

See Layer and layer.Layer.build().

call(inputs, training=False, mask=False)

Compute:

\[\sigma_2\left({ \sigma_1(\pmb{F} \pmb{W_1}) \pmb{W_2} }\right)\]

See HourglassLayer for more details.

Parameters:

inputs – The feature space tensor representing the batch of structure space matrices.

Returns:

The output feature space tensor \(\mathcal{Y} \in \mathbb{R}^{K \times m \times D_{\mathrm{out}}}\). Depending on the regularize attribute, the call might also add the hourglass regularization term \(\beta \mathcal{L}_h\) to the loss function.

do_hourglass_regularization()

Apply the hourglass regularization described in HourglassLayer.

do_no_regularization()

Do not apply any regularization at all.

compute_spectral_unsafe(cov)

The spectral norm is computed for the \(\pmb{W_1}^{\intercal} \pmb{W_1}\) matrix, which is guaranteed to be Hermitian and positive semidefinite. Algebraically speaking, its eigenvalues can therefore be derived with the eigh routine, which uses heevd (Hermitian eigenvalue decomposition) from LAPACK. However, during training, the resulting matrix might sometimes present numerical issues that cause the heevd routine to fail.

See HourglassLayer.compute_spectral_safe() and HourglassLayer.compute_spectral_approx().
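The eigenvalue-based idea can be sketched with NumPy as follows. This assumes that `cov` holds \(\pmb{W_1}^{\intercal} \pmb{W_1}\) and that the quantity of interest is its largest eigenvalue, which equals \(\lVert\pmb{W_1}\rVert_2^2\). The function name is hypothetical, and `np.linalg.eigvalsh` stands in for whatever eigh routine the layer actually calls.

```python
import numpy as np


def spectral_via_eigh(cov):
    """Largest eigenvalue of a symmetric matrix (illustrative sketch)."""
    # eigvalsh assumes a symmetric/Hermitian input and returns the
    # eigenvalues in ascending order, so the last one is the largest.
    return np.linalg.eigvalsh(cov)[-1]


# Hypothetical weights with D_in = 3 and D_h = 2.
W1 = np.array([[3.0, 0.0], [0.0, 4.0], [0.0, 0.0]])
cov = W1.T @ W1                         # diag(9, 16)
largest_eig = spectral_via_eigh(cov)    # 16.0 up to rounding
print(largest_eig, np.linalg.norm(W1, ord=2) ** 2)
```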

compute_spectral_safe(cov)

A safe alternative to the HourglassLayer.compute_spectral_unsafe() function that is about twice as slow. It uses the singular value decomposition instead of the eigh function. Consequently, it is robust to numerical issues and will not fail during training.

See HourglassLayer.compute_spectral_unsafe() and HourglassLayer.compute_spectral_approx().
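A comparable sketch for the SVD-based route is shown below, again assuming that `cov` is \(\pmb{W_1}^{\intercal} \pmb{W_1}\). For a symmetric positive semidefinite matrix the singular values coincide with the eigenvalues, so the largest singular value gives the same result as the eigenvalue approach. The function name is an assumption for the example.

```python
import numpy as np


def spectral_via_svd(cov):
    """Largest singular value of cov (illustrative sketch)."""
    # With compute_uv=False only the singular values are returned,
    # sorted in descending order, so the first one is the largest.
    return np.linalg.svd(cov, compute_uv=False)[0]


W1 = np.array([[3.0, 0.0], [0.0, 4.0], [0.0, 0.0]])
cov = W1.T @ W1                              # diag(9, 16)
largest_sv = spectral_via_svd(cov)           # 16.0 up to rounding
largest_eig = np.linalg.eigvalsh(cov)[-1]    # same value via the eigh route
print(largest_sv, largest_eig)
```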

compute_spectral_approx(cov)

A compromise solution that has the speed of the unsafe alternative but is much less likely to break the training process. It considers \(\pmb{W_1}^{\intercal} \pmb{W_1} + \lambda \pmb{I}\) with \(\lambda \to 0\) instead of \(\pmb{W_1}^{\intercal} \pmb{W_1}\) to prevent numerical issues.

See HourglassLayer.compute_spectral_safe() and HourglassLayer.compute_spectral_unsafe().
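The shifted variant might look like the sketch below. The function name and the value `lam=1e-6` are assumptions made for illustration; the documentation only states that \(\lambda\) is small. Adding \(\lambda \pmb{I}\) moves every eigenvalue up by exactly \(\lambda\), so the result is biased by \(\lambda\), and whether the layer compensates for that shift is not stated here.

```python
import numpy as np


def spectral_via_shifted_eigh(cov, lam=1e-6):
    """Largest eigenvalue of cov + lam * I (illustrative sketch)."""
    shifted = cov + lam * np.eye(cov.shape[0])
    # Same eigenvalue routine as the unsafe route, applied to the shifted
    # matrix, which is strictly positive definite when cov is PSD.
    return np.linalg.eigvalsh(shifted)[-1]


W1 = np.array([[3.0, 0.0], [0.0, 4.0], [0.0, 0.0]])
cov = W1.T @ W1                                     # diag(9, 16)
approx_value = spectral_via_shifted_eigh(cov)       # about 16 + 1e-6
exact_value = np.linalg.eigvalsh(cov)[-1]           # about 16
print(approx_value - exact_value)
```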

get_config()

Return the data necessary to serialize the layer.

classmethod from_config(config)

Use the given config data to deserialize the layer.
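The serialization round trip can be pictured with a plain Python stand-in. In Keras, get_config conventionally returns a dictionary of constructor arguments and from_config rebuilds the object from it. The class `HourglassConfigSketch` below is hypothetical and is not the real layer: it only stores a few of the documented arguments, with defaults copied from the `__init__` signature above.

```python
class HourglassConfigSketch:
    """Stand-in holding a subset of the documented constructor arguments."""

    def __init__(self, Dh, Dout, activation="relu", activation2="relu",
                 regularize=True, spectral_strategy="approx", beta=0.1):
        self.Dh = Dh
        self.Dout = Dout
        self.activation = activation
        self.activation2 = activation2
        self.regularize = regularize
        self.spectral_strategy = spectral_strategy
        self.beta = beta

    def get_config(self):
        """Return a plain dict with everything needed to rebuild the object."""
        return {
            "Dh": self.Dh,
            "Dout": self.Dout,
            "activation": self.activation,
            "activation2": self.activation2,
            "regularize": self.regularize,
            "spectral_strategy": self.spectral_strategy,
            "beta": self.beta,
        }

    @classmethod
    def from_config(cls, config):
        """Rebuild an instance from the dict produced by get_config."""
        return cls(**config)


layer = HourglassConfigSketch(Dh=16, Dout=64, beta=0.05)
config = layer.get_config()
restored = HourglassConfigSketch.from_config(config)
print(restored.get_config() == config)  # True
```

A real Keras layer would typically also merge in the configuration of its base class and handle the initializer, regularizer, and constraint objects, which this sketch leaves out.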