src.model.deeplearn.layer.kpconvx_layer
Classes
- class src.model.deeplearn.layer.kpconvx_layer.KPConvXLayer(*args, **kwargs)
- Author:
Alberto M. Esmoris Pena
A kernel point convolution with attention (KPConvX) receives batches of \(R\) points with \(D_{\mathrm{in}}\) features each and \(\kappa\) known neighbors in the same space. These inputs are used to compute an output feature space of \(R\) points with \(D_{\text{in}}\) features each (i.e., the output dimensionality matches the input one). In doing so, the indexing tensor \(\mathcal{N} \in \mathbb{Z}^{B \times R \times \kappa}\) is used to link \(\kappa\) neighbors for each of the \(R\) input points in each of the \(B\) input batches.
The KPConvX layer is composed of consecutive blocks. From now on, a single block is mathematically detailed for a single element in the batch.
The kernel point convolution with attention can be defined as an operator such that
\[\begin{split}\pmb{\tilde{f}}_{i*} = \left(\pmb{P} * \mathcal{Q}\right)(\pmb{x}_{i*}) & = \sum_{\pmb{x}_{j*} \in \mathcal{N}(\pmb{x}_{i*})}{ \max \; \left\{ 0, 1 - \dfrac{ \lVert \pmb{x}_{j*} - \pmb{x}_{i*} - \pmb{q}_{k_j*} \rVert }{ \sigma } \right\} \left( \pmb{a}_{k_j*} \bowtie \pmb{w}_{k_j*} \right) \odot \pmb{f}_{j*} } \\ & = \sum_{\pmb{x}_{j*} \in \mathcal{N}(\pmb{x}_{i*})}{ h_{ij} \bigl(\pmb{a}_{k_j*} \bowtie \pmb{w}_{k_j*}\bigr) \odot \pmb{f}_{j*} } .\end{split}\]
Note that \(\mathcal{N}(\pmb{x}_{i*})\) is the neighborhood of the \(i\)-th input point, \(\pmb{q}_{k_j*}\) represents the kernel point that is closest to the \(j\)-th neighbor of the \(i\)-th input point, \(h_{ij}\) abbreviates the \(\max\) term in the first line (a weight that decays linearly with the distance to that kernel point and vanishes beyond \(\sigma\)), \(\bowtie\) is a grouped Hadamard product, \(\pmb{W} \in \mathbb{R}^{m_q \times D_{\text{in}}}\) is the matrix of weights with rows \(\pmb{w}_{k*}\) (for \(m_q\) kernel points and \(D_{\text{in}}\) features), and
\[\alpha(\pmb{f}_{i*}) = \sigma_{\alpha} \biggl( \sigma_{\phi} \Bigl( \mathcal{Z}_{\phi} \bigl( \pmb{f}_{i*} \pmb{\Phi} \oplus \pmb{\phi} \bigr) \Bigr) \pmb{\Psi} \oplus \pmb{\psi} \biggr) \in \mathbb{R}^{m_q D_{\text{in}}/G} ,\]
which leads to the matrix \(\pmb{A} \in \mathbb{R}^{m_q \times D_{\text{in}}/G}\) with rows
\[\pmb{a}_{k*} = \alpha(\pmb{f}_{i*})((k-1)D_{\text{in}}/G : kD_{\text{in}}/G) .\]
Note that the rows \(\pmb{\tilde{f}}_{i*} \in \mathbb{R}^{D_{\text{in}}}\) lead to the hidden feature space matrix \(\pmb{\widetilde{F}} \in \mathbb{R}^{R \times D_{\text{in}}}\).
In the equation of the vector of modulations \(\alpha(\pmb{f}_{i*}) \in \mathbb{R}^{m_q D_{\text{in}}/G}\) for a group size \(G \in \mathbb{Z}_{>0}\) satisfying \(G \mid D_{\text{in}}\), \(\mathcal{Z}_{\phi}\) represents batch normalization, \(\pmb{\Phi} \in \mathbb{R}^{D_{\text{in}} \times D_{\text{in}}}\) is a matrix of weights, \(\pmb{\phi} \in \mathbb{R}^{D_\text{in}}\) is its corresponding vector of weights, \(\pmb{\Psi} \in \mathbb{R}^{D_{\text{in}} \times m_q D_{\text{in}}/G}\) is another matrix of weights, and \(\pmb{\psi} \in \mathbb{R}^{m_q D_{\text{in}}/G}\) is its corresponding vector of weights, \(\sigma_{\phi}\) is an activation function (typically a leaky ReLU), and \(\sigma_{\alpha}\) is a sigmoid function.
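The computation of the modulations can be illustrated with a minimal NumPy sketch. It is not the layer's implementation: the helper names (leaky_relu, sigmoid, standardize, modulations) are hypothetical, the leaky ReLU slope is an arbitrary choice, and batch normalization is replaced by a plain per-feature standardization over the points as a simplifying assumption.

```python
import numpy as np

def leaky_relu(x, slope=0.1):
    # Assumed slope; the layer receives its activation as a parameter.
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def standardize(x, eps=1e-5):
    # Stand-in for batch normalization (assumption): per-feature
    # standardization over the R points, no learned scale or offset.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def modulations(F, Phi, phi, Psi, psi, mq, G):
    """Return one matrix A of shape (mq, D_in/G) per input point."""
    R, D_in = F.shape
    hidden = leaky_relu(standardize(F @ Phi + phi))   # (R, D_in)
    alpha = sigmoid(hidden @ Psi + psi)               # (R, mq * D_in / G)
    # Row k of A is the k-th consecutive slice of length D_in/G of alpha.
    return alpha.reshape(R, mq, D_in // G)

rng = np.random.default_rng(0)
R, D_in, mq, G = 5, 6, 4, 2
F = rng.normal(size=(R, D_in))
Phi = rng.normal(size=(D_in, D_in))
phi = np.zeros(D_in)
Psi = rng.normal(size=(D_in, mq * D_in // G))
psi = np.zeros(mq * D_in // G)

A = modulations(F, Phi, phi, Psi, psi, mq, G)
print(A.shape)  # (5, 4, 3)
```

Because the last activation is a sigmoid, every modulation lies between 0 and 1, so it can only attenuate the kernel weights it multiplies.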
For the sake of understanding, let us look at the following example about how to compute a grouped Hadamard product for \(D_{\text{in}}=6\) input features and a group size of \(G=2\):
\[(a_1, a_2, a_3) \bowtie (f_1, f_2, f_3, f_4, f_5, f_6) = (a_1f_1, a_1f_2, a_2f_3, a_2f_4, a_3f_5, a_3f_6)\]
Now, the KPConvX operator is followed by an upsampling MLP
\[\pmb{\tilde{h}}(\pmb{\tilde{f}}_{i*}) = \sigma_{\tilde{h}}\Bigl(\mathcal{Z}_{\tilde{h}}\bigl( \pmb{\tilde{f}}_{i*} \pmb{\widetilde{A}} + \pmb{\tilde{b}} \bigr)\Bigr) \leadsto \pmb{\widetilde{H}} = \sigma_{\tilde{h}}\Bigl( \mathcal{Z}_{\tilde{h}}\bigl( \pmb{\widetilde{F}} \pmb{\widetilde{A}} \oplus \pmb{\tilde{b}} \bigr)\Bigr) \in \mathbb{R}^{R \times D_H}\]
where \(\pmb{\widetilde{A}} \in \mathbb{R}^{D_{\text{in}} \times D_H}\) is the matrix of weights, \(\pmb{\tilde{b}} \in \mathbb{R}^{D_H}\) is the vector of weights, \(\mathcal{Z}_{\tilde{h}}\) represents batch normalization, \(\sigma_{\tilde{h}}\) is an activation function (typically a leaky ReLU), and \(D_H \in \mathbb{Z}_{>0}\) is the dimensionality of the hidden feature space, which is typically set to \(4 D_{\text{in}}\).
Then, a downsampling MLP restores the input dimensionality (which is also the output dimensionality) such that
\[\pmb{\hat{h}}(\pmb{\tilde{h}}_{i*}) = \sigma_{\hat{h}}\Bigl(\mathcal{Z}_{\hat{h}}\bigl( \pmb{\tilde{h}}_{i*} \pmb{\widehat{A}} + \pmb{\hat{b}} \bigr)\Bigr) \leadsto \pmb{\widehat{H}} = \sigma_{\hat{h}}\Bigl( \mathcal{Z}_{\hat{h}}\bigl( \pmb{\widetilde{H}} \pmb{\widehat{A}} \oplus \pmb{\hat{b}} \bigr)\Bigr) \in \mathbb{R}^{R \times D_{\text{in}}} ,\]
where \(\pmb{\widehat{A}} \in \mathbb{R}^{D_H \times D_{\text{in}}}\) is the matrix of weights, \(\pmb{\hat{b}} \in \mathbb{R}^{D_{\text{in}}}\) is the vector of weights, \(\mathcal{Z}_{\hat{h}}\) represents batch normalization, and \(\sigma_{\hat{h}}\) is an activation function (typically a leaky ReLU).
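The shapes involved in the upsampling and downsampling MLPs can be checked with a short NumPy sketch. The helper names are hypothetical, the weights are random placeholders, and batch normalization is again approximated by a per-feature standardization (an assumption, not the layer's behavior).

```python
import numpy as np

def leaky_relu(x, slope=0.1):
    # Assumed slope for illustration only.
    return np.where(x > 0, x, slope * x)

def standardize(x, eps=1e-5):
    # Stand-in for batch normalization (assumption).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def mlp(X, A, b):
    # One dense step: activation(normalization(X A + b)).
    return leaky_relu(standardize(X @ A + b))

rng = np.random.default_rng(0)
R, D_in = 5, 6
D_H = 4 * D_in                      # typical choice stated above

F_tilde = rng.normal(size=(R, D_in))
A_tilde = rng.normal(size=(D_in, D_H))
b_tilde = np.zeros(D_H)
A_hat = rng.normal(size=(D_H, D_in))
b_hat = np.zeros(D_in)

H_tilde = mlp(F_tilde, A_tilde, b_tilde)   # upsampling:   (R, D_H)
H_hat = mlp(H_tilde, A_hat, b_hat)         # downsampling: (R, D_in)
print(H_tilde.shape, H_hat.shape)  # (5, 24) (5, 6)
```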
Finally, the output feature space for a single block adds the input as a residual contribution to the downsampled hidden feature space \(\pmb{\widehat{H}} \in \mathbb{R}^{R \times D_{\text{in}}}\) as follows
\[\pmb{\widehat{F}} = \pmb{F} + \pmb{\widehat{H}}\]
Further details on the KPConvX operator can be read in the corresponding paper (https://doi.org/10.48550/arXiv.2405.13194).
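To make the KPConvX operator concrete, the following NumPy sketch evaluates it for a single input point. It is an illustration under stated assumptions rather than the layer's code: the function names grouped_hadamard and kpconvx_point are hypothetical, and \(G\) is treated as the group size, as in the worked example with \(D_{\text{in}}=6\) and \(G=2\).

```python
import numpy as np

def grouped_hadamard(a, w, G):
    # Each of the D_in/G entries of a multiplies G consecutive entries of w.
    return np.repeat(a, G, axis=-1) * w

def kpconvx_point(x_i, X_nbr, F_nbr, Q, A_i, W, sigma, G):
    """Evaluate the operator at one point.

    X_nbr: (kappa, n_x) neighbor coordinates, F_nbr: (kappa, D_in) features,
    Q: (mq, n_x) kernel points, A_i: (mq, D_in/G) modulations, W: (mq, D_in).
    """
    diff = X_nbr - x_i                                   # center on x_i
    dist = np.linalg.norm(diff[:, None, :] - Q[None, :, :], axis=-1)
    k = dist.argmin(axis=1)                              # closest kernel point
    h = np.maximum(0.0, 1.0 - dist[np.arange(len(k)), k] / sigma)
    mod_w = grouped_hadamard(A_i[k], W[k], G)            # (kappa, D_in)
    return (h[:, None] * mod_w * F_nbr).sum(axis=0)      # (D_in,)

# Worked example from the text: group size G=2, D_in=6.
print(grouped_hadamard(np.array([1.0, 2.0, 3.0]), np.ones(6), 2))
# [1. 1. 2. 2. 3. 3.]

# Two neighbors lying exactly on two kernel points, so h = 1 for both.
Q = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
x_i = np.zeros(3)
X_nbr = Q.copy()
F_nbr = np.array([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.0] * 6])
A_i = np.ones((2, 3))                                    # no attenuation
W = np.array([[1.0] * 6, [2.0] * 6])
out = kpconvx_point(x_i, X_nbr, F_nbr, Q, A_i, W, sigma=1.0, G=2)
print(out)  # [3. 4. 5. 6. 7. 8.]
```

In the second example the first neighbor contributes its features unchanged and the second contributes its features scaled by 2, which gives the printed sum.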
- Variables:
blocks (int) – The number of blocks in the layer.
drop_path (float) – The probability inside \([0, 1]\) of dropping a given block. Note that if zero is given, no drop path will be applied at all.
sigma (float) – The influence distance for each kernel point.
shell_radii (list or np.ndarray of float) – The radius for each spherical shell.
shell_points (list or np.ndarray of int) – The number of points at each spherical shell.
mq (int) – The number of kernel points.
DH (int) – The number of hidden channels.
groups (int) – The number of groups \(G\). Note that it must divide the number of input features.
deformable (bool) – Whether the structure space of the kernel (i.e., the kernel points) can be optimized or not.
Q_initializer (FibonacciShellInitializer) – The initializer for the kernel’s structure space.
initializer – The initializer for each tensor of weights.
regularizer – The regularizer for each tensor of weights.
constraint – The constraint for each tensor of weights.
sigmoid – The sigmoid activation function.
activation – The activation layer (typically Leaky ReLU).
bn_momentum (float) – The momentum for the batch normalization layers.
bn_phi (list of keras.layers.BatchNormalization) – The batch normalization layers for the computation of alpha.
bn_htilde (list of keras.layers.BatchNormalization) – The batch normalization layers for the computation of the upsampling MLP.
bn_hhat (list of keras.layers.BatchNormalization) – The batch normalization layers for the computation of the downsampling MLP.
built_Q (bool) – Whether the structure space matrix \(\pmb{Q} \in \mathbb{R}^{m_q \times n_x}\) is built. Initially it is false, but it will be updated once the layer is built.
built_W (bool) – Whether the tensors of weights for each kernel point at each depth \(\mathcal{W} \in \mathbb{R}^{\text{blocks} \times m_q \times D_{\text{in}}}\) are built.
built_Phi (bool) – Whether the tensor of matrix-like weights for the first MLP in the alpha function \(\mathcal{\Phi} \in \mathbb{R}^{\text{blocks} \times D_{\text{in}} \times D_{\text{in}}}\) is built.
built_phi (bool) – Whether the tensor of vector-like weights for the first MLP in the alpha function \(\mathcal{\phi} \in \mathbb{R}^{\text{blocks} \times D_{\text{in}}}\) is built.
built_Psi (bool) – Whether the tensor of matrix-like weights for the second MLP in the alpha function \(\Psi \in \mathbb{R}^{\text{blocks} \times D_{\text{in}} \times m_q D_{\text{in}}/G}\) is built.
built_psi (bool) – Whether the tensor of vector-like weights for the second MLP in the alpha function \(\psi \in \mathbb{R}^{\text{blocks} \times m_q D_{\text{in}}/G}\) is built.
built_Atilde (bool) – Whether the tensor of matrix-like weights for the upsampling MLP \(\mathcal{\widetilde{A}} \in \mathbb{R}^{\text{blocks} \times D_{\text{in}} \times D_H}\) is built.
built_btilde (bool) – Whether the tensor of vector-like weights for the upsampling MLP \(\tilde{b} \in \mathbb{R}^{\text{blocks} \times D_H}\) is built.
built_Ahat (bool) – Whether the tensor of matrix-like weights for the downsampling MLP \(\mathcal{\widehat{A}} \in \mathbb{R}^{\text{blocks} \times D_H \times D_{\text{in}}}\) is built.
built_bhat (bool) – Whether the tensor of vector-like weights for the downsampling MLP \(\hat{b} \in \mathbb{R}^{\text{blocks} \times D_{\text{in}}}\) is built.
- __init__(blocks=1, drop_path=0, sigma=1.0, shell_radii=[0, 0.5, 1.0], shell_points=[1, 7, 13], DH=384, groups=8, deformable=False, initializer=None, regularizer=None, constraint=None, bn_momentum=0.9, bn_phi=None, bn_htilde=None, bn_hhat=None, activation=<LeakyReLU name=leaky_re_lu, built=True>, built_Q=False, built_W=False, built_Phi=False, built_phi=False, built_Psi=False, built_psi=False, built_Atilde=False, built_btilde=False, built_Ahat=False, built_bhat=False, **kwargs)
See Layer and layer.Layer.__init__().
- build(dim_in)
Build the \(\pmb{Q} \in \mathbb{R}^{m_q \times n_x}\) matrix representing the kernel’s structure space. Typically, the initialization is delegated to a FibonacciShellInitializer object. Also, build the many weight matrices and vectors.
See Layer and layer.Layer.build().
- call(inputs, training=False, mask=False)
Compute the KPConvXLayer on an input batch.
- Parameters:
inputs – The input such that:
- inputs[0] is the structure space tensor representing the geometry of the many receptive fields in the batch.
\[\mathcal{X} \in \mathbb{R}^{B \times R \times n_x}\]
- inputs[1] is the feature space tensor representing the features of the many receptive fields in the batch.
\[\mathcal{F} \in \mathbb{R}^{B \times R \times D_{\text{in}}}\]
- inputs[2] is the indexing tensor representing the neighborhoods of \(\kappa\) neighbors for each input point, in the same space.
\[\mathcal{N} \in \mathbb{Z}^{B \times R \times \kappa}\]
- Returns:
The output feature space \(\mathcal{\widehat{F}} \in \mathbb{R}^{B \times R \times D_{\mathrm{in}}}\).
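As an illustration of the expected input shapes, the sketch below builds placeholder tensors with NumPy and uses the indexing tensor to gather the neighbors of every point. The random data and variable names are assumptions for the example; how the layer gathers neighbors internally is not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)
B, R, kappa, n_x, D_in = 2, 8, 3, 3, 6

X = rng.normal(size=(B, R, n_x))            # structure space tensor
F = rng.normal(size=(B, R, D_in))           # feature space tensor
N = rng.integers(0, R, size=(B, R, kappa))  # neighbor indices in [0, R)
inputs = [X, F, N]                          # order expected by call()

# Gather the kappa neighbors of each point, batch element by batch element.
batch_idx = np.arange(B)[:, None, None]     # broadcasts against N
X_nbr = X[batch_idx, N]                     # (B, R, kappa, n_x)
F_nbr = F[batch_idx, N]                     # (B, R, kappa, D_in)
print(X_nbr.shape, F_nbr.shape)  # (2, 8, 3, 3) (2, 8, 3, 6)
```

With TensorFlow tensors, the same gathering can be written as tf.gather(X, N, batch_dims=1).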
- get_config()
Return the data necessary to serialize the layer.
- classmethod from_config(config)
Use the given config data to deserialize the layer.
- export_representation(dir_path, out_prefix=None, Wpast=None)
Export a set of files representing the state of the kernel for both the structure (Q) and the weights (W).
- Parameters:
dir_path (str) – The directory where the representation files will be exported.
out_prefix (str) – The output prefix to name the output files.
Wpast (np.ndarray or tf.Tensor or None) – The weights of the kernel in the past.
- Returns:
Nothing at all, but the representation is exported as a set of files inside the given directory.