We present a novel approach to enhance the capabilities of VQVAE models through the integration of an Attentive Residual Encoder (AREN) and a Residual Pixel Attention layer. The objective of our research is to improve the performance of VQVAE while maintaining practical parameter levels. The AREN encoder is designed to operate effectively at multiple levels, accommodating diverse architectural complexities. The key innovation is the integration of an inter-pixel auto-attention mechanism into the AREN encoder. This approach allows us to efficiently capture and utilize contextual information across latent vectors. Additionally, our models uses additional encoding levels to further enhance the model's representational power. Our attention layer employs a minimal parameter approach, ensuring that latent vectors are modified only when pertinent information from other pixels is available. Experimental results demonstrate that our proposed modifications lead to significant improvements in data representation and generation, making VQVAEs even more suitable for a wide range of applications.
The Hadamard Layer, a simple and computationally efficient way to improve results in semantic segmentation tasks, is presented. This layer has no free parameters that require to be trained. Therefore it does not increase the number of model parameters, and the extra computational cost is marginal. Experimental results show that the new Hadamard layer substantially improves the performance of the investigated models (variants of the Pix2Pix model). The performance's improvement can be explained by the Hadamard layer forcing the network to produce an internal encoding of the classes so that all bins are active. Therefore, the network computation is more distributed. In a sort that the Hadamard layer requires that to change the predicted class, it is necessary to modify $2^{k-1}$ bins, assuming $k$ bins in the encoding. A specific loss function allows a stable and fast training convergence.