Deep learning methods are developing rapidly in coded aperture snapshot spectral imaging (CASSI). The number of parameters and FLOPs of existing state-of-the-art methods (SOTA) continues to increase, but the reconstruction accuracy improves slowly. Current methods still face two problems: 1) The performance of the spatial light modulator (SLM) is not fully developed due to the limitation of fixed Mask coding. 2) The single input limits the network performance. In this paper we present a dynamic-mask-based dual camera system, which consists of an RGB camera and a CASSI system running in parallel. First, the system learns the spatial feature distribution of the scene based on the RGB images, then instructs the SLM to encode each scene, and finally sends both RGB and CASSI images to the network for reconstruction. We further designed the DMDC-net, which consists of two separate networks, a small-scale CNN-based dynamic mask network for dynamic adjustment of the mask and a multimodal reconstruction network for reconstruction using RGB and CASSI measurements. Extensive experiments on multiple datasets show that our method achieves more than 9 dB improvement in PSNR over the SOTA. (https://github.com/caizeyu1992/DMDC)
3D pose transfer solves the problem of additional input and correspondence of traditional deformation transfer, only the source and target meshes need to be input, and the pose of the source mesh can be transferred to the target mesh. Some lightweight methods proposed in recent years consume less memory but cause spikes and distortions for some unseen poses, while others are costly in training due to the inclusion of large matrix multiplication and adversarial networks. In addition, the meshes with different numbers of vertices also increase the difficulty of pose transfer. In this work, we propose a Dual-Side Feature Fusion Pose Transfer Network to improve the pose transfer accuracy of the lightweight method. Our method takes the pose features as one of the side inputs to the decoding network and fuses them into the target mesh layer by layer at multiple scales. Our proposed Feature Fusion Adaptive Instance Normalization has the characteristic of having two side input channels that fuse pose features and identity features as denormalization parameters, thus enhancing the pose transfer capability of the network. Extensive experimental results show that our proposed method has stronger pose transfer capability than state-of-the-art methods while maintaining a lightweight network structure, and can converge faster.
Spectral image reconstruction is an important task in snapshot compressed imaging. This paper aims to propose a new end-to-end framework with iterative capabilities similar to a deep unfolding network to improve reconstruction accuracy, independent of optimization conditions, and to reduce the number of parameters. A novel framework called the reversible-prior-based method is proposed. Inspired by the reversibility of the optical path, the reversible-prior-based framework projects the reconstructions back into the measurement space, and then the residuals between the projected data and the real measurements are fed into the network for iteration. The reconstruction subnet in the network then learns the mapping of the residuals to the true values to improve reconstruction accuracy. In addition, a novel spectral-spatial transformer is proposed to account for the global correlation of spectral data in both spatial and spectral dimensions while balancing network depth and computational complexity, in response to the shortcomings of existing transformer-based denoising modules that ignore spatial texture features or learn local spatial features at the expense of global spatial features. Extensive experiments show that our SST-ReversibleNet significantly outperforms state-of-the-art methods on simulated and real HSI datasets, while requiring lower computational and storage costs. https://github.com/caizeyu1992/SST
With the increasing demand for oriented object detection e.g. in autonomous driving and remote sensing, the oriented annotation has become a labor-intensive work. To make full use of existing horizontally annotated datasets and reduce the annotation cost, a weakly-supervised detector H2RBox for learning the rotated box (RBox) from the horizontal box (HBox) has been proposed and received great attention. This paper presents a new version, H2RBox-v2, to further bridge the gap between HBox-supervised and RBox-supervised oriented object detection. While exploiting axisymmetry via flipping and rotating consistencies is available through our theoretical analysis, H2RBox-v2, using a weakly-supervised branch similar to H2RBox, is embedded with a novel self-supervised branch that learns orientations from the symmetry inherent in the image of objects. Complemented by modules to cope with peripheral issues, e.g. angular periodicity, a stable and effective solution is achieved. To our knowledge, H2RBox-v2 is the first symmetry-supervised paradigm for oriented object detection. Compared to H2RBox, our method is less susceptible to low annotation quality and insufficient training data, which in such cases is expected to give a competitive performance much closer to fully-supervised oriented object detectors. Specifically, the performance comparison between H2RBox-v2 and Rotated FCOS on DOTA-v1.0/1.5/2.0 is 72.31%/64.76%/50.33% vs. 72.44%/64.53%/51.77%, 89.66% vs. 88.99% on HRSC, and 42.27% vs. 41.25% on FAIR1M.
With the vigorous development of computer vision, oriented object detection has gradually been featured. In this paper, a novel differentiable angle coder named phase-shifting coder (PSC) is proposed to accurately predict the orientation of objects, along with a dual-frequency version PSCD. By mapping rotational periodicity of different cycles into phase of different frequencies, we provide a unified framework for various periodic fuzzy problems in oriented object detection. Upon such framework, common problems in oriented object detection such as boundary discontinuity and square-like problems are elegantly solved in a unified form. Visual analysis and experiments on three datasets prove the effectiveness and the potentiality of our approach. When facing scenarios requiring high-quality bounding boxes, the proposed methods are expected to give a competitive performance. The codes are publicly available at https://github.com/open-mmlab/mmrotate.
3D face recognition has shown its potential in many application scenarios. Among numerous 3D face recognition methods, deep-learning-based methods have developed vigorously in recent years. In this paper, an end-to-end deep learning network entitled Sur3dNet-Face for point-cloud-based 3D face recognition is proposed. The network uses PointNet as the backbone, which is a successful point cloud classification solution but does not work properly in face recognition. Supplemented with modifications in network architecture and a few-data guided learning framework based on Gaussian process morphable model, the backbone is successfully modified for 3D face recognition. Different from existing methods training with a large amount of data in multiple datasets, our method uses Spring2003 subset of FRGC v2.0 for training which contains only 943 facial scans, and the network is well trained with the guidance of such a small amount of real data. Without fine-tuning on the test set, the Rank-1 Recognition Rate (RR1) is achieved as follows: 98.85% on FRGC v2.0 dataset and 99.33% on Bosphorus dataset, which proves the effectiveness and the potentiality of our method.