Owing to its ability to overcome the double-fading effect that limits the passive intelligent reflecting surface (IRS), the active IRS is emerging as a promising technique for future 6G wireless networks. To fully exploit the amplifying gain of the active IRS, two high-rate methods, maximum ratio reflecting (MRR) and selective ratio reflecting (SRR), are presented, motivated by maximum ratio combining and selective ratio combining, respectively; both admit closed-form solutions. To further improve the rate, a maximum reflected-signal-to-noise-ratio (Max-RSNR) method is proposed, which alternately iterates between adjusting the norm of the beamforming vector and its normalized direction. This yields a substantial rate enhancement over the existing equal-gain reflecting (EGR) scheme. Simulation results show that the three proposed methods achieve significantly higher rates than EGR. In decreasing order of rate performance, they rank as follows: Max-RSNR, MRR, SRR, and EGR.
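The analogy between MRR and maximum ratio combining can be illustrated with a minimal sketch, under simplified assumptions that are not the paper's exact system model: a single-antenna cascaded channel, a unit-norm constraint on the reflecting vector, and no amplification noise. The channel vectors `h`, `g` and element count `N` below are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16                      # number of IRS elements (illustrative)
# Rayleigh-fading transmitter-to-IRS and IRS-to-receiver channels
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
g = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

cascaded = h * g            # per-element cascaded channel

# MRC-style reflecting: weight each element in proportion to its cascaded
# channel gain and cancel its phase, normalized to unit total power.
theta_mrr = np.conj(cascaded) / np.linalg.norm(cascaded)

# Equal-gain reflecting (EGR): cancel the phase only, equal amplitudes.
theta_egr = np.exp(-1j * np.angle(cascaded)) / np.sqrt(N)

snr_mrr = np.abs(cascaded @ theta_mrr) ** 2   # equals ||cascaded||^2
snr_egr = np.abs(cascaded @ theta_egr) ** 2   # (sum |c_i|)^2 / N
assert snr_mrr >= snr_egr   # Cauchy-Schwarz: MRC-style weighting never loses
```

As in receive combining, the equal-gain scheme matches the MRC-style scheme only when all cascaded element gains have equal magnitude, which explains the rate gap observed over EGR.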
Oriented object detection arises in many applications, from aerial images to autonomous driving, yet many existing detection benchmarks are annotated only with horizontal bounding boxes, which are also less costly than fine-grained rotated boxes, leading to a gap between the readily available training corpus and the rising demand for oriented object detection. This paper proposes a simple yet effective oriented object detection approach called H2RBox that uses only horizontal box annotations for weakly-supervised training, which closes the above gap and shows competitive performance even against detectors trained with rotated boxes. The core of our method is a combination of weakly- and self-supervised learning, which predicts the angle of an object by enforcing consistency between two different views. To the best of our knowledge, H2RBox is the first horizontal box annotation-based oriented object detector. Compared to an alternative, i.e., horizontal box-supervised instance segmentation with our post-hoc adaptation to oriented object detection, our approach is not susceptible to the quality of the predicted masks and performs more robustly in complex scenes containing large numbers of dense objects and outliers. Experimental results show that H2RBox has significant performance and speed advantages over horizontal box-supervised instance segmentation methods, as well as lower memory requirements. Compared with rotated box-supervised oriented object detectors, our method shows very close performance and speed, and even surpasses them in some cases. The source code is available at https://github.com/yangxue0827/h2rbox-mmrotate.
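The self-supervised consistency idea can be sketched as follows. This is a hypothetical toy loss, not the exact H2RBox objective: if the second view is the first view rotated by a known angle `delta`, the two predicted box angles should differ by `delta` modulo pi (a rotated box is pi-periodic in its angle).

```python
import numpy as np

def angle_consistency_loss(theta_orig, theta_rot, delta):
    """Toy self-supervised consistency loss between two views.

    theta_orig: angle predicted on the original image
    theta_rot:  angle predicted on the image rotated by `delta`
    The residual is wrapped into [-pi/2, pi/2) to respect the
    pi-periodicity of rotated-box angles.
    """
    diff = (theta_rot - theta_orig - delta + np.pi / 2) % np.pi - np.pi / 2
    return float(np.mean(np.square(diff)))

# Perfectly consistent predictions incur zero loss...
assert angle_consistency_loss(0.3, 0.8, 0.5) < 1e-12
# ...while predictions that ignore the rotation are penalized.
assert angle_consistency_loss(0.3, 0.3, 0.5) > 0.1
```

Minimizing such a residual over many random rotations is what lets the angle be learned without any rotated-box labels.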
Night imaging with modern smartphone cameras is troublesome due to the low photon count and unavoidable noise in the imaging system. Directly adjusting the exposure time and ISO rating cannot produce sharp, noise-free images simultaneously in low-light conditions. Though many methods have been proposed to enhance noisy or blurry night images, their performance on real-world night photos remains unsatisfactory for two main reasons: 1) the limited information in a single image and 2) the domain gap between synthetic training images and real-world photos (e.g., differences in blur area and resolution). To exploit the information in successive long- and short-exposure images, we propose a learning-based pipeline to fuse them. A D2HNet framework is developed to recover a high-quality image by deblurring and enhancing a long-exposure image under the guidance of a short-exposure image. To shrink the domain gap, we leverage a two-phase DeblurNet-EnhanceNet architecture, which performs accurate blur removal at a fixed low resolution so that it can handle large ranges of blur in inputs of different resolutions. In addition, we synthesize a D2-Dataset from HD videos and experiment on it. The results on the validation set and on real photos demonstrate that our method achieves better visual quality and state-of-the-art quantitative scores. The D2HNet codes and D2-Dataset can be found at https://github.com/zhaoyuzhi/D2HNet.
The rapid development of deep learning has brought great progress to segmentation, one of the fundamental tasks of computer vision. However, current segmentation algorithms mostly rely on the availability of pixel-level annotations, which are often expensive, tedious, and laborious to obtain. To alleviate this burden, the past years have witnessed increasing attention to building label-efficient, deep-learning-based segmentation algorithms. This paper offers a comprehensive review of label-efficient segmentation methods. To this end, we first develop a taxonomy that organizes these methods according to the supervision provided by different types of weak labels (including no supervision, coarse supervision, incomplete supervision, and noisy supervision), supplemented by the types of segmentation problems (including semantic segmentation, instance segmentation, and panoptic segmentation). Next, we summarize the existing label-efficient segmentation methods from a unified perspective centered on an important question: how to bridge the gap between weak supervision and dense prediction. Current methods are mostly based on heuristic priors, such as cross-pixel similarity, cross-label constraints, cross-view consistency, and cross-image relations. Finally, we share our opinions on future research directions for label-efficient deep segmentation.
For a passive direction of arrival (DOA) measurement system using massive multiple-input multiple-output (MIMO), the complexity of the covariance-matrix-decomposition-based DOA measurement method is extremely high. To significantly reduce the computational complexity, two strategies are proposed. First, a rapid power-iterative estimation of signal parameters via rotational invariance techniques (RPI-ESPRIT) method is proposed, which not only reduces the complexity but also achieves good direction measurement results. However, its overall complexity is still high. To further reduce the complexity, a rapid power-iterative root multiple signal classification (RPI-Root-MUSIC) method is proposed. Simulation results show that the two proposed methods outperform the classical DOA estimation methods in terms of computational complexity. In particular, the lowest complexity achieved by the RPI-Root-MUSIC method is about two orders of magnitude lower than that of Root-MUSIC in terms of FLOPs. In addition, it is verified that the initial vector and the relative error have a substantial effect on the computational complexity.
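The complexity saving rests on replacing the full eigendecomposition of the sample covariance matrix with power iteration, which extracts the dominant eigenvector at O(M^2) per matrix-vector product instead of O(M^3) overall. A minimal sketch follows; the matrix sizes, tolerance, and stopping rule are illustrative and not taken from the paper.

```python
import numpy as np

def power_iteration(R, num_iter=200, tol=1e-12, x0=None):
    """Dominant eigenvector of a Hermitian PSD covariance matrix R.

    Each iteration costs one matrix-vector product, O(M^2), versus the
    O(M^3) of a full eigendecomposition; this is where the complexity
    savings for massive MIMO arrays come from. The initial vector x0 and
    the tolerance both affect how many iterations are needed.
    """
    M = R.shape[0]
    x = np.ones(M, dtype=complex) if x0 is None else x0.astype(complex)
    x /= np.linalg.norm(x)
    for _ in range(num_iter):
        y = R @ x
        y /= np.linalg.norm(y)
        if np.linalg.norm(y - x) < tol:
            x = y
            break
        x = y
    return x

# Toy check against a full eigendecomposition, using a covariance matrix
# with one clearly dominant eigen-direction u.
rng = np.random.default_rng(1)
M = 8
u = rng.standard_normal(M) + 1j * rng.standard_normal(M)
u /= np.linalg.norm(u)
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = 50 * np.outer(u, u.conj()) + (A @ A.conj().T) / M

v = power_iteration(R)
w, V = np.linalg.eigh(R)
v_ref = V[:, -1]                 # eigenvector of the largest eigenvalue
# Eigenvectors agree up to a global phase.
assert abs(abs(v.conj() @ v_ref) - 1.0) < 1e-6
```

In the full methods, the eigenvector obtained this way feeds the signal-subspace step of ESPRIT or Root-MUSIC in place of the decomposition of the whole covariance matrix.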
The appearances of children are inherited from their parents, which makes it feasible to predict them. Predicting realistic children's faces may help address many social problems, such as age-invariant face recognition, kinship verification, and missing child identification. It can be regarded as an image-to-image translation task. Existing approaches usually assume that domain information in image-to-image translation can be interpreted as "style", i.e., the separation of image content and style. However, such a separation is improper for child face prediction, because the facial contours of children and parents are not the same. To address this issue, we propose a new disentangled learning strategy for children's face prediction. We assume that children's faces are determined by genetic factors (compact family features, e.g., face contour), external factors (facial attributes irrelevant to prediction, such as moustaches and glasses), and variety factors (individual properties of each child). On this basis, we formulate prediction as a mapping from parents' genetic factors to children's genetic factors, disentangling them from the external and variety factors. To obtain accurate genetic factors and perform the mapping, we propose a ChildPredictor framework. It maps human faces to genetic factors with encoders and back with generators, and then learns the relationship between the genetic factors of parents and children through a mapping function. To ensure that the generated faces are realistic, we collect a large Family Face Database to train ChildPredictor and evaluate it on the FF-Database validation set. Experimental results demonstrate that ChildPredictor is superior to other well-known image-to-image translation methods in predicting realistic and diverse child faces. Implementation codes can be found at https://github.com/zhaoyuzhi/ChildPredictor.
In this paper, an intelligent reflecting surface (IRS)-aided two-way decode-and-forward (DF) relay wireless network is considered, where two users exchange information via the IRS and the DF relay. To enhance the sum-rate performance, three power allocation (PA) strategies are proposed. First, a method of maximizing the sum rate (Max-SR) is proposed to jointly optimize the PA factors of user U1, user U2, and the relay station (RS). To further improve the sum-rate performance, two high-performance schemes, namely maximizing the minimum sum rate (Max-Min-SR) and maximizing the sum rate with a rate constraint (Max-SR-RC), are presented. Simulation results show that the three proposed methods outperform the equal power allocation (EPA) method in terms of sum-rate performance. In particular, the highest performance gain achieved by the Max-SR-RC method is up to 45.2% over EPA. Furthermore, it is verified that the total power and the random shadowing variable $X_\sigma$ have a substantial impact on the sum-rate performance.
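The gain of optimized PA over EPA can be illustrated with a heavily simplified toy model, not the paper's actual objective: each user's rate in DF two-way relaying is limited by its weaker hop, the link gains `g1r`, `g2r`, `gr1`, `gr2` are arbitrary hypothetical values, and a crude grid search over the PA simplex stands in for the Max-SR optimization.

```python
import numpy as np

P = 10.0                                  # total transmit power (illustrative)
g1r, g2r, gr1, gr2 = 2.0, 1.0, 1.5, 0.8   # hypothetical link SNR gains

def sum_rate(b1, b2, br):
    """Toy DF two-way relay sum rate: each direction is capped by the
    weaker of its two hops (user-to-relay vs relay-to-other-user)."""
    r1 = min(np.log2(1 + b1 * P * g1r), np.log2(1 + br * P * gr2))  # U1->RS->U2
    r2 = min(np.log2(1 + b2 * P * g2r), np.log2(1 + br * P * gr1))  # U2->RS->U1
    return r1 + r2

# Grid search over the PA simplex b1 + b2 + br = 1, a crude stand-in
# for the Max-SR optimization.
best = max(((b1, b2, 1.0 - b1 - b2)
            for b1 in np.linspace(0.01, 0.98, 98)
            for b2 in np.linspace(0.01, 0.98 - b1, 50)),
           key=lambda b: sum_rate(*b))
epa = sum_rate(1 / 3, 1 / 3, 1 / 3)       # equal power allocation baseline
assert sum_rate(*best) >= epa             # optimized PA never falls below EPA
```

With asymmetric link gains, the optimum shifts power toward the bottleneck hops, which is the intuition behind the reported gains over EPA.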
Partially-supervised instance segmentation is a task that requires segmenting objects from novel, unseen categories by learning on limited seen categories with annotated masks, thus eliminating the heavy annotation burden. The key to addressing this task is to build an effective class-agnostic mask segmentation model. Unlike previous methods that learn such models only on seen categories, in this paper we propose a new method, named ContrastMask, which learns a mask segmentation model on both seen and unseen categories under a unified pixel-level contrastive learning framework. In this framework, annotated masks of seen categories and pseudo masks of unseen categories serve as a prior for contrastive learning, where features from the mask regions (foreground) are pulled together and contrasted against those from the background, and vice versa. Through this framework, feature discrimination between foreground and background is largely improved, facilitating the learning of the class-agnostic mask segmentation model. Exhaustive experiments on the COCO dataset demonstrate the superiority of our method, which outperforms previous state-of-the-art methods.
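The pull-together/push-apart mechanism can be sketched with a minimal InfoNCE-style pixel contrast, which is not the exact ContrastMask loss: foreground pixel embeddings are attracted to a foreground query (here simply their mean) and contrasted against background pixels. All shapes and the temperature are illustrative.

```python
import numpy as np

def pixel_contrastive_loss(features, mask, tau=0.1):
    """Minimal InfoNCE-style pixel contrast (illustrative sketch).

    features: (N, D) L2-normalized per-pixel embeddings
    mask:     (N,) binary; 1 = foreground (mask region), 0 = background
    Foreground pixels serve as positives against a mean-foreground query;
    background pixels serve as negatives.
    """
    fg = features[mask == 1]
    bg = features[mask == 0]
    query = fg.mean(axis=0)
    query /= np.linalg.norm(query)
    pos = np.exp(fg @ query / tau)          # similarity to positives
    neg = np.exp(bg @ query / tau).sum()    # summed similarity to negatives
    return float(-np.log(pos / (pos + neg)).mean())

# Toy check: well-separated fg/bg embeddings give a much smaller loss
# than undiscriminative random embeddings.
rng = np.random.default_rng(0)
D = 16
feats = np.concatenate([np.eye(D)[0] + 0.05 * rng.standard_normal((32, D)),
                        np.eye(D)[1] + 0.05 * rng.standard_normal((32, D))])
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
mask = np.concatenate([np.ones(32), np.zeros(32)])
loss_sep = pixel_contrastive_loss(feats, mask)
loss_rand = pixel_contrastive_loss(rng.standard_normal((64, D)) / np.sqrt(D),
                                   mask)
assert loss_sep < loss_rand
```

Driving this loss down is what improves foreground/background feature discrimination, regardless of object category.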
In this paper, we investigate the problem of pilot optimization and channel estimation for a two-way relaying network (TWRN) aided by an intelligent reflecting surface (IRS) with finite discrete phase shifters. In a TWRN, there exists a challenging problem: the two cascaded channels, source-to-IRS-to-relay and destination-to-IRS-to-relay, interfere with each other. By designing the initial phase shifts of the IRS and the pilot pattern, the two cascaded channels are separated using simple arithmetic operations such as addition and subtraction. Then, the least-squares (LS) estimator is adopted to estimate the two cascaded channels and the two direct channels from the source to the relay and from the destination to the relay. The corresponding mean square errors (MSEs) of the channel estimators are derived. By minimizing the MSE, the optimal phase shift matrix of the IRS is derived, and two special matrices, the Hadamard matrix and the discrete Fourier transform (DFT) matrix, are shown to be optimal training matrices for the IRS. Furthermore, an IRS with finite discrete phase shifters is taken into account. Using theoretical derivation and numerical simulations, we find that 3-4 bit phase shifters are sufficient for the IRS to achieve a negligible MSE performance loss. More importantly, the Hadamard matrix requires only one-bit phase shifters to achieve the optimal MSE performance, while the DFT matrix requires at least three or four bits to achieve the same performance. Thus, the Hadamard matrix is a perfect choice for channel estimation with an IRS using low-resolution phase shifters.
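Why an orthogonal training matrix minimizes the LS MSE can be seen in a minimal sketch, under a simplified model y = Vh + n that abstracts away the TWRN structure: the MSE of the LS estimate is sigma^2 tr((V^H V)^{-1}), which is minimized when the columns of V are orthogonal. The element count and noise level below are illustrative.

```python
import numpy as np

def sylvester_hadamard(n):
    """n x n Hadamard matrix (n a power of two) via the Sylvester
    construction. Entries are +/-1, so realizing it as IRS training
    only needs 1-bit phase shifters (phases 0 or pi)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H

rng = np.random.default_rng(0)
N = 8                                     # IRS elements (illustrative)
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
sigma = 0.1                               # noise standard deviation

def ls_mse(V, trials=2000):
    """Empirical MSE of the LS estimate h_hat = V^{-1} y for y = V h + n."""
    Vinv = np.linalg.inv(V)
    err = 0.0
    for _ in range(trials):
        n = sigma * (rng.standard_normal(N)
                     + 1j * rng.standard_normal(N)) / np.sqrt(2)
        err += np.linalg.norm(Vinv @ (V @ h + n) - h) ** 2
    return err / trials

V_had = sylvester_hadamard(N)                        # orthogonal, 1-bit phases
V_rand = np.exp(1j * 2 * np.pi * rng.random((N, N))) # random unit-modulus pilots

mse_had = ls_mse(V_had)
mse_rand = ls_mse(V_rand)
assert mse_had < mse_rand    # orthogonal training attains the lower LS MSE
```

The DFT matrix is equally orthogonal and attains the same MSE, but its entries require finer phase quantization than the Hadamard matrix's two phases, which is the practical advantage highlighted above.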