The significant progress of Generative Adversarial Networks (GANs) has facilitated realistic single-object image generation from language input. However, complex-scene generation (with various interactions among multiple objects) still suffers from messy layouts and object distortions, due to the diversity of possible layouts and appearances. Prior methods are mostly object-driven and ignore the inter-relations among objects, which play a significant role in complex-scene images. This work explores relationship-aware complex-scene image generation, where multiple objects are inter-related as a scene graph. With the help of relationships, we propose three major updates to the generation framework. First, reasonable spatial layouts are inferred by jointly considering the semantics and relationships among objects. Compared to standard location regression, we show that relative scales and distances serve as a more reliable target. Second, since the relations between objects significantly influence an object's appearance, we design a relation-guided generator to generate objects that reflect their relationships. Third, a novel scene graph discriminator is proposed to guarantee consistency between the generated image and the input scene graph. Our method tends to synthesize plausible layouts and objects, respecting the interplay of multiple objects in an image. Experimental results on the Visual Genome and HICO-DET datasets show that our proposed method significantly outperforms prior methods in terms of the IS and FID metrics. Based on our user study and visual inspection, our method is more effective at generating logical layouts and appearances for complex scenes.
In this article, we design a new time-of-arrival (TOA) system for simultaneous user device (UD) localization and synchronization with a periodic asymmetric ranging network, namely PARN. The PARN includes one primary anchor node (PAN) that transmits and receives signals, and many secondary ANs (SANs) that only receive signals. All UDs can transmit and receive signals. The PAN periodically transmits a sync signal, and the UD transmits a response signal after receiving the sync signal. Using TOA measurements of the periodic sync signal at the SANs, we develop a Kalman filtering method to virtually synchronize the ANs with high-accuracy estimation of the clock parameters. Employing the virtual synchronization, together with TOA measurements of the response and sync signals, we then develop a maximum likelihood (ML) approach, namely ML-LAS, to simultaneously localize and synchronize a moving UD. We analyze the UD localization and synchronization error and derive the Cramér-Rao lower bound (CRLB). Different from existing asymmetric-ranging-network-based TOA systems, the new PARN i) uses the periodic sync signals at the SANs to exploit temporally correlated clock information for high-accuracy virtual synchronization, and ii) compensates for the UD movement and clock drift using the various TOA measurements to achieve consistent, simultaneous localization and synchronization performance. Numerical results verify the theoretical analysis that the new system achieves high accuracy in AN clock offset estimation and in simultaneous localization and synchronization for a moving UD. We implement a prototype hardware system and demonstrate the feasibility and superiority of the PARN in real-world applications through experiments.
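The Kalman-filter-based virtual synchronization can be illustrated with a minimal sketch: track one SAN's clock offset and drift as a two-dimensional state driven by periodic sync-signal TOA residuals. The period, noise covariances, and initial values below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

# Minimal sketch (not the paper's exact filter): a Kalman filter tracking one
# SAN's clock offset and drift from periodic sync-signal TOA residuals.
# State x = [offset, drift]; all noise levels here are assumed values.

dt = 0.1                      # sync period in seconds (assumed)
F = np.array([[1.0, dt],      # offset_{k+1} = offset_k + drift_k * dt
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])    # the TOA residual observes the offset only
Q = np.diag([1e-18, 1e-16])   # process noise from clock instability (assumed)
R = np.array([[1e-16]])       # TOA measurement noise variance (assumed)

def kf_step(x, P, z):
    """One predict/update cycle; z is the measured clock-offset residual."""
    # Predict: propagate the clock state and its covariance
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: correct with the new TOA-derived offset measurement
    y = z - H @ x                         # innovation
    S = H @ P @ H.T + R                   # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

Running this on each SAN's sync-signal TOAs yields smoothed offset and drift estimates, which is the "virtual synchronization" the ML-LAS step then relies on.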
In two-way time-of-arrival (TOA) systems, a user device (UD) obtains its position through round-trip communications with a number of anchor nodes (ANs) at known locations. The objective function of the maximum likelihood (ML) method for two-way TOA localization is nonconvex. Thus, the widely adopted Gauss-Newton iterative method for solving the ML estimator usually suffers from the local minima problem. In this paper, we convert the original estimator into a convex problem by relaxation, and develop a new semidefinite programming (SDP) based localization method for moving UDs, namely SDP-M. Numerical results demonstrate that, compared with the iterative method, which often falls into local minima, the SDP-M always converges to the globally optimal solution and reduces the localization error by more than 40%. It also has stable localization accuracy regardless of the UD movement, and outperforms the conventional method for stationary UDs, whose error grows with the UD velocity.
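The conventional iterative baseline can be sketched as follows: a plain Gauss-Newton refinement of the UD position from range (TOA-derived) measurements. The anchor layout and initial guess are illustrative assumptions; with a poor initial guess this iteration is exactly what can stall in a local minimum of the nonconvex ML objective.

```python
import numpy as np

# Sketch of the conventional Gauss-Newton iteration for range-based
# (two-way TOA) localization; the anchor positions are assumed values.

anchors = np.array([[0.0, 0.0], [50.0, 0.0], [0.0, 50.0], [50.0, 50.0]])

def gauss_newton(ranges, x0, iters=20):
    """Iteratively refine the UD position from range measurements."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        d = np.linalg.norm(anchors - x, axis=1)       # predicted ranges
        r = ranges - d                                # residuals
        J = (x - anchors) / d[:, None]                # Jacobian of d w.r.t. x
        x = x + np.linalg.lstsq(J, r, rcond=None)[0]  # Gauss-Newton update
    return x
```

The SDP-M method replaces this local iteration with a convex relaxation solved by an SDP solver, which removes the dependence on the initial guess.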
Rain streaks appearing in images or videos severely degrade the performance of computer vision applications. It is therefore of vital importance to remove rain streaks to support our vision systems. While recent convolutional neural network based methods have shown promising results in single image rain removal (SIRR), they fail to simultaneously capture long-range location dependencies and aggregate convolutional channel information. However, since SIRR is a highly ill-posed problem, this spatial and channel information provides important clues for solving it. First, spatial information helps the model understand the image context by gathering long-range dependency information hidden in the image. Second, aggregating channels helps the model concentrate on the channels more related to the image background instead of the rain streaks. In this paper, we propose a non-local channel aggregation network (NCANet) to address the SIRR problem. NCANet models 2D rainy images as sequences of vectors in three directions, namely the vertical, transverse, and channel directions. Recurrently aggregating information from all three directions enables our model to capture long-range dependencies in both channels and spatial locations. Extensive experiments on both heavy and light rain image datasets demonstrate the effectiveness of the proposed NCANet model.
Person re-identification (ReID) has achieved significant improvement through the adoption of Convolutional Neural Networks (CNNs). However, person ReID systems only provide a distance or similarity when matching two persons, which makes it hard for users to understand why the two are judged similar or not. Therefore, we propose an Attribute-guided Metric Interpreter, named AttriMeter, to semantically and quantitatively explain the results of CNN-based ReID models. AttriMeter has a pluggable structure that can be grafted onto arbitrary target models, i.e., the ReID models that need to be interpreted. With an attribute decomposition head, it learns to generate a group of attribute-guided attention maps (AAMs) from the target model. By applying the AAMs to the target model's features of two persons, their distance is decomposed into a set of attribute-guided components that measure the contributions of individual attributes. Moreover, we design a distance distillation loss to guarantee consistency between the results of the target model and the decomposed components from AttriMeter, and an attribute prior loss to eliminate the biases caused by the imbalanced distribution of attributes. Finally, extensive experiments and analysis on a variety of ReID models and datasets show the effectiveness of AttriMeter.
In a time division broadcast positioning system (TDBPS), a user device (UD) determines its position by obtaining sequential time-of-arrival (TOA) or pseudorange measurements from signals broadcast by multiple synchronized base stations (BSs). The existing localization method using sequential pseudorange measurements and a linear clock drift model for the TDBPS, namely LSPM-D, does not compensate for the position displacement caused by the UD movement, which results in position error. In this paper, depending on the available knowledge of the UD velocity, we develop a set of optimal localization methods for different cases. First, for a known UD velocity, we develop an optimal localization method, namely LSPM-KVD, to compensate for the movement-caused position error. We show that LSPM-D is a special case of LSPM-KVD when the UD is stationary with zero velocity. Second, for the case of unknown UD velocity, we develop a maximum likelihood (ML) method, namely LSPM-UVD, to jointly estimate the UD position and velocity. Third, for the case in which prior distribution information on the UD velocity is available, we present a maximum a posteriori (MAP) estimator for localization, namely LSPM-PVD. We derive the Cramér-Rao lower bound (CRLB) for all three estimators and analyze their localization error performance. We show that the position error of LSPM-KVD increases as the assumed known velocity deviates from the true value. As expected, LSPM-KVD has the smallest position error, while LSPM-PVD and LSPM-UVD are more robust when prior knowledge of the UD velocity is limited. Numerical results verify the theoretical analysis of the optimality and positioning accuracy of the proposed methods.
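The movement compensation behind the known-velocity case can be sketched with a simple measurement model: each sequential pseudorange is taken at a different time while the UD moves, so it refers to the position p0 + v·t_i rather than a single point. The function name, station layout, and notation below are illustrative, not the paper's.

```python
import numpy as np

# Sketch of the sequential-pseudorange measurement model with movement
# compensation (illustrative notation, not the paper's):
#   rho_i = ||p0 + v * t_i - bs_i|| + c * (clk_offset + clk_drift * t_i)

def predicted_pseudoranges(p0, v, clk_offset, clk_drift, bs, t):
    """Predict the pseudorange measured from BS i at epoch t_i.

    p0: UD position at t = 0; v: UD velocity (known in the LSPM-KVD case);
    bs: one BS position per measurement; t: measurement epochs.
    """
    c = 299792458.0                          # speed of light, m/s
    pos = p0[None, :] + np.outer(t, v)       # UD position at each epoch
    ranges = np.linalg.norm(pos - bs, axis=1)
    return ranges + c * (clk_offset + clk_drift * t)
```

Setting v to zero recovers the static model assumed by LSPM-D; fitting p0 (and, in the unknown-velocity case, v) to minimize the residuals of this model is the essence of the proposed estimators.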
In a multi-agent system (MAS) comprised of parent nodes (PNs) and child nodes (CNs), a relative spatiotemporal coordinate system is established by the PNs with known positions. Resolving the joint localization and synchronization (JLAS) problem for the moving CNs is an essential technique in such a MAS. Existing methods using sequential time-of-arrival (TOA) measurements from the PNs' broadcast signals either require a good initial guess or have high computational complexity. In this paper, we propose a new closed-form JLAS approach, namely CFJLAS, which achieves the optimal solution without initialization and has low computational complexity. We first linearize the relation between the estimated parameters and the sequential TOA measurements by squaring and differencing the TOA measurement equations. By devising two intermediate variables, we simplify the problem to solving a set of quadratic equations. Finally, we apply a weighted least squares (WLS) step using the residuals of all the measurements to refine the estimate. We derive the Cramér-Rao lower bound (CRLB), analyze the estimation error, and show that the estimation accuracy of CFJLAS reaches the CRLB under small noise conditions. The complexity of CFJLAS is studied and compared with that of the iterative method. Simulations in a 2D scene verify that the estimation accuracy of the new CFJLAS method in position, velocity, clock offset, and clock skew all reach the CRLB. Compared with the conventional iterative method, which requires a good initial guess to converge to the correct estimate and whose complexity grows with the number of iterations, the proposed CFJLAS method requires no initialization, always obtains the optimal solution, and has constant, low computational complexity.
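The squaring-and-differencing step can be illustrated in its simplest setting, a static, synchronized node (the full CFJLAS additionally carries velocity and clock terms). Squaring ||p - a_i|| = d_i gives d_i² = ||p||² - 2 a_i·p + ||a_i||²; differencing against a reference anchor cancels the nonlinear ||p||² term, leaving equations linear in p. The anchor layout and function name below are illustrative.

```python
import numpy as np

# Illustrative sketch of squaring-and-differencing linearization for TOA
# equations (static, synchronized special case of the CFJLAS idea).

def linearized_position(anchors, d):
    """Solve ||p - a_i|| = d_i by squaring and differencing vs. anchor 0.

    Differencing cancels the ||p||^2 term, so 2*(a_i - a_0) . p equals
    d_0^2 - d_i^2 + ||a_i||^2 - ||a_0||^2, a linear system in p.
    """
    a0, d0 = anchors[0], d[0]
    A = 2.0 * (anchors[1:] - a0)                 # linear coefficient matrix
    b = (d0**2 - d[1:]**2
         + np.sum(anchors[1:]**2, axis=1) - np.sum(a0**2))
    return np.linalg.lstsq(A, b, rcond=None)[0]
```

In CFJLAS, the analogous linearization (with clock and velocity unknowns folded in via the two intermediate variables) reduces the problem to a small quadratic system, after which the WLS step refines the estimate.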
Sturge-Weber syndrome (SWS) is a vascular malformation disease that may cause blindness in severe cases. Clinical results show that SWS can be divided into two types based on the characteristics of the scleral blood vessels. Accurately segmenting scleral blood vessels has therefore become a significant problem in computer-aided diagnosis. In this work, we propose to continuously upsample the bottom layer's feature maps to preserve image details, and we design a novel Claw UNet, based on UNet, for scleral blood vessel segmentation. Specifically, a residual structure is used to increase the number of network layers in the feature extraction stage to learn deeper features. In the decoding stage, by fusing the features of the encoding, upsampling, and decoding parts, Claw UNet achieves effective segmentation in the fine-grained regions of scleral blood vessels. To effectively extract small blood vessels, we use an attention mechanism to calculate an attention coefficient for each position in the image. Claw UNet outperforms other UNet-based networks on the scleral blood vessel image dataset.
This paper presents an unobtrusive solution that can automatically identify deep breathing as a person walks past a depth camera. Existing non-contact breath assessments achieve satisfactory results only under restricted conditions in which the human body stays relatively still. When a person moves forward, the breath signals detected by the depth camera are hidden within signals of trunk displacement and deformation, and the signals are short because of the short stay time, posing great challenges for model building. To overcome these challenges, a signal extraction and selection method based on multiple regions of interest (ROIs) is proposed to automatically obtain the breath-informative signal from depth video. Subsequently, graph signal analysis (GSA) is adopted as a spatial-temporal filter to remove the components unrelated to breathing. Finally, a classifier for identifying deep breathing is established based on the selected breath-informative signal. In validation experiments, the proposed approach outperforms the comparative methods with an accuracy, precision, recall, and F1 of 75.5%, 76.2%, 75.0%, and 75.2%, respectively. This system can be extended to public places to provide timely and ubiquitous help for those who may be going through physical or mental trouble.
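The GSA filtering idea can be illustrated generically (the paper's exact graph construction is not specified here): treat the per-ROI signal values as a graph signal and keep only its smoothest graph-frequency components, suppressing components that vary erratically across neighboring ROIs. The graph, signal, and cutoff below are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of graph-signal low-pass filtering in the spirit of GSA
# (not the paper's exact filter): project a signal measured on the ROIs onto
# the low-frequency eigenvectors of a graph Laplacian.

def graph_lowpass(signal, W, k=3):
    """Keep the k smoothest graph-frequency components of `signal`.

    W is the (symmetric) ROI adjacency matrix; smooth eigenvectors of the
    Laplacian L = D - W correspond to small eigenvalues.
    """
    L = np.diag(W.sum(axis=1)) - W          # combinatorial Laplacian
    evals, evecs = np.linalg.eigh(L)        # ascending graph frequencies
    U = evecs[:, :k]                        # k lowest-frequency eigenvectors
    return U @ (U.T @ signal)               # orthogonal projection
```

A signal that is already smooth across connected ROIs (e.g., a shared breathing component) passes through such a filter nearly unchanged, while ROI-local motion artifacts are attenuated.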