Abstract:In addition to relevance, diversity is an important yet less studied performance metric of cross-modal image retrieval systems, which is critical to user experience. Existing solutions for diversity-aware image retrieval either explicitly post-process the raw retrieval results from standard retrieval systems or try to learn multi-vector representations of images to represent their diverse semantics. However, neither of them is good enough to balance relevance and diversity. On the one hand, standard retrieval systems are usually biased to common semantics and seldom exploit diversity-aware regularization in training, which makes it difficult to promote diversity by post-processing. On the other hand, multi-vector representation methods are not guaranteed to learn robust multiple projections. As a result, irrelevant images and images of rare or unique semantics may be projected inappropriately, which degrades the relevance and diversity of the results generated by some typical algorithms like top-k. To cope with these problems, this paper presents a new method called CoLT that tries to generate much more representative and robust representations for accurately classifying images. Specifically, CoLT first extracts semantics-aware image features by enhancing the preliminary representations of an existing one-to-one cross-modal system with semantics-aware contrastive learning. Then, a transformer-based token classifier is developed to subsume all the features into their corresponding categories. Finally, a post-processing algorithm is designed to retrieve images from each category to form the final retrieval result. Extensive experiments on two real-world datasets Div400 and Div150Cred show that CoLT can effectively boost diversity, and outperforms the existing methods as a whole (with a higher F1 score).
Abstract:Rotated object detection aims to identify and locate objects in images with arbitrary orientation. In this scenario, the oriented directions of objects vary considerably across different images, while multiple orientations of objects exist within an image. This intrinsic characteristic makes it challenging for standard backbone networks to extract high-quality features of these arbitrarily orientated objects. In this paper, we present Adaptive Rotated Convolution (ARC) module to handle the aforementioned challenges. In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images, and an efficient conditional computation mechanism is introduced to accommodate the large orientation variations of objects within an image. The two designs work seamlessly in rotated object detection problem. Moreover, ARC can conveniently serve as a plug-and-play module in various vision backbones to boost their representation ability to detect oriented objects accurately. Experiments on commonly used benchmarks (DOTA and HRSC2016) demonstrate that equipped with our proposed ARC module in the backbone network, the performance of multiple popular oriented object detectors is significantly improved (e.g. +3.03% mAP on Rotated RetinaNet and +4.16% on CFA). Combined with the highly competitive method Oriented R-CNN, the proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77% mAP.
Abstract:This letter considers the secure communication in a reconfigurable intelligent surface (RIS) aided full duplex (FD) system. A FD base station (BS) serves an uplink (UL) user and a downlink (DL) user simultaneously over the same timefrequency dimension assisted by a RIS in the presence of an eavesdropper. In addition, the BS transmits artificial noise (AN) to interfere the eavesdropper's channel. We aim to maximize the weighted sum secrecy rate of UL and DL users by jointly optimizing the transmit beamforming, receive beamforming and AN covariance matrix at the BS, and passive beamforming at the RIS. To handle the non-convex problem, we decompose it into tractable subproblems and propose an efficient algorithm based on alternating optimization framework. Specifically, the receive beamforming is derived as a closed-form solution while other variables are obtained by using semidefinite relaxation (SDR) method and successive convex approximation (SCA) algorithm. Simulation results demonstrate the superior performance of our proposed scheme compared to other baseline schemes.
Abstract:Deep learning techniques have shown promising results in image compression, with competitive bitrate and image reconstruction quality from compressed latent. However, while image compression has progressed towards higher peak signal-to-noise ratio (PSNR) and fewer bits per pixel (bpp), their robustness to corner-case images has never received deliberation. In this work, we, for the first time, investigate the robustness of image compression systems where imperceptible perturbation of input images can precipitate a significant increase in the bitrate of their compressed latent. To characterize the robustness of state-of-the-art learned image compression, we mount white and black-box attacks. Our results on several image compression models with various bitrate qualities show that they are surprisingly fragile, where the white-box attack achieves up to 56.326x and black-box 1.947x bpp change. To improve robustness, we propose a novel model which incorporates attention modules and a basic factorized entropy model, resulting in a promising trade-off between the PSNR/bpp ratio and robustness to adversarial attacks that surpasses existing learned image compressors.
Abstract:In this letter, we study the reconfigurable intelligent surfaces (RIS) aided full-duplex (FD) communication system. By jointly designing the active beamforming of two multi-antenna sources and passive beamforming of RIS, we aim to maximize the energy efficiency of the system, where extra self-interference cancellation power consumption in FD system is also considered. We divide the optimization problem into active and passive beamforming design subproblems, and adopt the alternative optimization framework to solve them iteratively. Dinkelbach's method is used to tackle the fractional objective function in active beamforming problem. Penalty method and successive convex approximation are exploited for passive beamforming design. Simulation results show the energy efficiency of our scheme outperforms other benchmarks.
Abstract:Cross domain object detection is a realistic and challenging task in the wild. It suffers from performance degradation due to large shift of data distributions and lack of instance-level annotations in the target domain. Existing approaches mainly focus on either of these two difficulties, even though they are closely coupled in cross domain object detection. To solve this problem, we propose a novel Target-perceived Dual-branch Distillation (TDD) framework. By integrating detection branches of both source and target domains in a unified teacher-student learning scheme, it can reduce domain shift and generate reliable supervision effectively. In particular, we first introduce a distinct Target Proposal Perceiver between two domains. It can adaptively enhance source detector to perceive objects in a target image, by leveraging target proposal contexts from iterative cross-attention. Afterwards, we design a concise Dual Branch Self Distillation strategy for model training, which can progressively integrate complementary object knowledge from different domains via self-distillation in two branches. Finally, we conduct extensive experiments on a number of widely-used scenarios in cross domain object detection. The results show that our TDD significantly outperforms the state-of-the-art methods on all the benchmarks. Our code and model will be available at https://github.com/Feobi1999/TDD.
Abstract:Domain adaptive object detection (DAOD) is a promising way to alleviate performance drop of detectors in new scenes. Albeit great effort made in single source domain adaptation, a more generalized task with multiple source domains remains not being well explored, due to knowledge degradation during their combination. To address this issue, we propose a novel approach, namely target-relevant knowledge preservation (TRKP), to unsupervised multi-source DAOD. Specifically, TRKP adopts the teacher-student framework, where the multi-head teacher network is built to extract knowledge from labeled source domains and guide the student network to learn detectors in unlabeled target domain. The teacher network is further equipped with an adversarial multi-source disentanglement (AMSD) module to preserve source domain-specific knowledge and simultaneously perform cross-domain alignment. Besides, a holistic target-relevant mining (HTRM) scheme is developed to re-weight the source images according to the source-target relevance. By this means, the teacher network is enforced to capture target-relevant knowledge, thus benefiting decreasing domain shift when mentoring object detection in the target domain. Extensive experiments are conducted on various widely used benchmarks with new state-of-the-art scores reported, highlighting the effectiveness.
Abstract:This work studies the effectiveness of a novel simultaneous transmission and reflection reconfigurable intelligent surface (STAR-RIS) aided Full-Duplex (FD) communication system. We aim to maximize the energy efficiency by jointly optimizing the transmit power and passive beamforming at the STAR-RIS. We propose an efficient algorithm to optimize them iteratively under the alternating optimization framework. The successive convex approximation (SCA) and Dinkelbach's method are used to solve the power optimization subproblem. The penalty-based method is used to design passive beamforming at the STAR-RIS. Numerical results verify the convergence and effectiveness of the proposed algorithm, and further reveal the benifits of the combining of the STAR-RIS and FD communication compared to benchmarks.
Abstract:This work demonstrates the effectiveness of a novel simultaneous transmission and reflection reconfigurable intelligent surface (STAR-RIS) in Full-Duplex (FD) aided communication system. The objective is to minimize the total transmit power by jointly designing the transmit power and the transmitting and reflecting (T&R) coefficients of the STAR-RIS. To solve the nonconvex problem, an efficient algorithm is proposed by utilizing the alternating optimization framework to iteratively optimize variables. Specifically, in each iteration, we drive the closed-form expression for the optimal power design. The successive convex approximation (SCA) method and semidefinite program (SDP) are used to solve the passive beamforming optimization problem. Numerical results verify the convergence and effectiveness of the proposed algorithm, and further reveal in which scenarios STAR-RIS assisted FD communication defeats the Half-Duplex and conventional RIS.
Abstract:Beamforming technology is widely used in millimeter wave systems to combat path losses, and beamformers are usually selected from a predefined codebook. Unfortunately, traditional codebook design neglects the beam squint effect, and this will cause severe performance degradation when the bandwidth is large. In this letter, we consider that a codebook with fixed size is adopted in the wideband beamforming system. First, based on the rectangular beams with conventional beam coverage, we analyze how beam squint affects system performance and derive the expression of average spectrum efficiency. Next, we formulate optimization problem to design the optimal codebook. Simulation results demonstrate that the proposed codebook spreads beam coverage to cope with beam squint and significantly slows down the performance degradation.