Cross-domain object detection is a realistic and challenging task in the wild. It suffers from performance degradation due to the large shift in data distributions and the lack of instance-level annotations in the target domain. Existing approaches mainly focus on only one of these two difficulties, even though they are closely coupled in cross-domain object detection. To solve this problem, we propose a novel Target-perceived Dual-branch Distillation (TDD) framework. By integrating the detection branches of both source and target domains in a unified teacher-student learning scheme, it can reduce domain shift and generate reliable supervision effectively. In particular, we first introduce a distinct Target Proposal Perceiver between the two domains. It adaptively enhances the source detector to perceive objects in a target image by leveraging target proposal contexts from iterative cross-attention. Afterwards, we design a concise Dual Branch Self Distillation strategy for model training, which progressively integrates complementary object knowledge from different domains via self-distillation in two branches. Finally, we conduct extensive experiments on a number of widely used scenarios in cross-domain object detection. The results show that our TDD significantly outperforms the state-of-the-art methods on all the benchmarks. Our code and model will be available at https://github.com/Feobi1999/TDD.
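As a rough illustration of the cross-attention idea mentioned above, the following minimal sketch lets a set of query features attend to a set of context features with plain scaled dot-product attention. It is not the TDD implementation: the function names, the absence of learned projections, and the residual "iterative" loop are all assumptions made for illustration, with the queries standing in for source-branch proposal features and the contexts for target proposal features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, contexts):
    """Single-head scaled dot-product cross-attention without learned projections.

    queries:  list of d-dim vectors (stand-ins for source-branch proposal features).
    contexts: list of d-dim vectors (stand-ins for target proposal features),
              used as both keys and values.
    Returns (outputs, weights); weights[i][j] is how much query i attends to context j.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in contexts]
        w = softmax(scores)
        out = [sum(wj * v[t] for wj, v in zip(w, contexts)) for t in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def iterative_cross_attention(queries, contexts, num_iters=2):
    """Hypothetical iterative variant: add the attended context to the queries, then repeat."""
    feats = [list(q) for q in queries]
    for _ in range(num_iters):
        attended, _ = cross_attention(feats, contexts)
        feats = [[f + a for f, a in zip(fv, av)] for fv, av in zip(feats, attended)]
    return feats
```

In a real detector the queries, keys and values would pass through learned linear projections and the features would be tensors; the pure-Python lists here only make the data flow explicit.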
Domain adaptive object detection (DAOD) is a promising way to alleviate the performance drop of detectors in new scenes. Although great effort has been made in single-source domain adaptation, the more general task with multiple source domains remains underexplored, due to knowledge degradation when the sources are combined. To address this issue, we propose a novel approach, namely target-relevant knowledge preservation (TRKP), for unsupervised multi-source DAOD. Specifically, TRKP adopts the teacher-student framework, where a multi-head teacher network is built to extract knowledge from the labeled source domains and guide the student network in learning detectors for the unlabeled target domain. The teacher network is further equipped with an adversarial multi-source disentanglement (AMSD) module to preserve source domain-specific knowledge while simultaneously performing cross-domain alignment. In addition, a holistic target-relevant mining (HTRM) scheme is developed to re-weight the source images according to their source-target relevance. In this way, the teacher network is forced to capture target-relevant knowledge, which helps reduce domain shift when it guides object detection in the target domain. Extensive experiments are conducted on various widely used benchmarks, and new state-of-the-art scores are reported, highlighting the effectiveness of the approach.
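The re-weighting of source images by source-target relevance can be pictured with a small sketch. The abstract does not say how relevance is measured or normalized, so the code below simply assumes that a relevance score per source image is already available and turns the scores into weights with a softmax; the function names and the `temperature` parameter are hypothetical.

```python
import math


def relevance_weights(relevance_scores, temperature=1.0):
    """Softmax-normalize per-image relevance scores so the weights sum to 1.

    Assumption: a higher score means the source image is more relevant
    to the target domain. How the scores are obtained is not modeled here.
    """
    scaled = [s / temperature for s in relevance_scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def reweighted_loss(per_image_losses, relevance_scores, temperature=1.0):
    """Weighted sum of per-image detection losses using the relevance weights."""
    weights = relevance_weights(relevance_scores, temperature)
    return sum(w * loss for w, loss in zip(weights, per_image_losses))
```

With equal scores this reduces to the ordinary mean loss, so the re-weighting only changes training when some source images are judged more target-relevant than others.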
This work studies the effectiveness of a novel simultaneous transmission and reflection reconfigurable intelligent surface (STAR-RIS) aided Full-Duplex (FD) communication system. We aim to maximize the energy efficiency by jointly optimizing the transmit power and the passive beamforming at the STAR-RIS. We propose an efficient algorithm that optimizes them iteratively under the alternating optimization framework. Successive convex approximation (SCA) and Dinkelbach's method are used to solve the power optimization subproblem. A penalty-based method is used to design the passive beamforming at the STAR-RIS. Numerical results verify the convergence and effectiveness of the proposed algorithm, and further reveal the benefits of combining the STAR-RIS and FD communication compared to benchmarks.
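To make the role of Dinkelbach's method concrete, here is a minimal sketch on a toy single-link problem, not the STAR-RIS system itself. It assumes the energy efficiency is the ratio log2(1 + g p) / (p + P_c) with a normalized channel gain g, a fixed circuit power P_c and a power budget p_max; these symbols and function names are illustrative assumptions. Dinkelbach's method replaces the ratio with the subtractive problem max_p log2(1 + g p) - lambda (p + P_c) and updates lambda to the achieved ratio until the subtractive optimum reaches zero.

```python
import math


def rate(p, gain):
    """Achievable rate in bit/s/Hz for transmit power p and normalized channel gain."""
    return math.log2(1.0 + gain * p)


def best_power_for(lmbda, gain, p_max):
    """Maximize rate(p) - lmbda * (p + p_circuit) over 0 <= p <= p_max.

    The objective is concave in p, so clipping its stationary point to the
    feasible interval gives the maximizer (p_circuit only shifts the value).
    """
    p_star = 1.0 / (lmbda * math.log(2.0)) - 1.0 / gain
    return min(max(p_star, 0.0), p_max)


def dinkelbach_energy_efficiency(gain, p_circuit, p_max, tol=1e-9, max_iters=100):
    """Return (power, energy_efficiency) for the toy fractional program."""
    p = p_max  # any feasible starting point works
    lmbda = rate(p, gain) / (p + p_circuit)
    for _ in range(max_iters):
        p = best_power_for(lmbda, gain, p_max)
        residual = rate(p, gain) - lmbda * (p + p_circuit)
        if residual < tol:  # subtractive optimum is (numerically) zero
            break
        lmbda = rate(p, gain) / (p + p_circuit)
    return p, rate(p, gain) / (p + p_circuit)
```

In the full problem the inner maximization is nonconvex, which is where a technique such as SCA would come in; the toy problem has a closed-form inner step, so only the outer Dinkelbach update is shown.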
This work demonstrates the effectiveness of a novel simultaneous transmission and reflection reconfigurable intelligent surface (STAR-RIS) in a Full-Duplex (FD) communication system. The objective is to minimize the total transmit power by jointly designing the transmit power and the transmitting and reflecting (T&R) coefficients of the STAR-RIS. To solve the nonconvex problem, an efficient algorithm is proposed that uses the alternating optimization framework to optimize the variables iteratively. Specifically, in each iteration, we derive the closed-form expression for the optimal power design. The successive convex approximation (SCA) method and semidefinite programming (SDP) are used to solve the passive beamforming optimization problem. Numerical results verify the convergence and effectiveness of the proposed algorithm, and further reveal in which scenarios STAR-RIS assisted FD communication outperforms Half-Duplex and conventional RIS.
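The interplay between a closed-form power step and a surface-coefficient step can be illustrated on a deliberately tiny example. The sketch below is not the paper's derivation: it assumes a single link with one direct path and one reflected path through a single phase-shifting element, no self-interference, and a target rate constraint. All names (`h_direct`, `h_reflect`, `rate_target`, `noise_power`) are hypothetical. Under these assumptions the best phase aligns the two paths, and the minimum power follows from solving log2(1 + p g / sigma^2) = R for p.

```python
import cmath
import math


def aligned_phase(h_direct, h_reflect):
    """Phase shift that makes the reflected path add coherently with the direct path."""
    return cmath.phase(h_direct) - cmath.phase(h_reflect)


def effective_gain(h_direct, h_reflect, theta):
    """Power gain of the combined channel for a given phase shift theta."""
    return abs(h_direct + h_reflect * cmath.exp(1j * theta)) ** 2


def min_power(rate_target, gain, noise_power):
    """Smallest p that satisfies log2(1 + p * gain / noise_power) >= rate_target."""
    return (2.0 ** rate_target - 1.0) * noise_power / gain


def achieved_rate(p, gain, noise_power):
    """Rate in bit/s/Hz obtained with power p on a channel with the given gain."""
    return math.log2(1.0 + p * gain / noise_power)
```

For example, computing `theta = aligned_phase(h_direct, h_reflect)` and then `min_power(rate_target, effective_gain(h_direct, h_reflect, theta), noise_power)` gives the power needed once the phase has been chosen. In this toy case the phase step does not depend on the power, so a single pass suffices; in the coupled FD problem the two steps would be repeated until convergence.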
Beamforming technology is widely used in millimeter wave systems to combat path losses, and beamformers are usually selected from a predefined codebook. Unfortunately, traditional codebook design neglects the beam squint effect, which causes severe performance degradation when the bandwidth is large. In this letter, we consider a wideband beamforming system that adopts a codebook of fixed size. First, based on rectangular beams with conventional beam coverage, we analyze how beam squint affects system performance and derive an expression for the average spectrum efficiency. Next, we formulate an optimization problem to design the optimal codebook. Simulation results demonstrate that the proposed codebook spreads beam coverage to cope with beam squint and significantly slows down the performance degradation.
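The beam squint effect itself can be reproduced with a few lines of arithmetic. The sketch below assumes a uniform linear array with half-wavelength spacing at the carrier frequency and frequency-flat phase shifters steered to a fixed angle; it is a textbook model chosen for illustration, not the letter's rectangular-beam analysis, and the function and parameter names are hypothetical. At a subcarrier frequency f the beam peak moves to the angle whose sine is (f_c / f) sin(theta_0), so the gain toward the intended direction drops as f moves away from f_c.

```python
import cmath
import math


def array_gain(num_antennas, steer_angle_rad, freq_hz, carrier_hz):
    """Normalized gain toward the steering angle at frequency freq_hz.

    The phase shifters are fixed for carrier_hz, so each element keeps a
    residual phase error of pi * (f / f_c - 1) * sin(theta_0) per index.
    """
    delta = math.pi * (freq_hz / carrier_hz - 1.0) * math.sin(steer_angle_rad)
    total = sum(cmath.exp(1j * n * delta) for n in range(num_antennas))
    return abs(total) / num_antennas


def squinted_angle(steer_angle_rad, freq_hz, carrier_hz):
    """Angle at which the beam actually peaks at freq_hz.

    Valid only while (f_c / f) * sin(theta_0) stays within [-1, 1].
    """
    return math.asin((carrier_hz / freq_hz) * math.sin(steer_angle_rad))
```

Evaluating `array_gain` across the band for a large array shows why a codebook designed only for the carrier frequency loses coverage at the band edges.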
The automatic quality assessment of self-media online articles is a new and urgent problem, which is of great value to online recommendation and search. Unlike traditional, well-formed articles, self-media online articles are mainly created by users. Their surface characteristics include different text levels and multi-modal hybrid editing, and their latent characteristics include diverse content, different styles, large semantic spans and demanding requirements for a good interactive experience. To address these challenges, we establish a joint model, CoQAN, that combines layout organization, writing characteristics and text semantics, and we design different representation learning subnetworks, especially for the feature learning process and interactive reading habits on mobile terminals. This design is more consistent with the cognitive style in which an expert evaluates articles. We have also constructed a large-scale real-world assessment dataset. Extensive experimental results show that the proposed framework significantly outperforms state-of-the-art methods, and effectively learns and integrates the different factors of online article quality assessment.
Attribute recognition is a crucial but challenging task due to viewpoint changes, illumination variations, appearance diversity, etc. Most previous work considers only the attribute-level feature embedding, which may perform poorly in complicated heterogeneous conditions. To address this problem, we propose a hierarchical feature embedding (HFE) framework, which learns a fine-grained feature embedding by combining attribute and ID information. In HFE, we maintain the inter-class and intra-class feature embeddings simultaneously. Not only samples with the same attribute but also samples with the same ID are gathered more closely, which restricts the feature embedding of visually hard samples with regard to attributes and improves robustness to varying conditions. We establish this hierarchical structure by utilizing an HFE loss consisting of attribute-level and ID-level constraints. We also introduce an absolute boundary regularization and a dynamic loss weight as supplementary components to help build up the feature embedding. Experiments show that our method achieves state-of-the-art results on two pedestrian attribute datasets and a facial attribute dataset.
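One way to picture a loss with both attribute-level and ID-level constraints is a pair of margin terms that order distances in the embedding space. The sketch below is an assumption made for illustration, not the HFE loss as defined in the paper: it asks that an anchor be closer to a sample with the same ID than to a sample that shares only the attribute, and closer to that sample than to one with a different attribute. The margins and function names are hypothetical, and the absolute boundary regularization and dynamic loss weight are not modeled.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hierarchical_margin_loss(anchor, same_id, same_attr, diff_attr,
                             id_margin=0.2, attr_margin=0.5):
    """Two hinge terms that encode the ordering d_id < d_attr < d_neg.

    same_id:   feature of another image of the same identity (same attribute).
    same_attr: feature of a different identity that shares the attribute.
    diff_attr: feature of a sample with a different attribute value.
    """
    d_id = euclidean(anchor, same_id)
    d_attr = euclidean(anchor, same_attr)
    d_neg = euclidean(anchor, diff_attr)
    id_term = max(0.0, d_id - d_attr + id_margin)        # ID-level constraint
    attr_term = max(0.0, d_attr - d_neg + attr_margin)   # attribute-level constraint
    return id_term + attr_term
```

The loss is zero once both orderings hold with the chosen margins, and it grows when, for instance, a same-ID sample lies farther from the anchor than a sample that merely shares the attribute.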
The sequence-to-sequence (Seq2Seq) model generates target words iteratively given the previously observed words during the decoding process, which results in the loss of the holistic semantics of the target response and of the complete semantic relationship between responses and dialogue histories. In this paper, we propose a generic diversity-promoting joint network, called Holistic Semantic Constraint Joint Network (HSCJN), which enhances the global sentence information and then regularizes the objective function by penalizing low-entropy output. Our network introduces more target information to improve diversity, and simultaneously captures direct semantic information to better constrain relevance. Moreover, the proposed method can be easily applied to any Seq2Seq structure. Extensive experiments on several dialogue corpora show that our method effectively improves both the semantic consistency and the diversity of generated responses, and achieves better performance than other competitive methods.
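Penalizing low-entropy output can be sketched for a single decoding step as a negative log-likelihood minus a scaled entropy bonus. This is a generic confidence-penalty form used here as an assumption; the abstract does not give the exact regularizer of HSCJN, and the function names and the weight `beta` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution over the vocabulary."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_nll(probs, target_index, beta=0.1):
    """Negative log-likelihood of the target word minus beta times the output entropy.

    Subtracting the entropy lowers the loss of high-entropy predictions,
    so sharply peaked (low-entropy) outputs are penalized relative to them.
    Requires probs[target_index] > 0.
    """
    nll = -math.log(probs[target_index])
    return nll - beta * entropy(probs)
```

With `beta=0` this is the usual per-token cross-entropy, and larger values trade likelihood for flatter, more diverse output distributions.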