Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qijun Zhao

GenUDC: High Quality 3D Mesh Generation with Unsigned Dual Contouring Representation

Oct 23, 2024

Ruowei Wang, Jiaqi Li, Dan Zeng, Xueqi Ma, Zixiang Xu, Jianwei Zhang, Qijun Zhao

Abstract:Generating high-quality meshes with complex structures and realistic surfaces is the primary goal of 3D generative models. Existing methods typically employ sequence data or deformable tetrahedral grids for mesh generation. However, sequence-based methods have difficulty producing complex structures with many faces due to memory limits. The deformable tetrahedral grid-based method MeshDiffusion fails to recover realistic surfaces due to the inherent ambiguity in deformable grids. We propose the GenUDC framework to address these challenges by leveraging the Unsigned Dual Contouring (UDC) as the mesh representation. UDC discretizes a mesh in a regular grid and divides it into the face and vertex parts, recovering both complex structures and fine details. As a result, the one-to-one mapping between UDC and mesh resolves the ambiguity problem. In addition, GenUDC adopts a two-stage, coarse-to-fine generative process for 3D mesh generation. It first generates the face part as a rough shape and then the vertex part to craft a detailed shape. Extensive evaluations demonstrate the superiority of UDC as a mesh representation and the favorable performance of GenUDC in mesh generation. The code and trained models are available at https://github.com/TrepangCat/GenUDC.

* ACMMM 2024, code:https://github.com/TrepangCat/GenUDC

Via

Access Paper or Ask Questions

Low-Light Object Tracking: A Benchmark

Aug 21, 2024

Pengzhi Zhong, Xiaoyu Guo, Defeng Huang, Xiaojun Peng, Yian Li, Qijun Zhao, Shuiwang Li

Abstract:In recent years, the field of visual tracking has made significant progress with the application of large-scale training datasets. These datasets have supported the development of sophisticated algorithms, enhancing the accuracy and stability of visual object tracking. However, most research has primarily focused on favorable illumination circumstances, neglecting the challenges of tracking in low-ligh environments. In low-light scenes, lighting may change dramatically, targets may lack distinct texture features, and in some scenarios, targets may not be directly observable. These factors can lead to a severe decline in tracking performance. To address this issue, we introduce LLOT, a benchmark specifically designed for Low-Light Object Tracking. LLOT comprises 269 challenging sequences with a total of over 132K frames, each carefully annotated with bounding boxes. This specially designed dataset aims to promote innovation and advancement in object tracking techniques for low-light conditions, addressing challenges not adequately covered by existing benchmarks. To assess the performance of existing methods on LLOT, we conducted extensive tests on 39 state-of-the-art tracking algorithms. The results highlight a considerable gap in low-light tracking performance. In response, we propose H-DCPT, a novel tracker that incorporates historical and darkness clue prompts to set a stronger baseline. H-DCPT outperformed all 39 evaluated methods in our experiments, demonstrating significant improvements. We hope that our benchmark and H-DCPT will stimulate the development of novel and accurate methods for tracking objects in low-light conditions. The LLOT and code are available at https://github.com/OpenCodeGithub/H-DCPT.

Via

Access Paper or Ask Questions

Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking

Jul 07, 2024

You Wu, Xucheng Wang, Dan Zeng, Hengzhou Ye, Xiaolan Xie, Qijun Zhao, Shuiwang Li

Figure 1 for Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking

Figure 2 for Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking

Figure 3 for Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking

Figure 4 for Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking

Abstract:Recently, the surge in the adoption of single-stream architectures utilizing pre-trained ViT backbones represents a promising advancement in the field of generic visual tracking. By integrating feature extraction and fusion into a cohesive framework, these architectures offer improved performance, efficiency, and robustness. However, there has been limited exploration into optimizing these frameworks for UAV tracking. In this paper, we boost the efficiency of this framework by tailoring it into an adaptive computation framework that dynamically exits Transformer blocks for real-time UAV tracking. The motivation behind this is that tracking tasks with fewer challenges can be adequately addressed using low-level feature representations. Simpler tasks can often be handled with less demanding, lower-level features. This approach allows the model use computational resources more efficiently by focusing on complex tasks and conserving resources for easier ones. Another significant enhancement introduced in this paper is the improved effectiveness of ViTs in handling motion blur, a common issue in UAV tracking caused by the fast movements of either the UAV, the tracked objects, or both. This is achieved by acquiring motion blur robust representations through enforcing invariance in the feature representation of the target with respect to simulated motion blur. The proposed approach is dubbed BDTrack. Extensive experiments conducted on five tracking benchmarks validate the effectiveness and versatility of our approach, establishing it as a cutting-edge solution in real-time UAV tracking. Code is released at: https://github.com/wuyou3474/BDTrack.

Via

Access Paper or Ask Questions

DualFocus: A Unified Framework for Integrating Positive and Negative Descriptors in Text-based Person Retrieval

May 13, 2024

Yuchuan Deng, Zhanpeng Hu, Jiakun Han, Chuang Deng, Qijun Zhao

Abstract:Text-based person retrieval (TPR) aims to retrieve images of a person from an extensive array of candidates based on a given textual description. The core challenge lies in mapping visual and textual data into a unified latent space. While existing TPR methods concentrate on recognizing explicit and positive characteristics, they often neglect the critical influence of negative descriptors, resulting in potential false positives that fulfill positive criteria but could be excluded by negative descriptors. To alleviate these issues, we introduce DualFocus, a unified framework for integrating positive and negative descriptors to enhance the interpretative accuracy of vision-language foundational models regarding textual queries. DualFocus employs Dual (Positive/Negative) Attribute Prompt Learning (DAPL), which integrates Dual Image-Attribute Contrastive (DIAC) Learning and Sensitive Image-Attributes Matching (SIAM) Learning. This way DualFocus enhances the detection of unseen attributes, thereby boosting retrieval precision. To further achieve a balance between coarse and fine-grained alignment of visual and textual embeddings, we propose the Dynamic Tokenwise Similarity (DTS) loss, which refines the representation of both matching and non-matching descriptions, thereby enhancing the matching process through a detailed and adaptable similarity assessment. By focusing on token-level comparisons, DualFocus significantly outperforms existing techniques in both precision and robustness. The experiment results highlight DualFocus's superior performance on CUHK-PEDES, ICFG-PEDES, and RSTPReid.

Via

Access Paper or Ask Questions

Feature-Aware Noise Contrastive Learning For Unsupervised Red Panda Re-Identification

May 01, 2024

Jincheng Zhang, Qijun Zhao, Tie Liu

Figure 1 for Feature-Aware Noise Contrastive Learning For Unsupervised Red Panda Re-Identification

Figure 2 for Feature-Aware Noise Contrastive Learning For Unsupervised Red Panda Re-Identification

Figure 3 for Feature-Aware Noise Contrastive Learning For Unsupervised Red Panda Re-Identification

Figure 4 for Feature-Aware Noise Contrastive Learning For Unsupervised Red Panda Re-Identification

Abstract:To facilitate the re-identification (Re-ID) of individual animals, existing methods primarily focus on maximizing feature similarity within the same individual and enhancing distinctiveness between different individuals. However, most of them still rely on supervised learning and require substantial labeled data, which is challenging to obtain. To avoid this issue, we propose a Feature-Aware Noise Contrastive Learning (FANCL) method to explore an unsupervised learning solution, which is then validated on the task of red panda re-ID. FANCL employs a Feature-Aware Noise Addition module to produce noised images that conceal critical features and designs two contrastive learning modules to calculate the losses. Firstly, a feature consistency module is designed to bridge the gap between the original and noised features. Secondly, the neural networks are trained through a cluster contrastive learning module. Through these more challenging learning tasks, FANCL can adaptively extract deeper representations of red pandas. The experimental results on a set of red panda images collected in both indoor and outdoor environments prove that FANCL outperforms several related state-of-the-art unsupervised methods, achieving high performance comparable to supervised learning methods.

* 7 pages, 5 figures, IJCNN2024

Via

Access Paper or Ask Questions

Hierarchical Generative Network for Face Morphing Attacks

Mar 17, 2024

Zuyuan He, Zongyong Deng, Qiaoyun He, Qijun Zhao

Figure 1 for Hierarchical Generative Network for Face Morphing Attacks

Figure 2 for Hierarchical Generative Network for Face Morphing Attacks

Figure 3 for Hierarchical Generative Network for Face Morphing Attacks

Figure 4 for Hierarchical Generative Network for Face Morphing Attacks

Abstract:Face morphing attacks circumvent face recognition systems (FRSs) by creating a morphed image that contains multiple identities. However, existing face morphing attack methods either sacrifice image quality or compromise the identity preservation capability. Consequently, these attacks fail to bypass FRSs verification well while still managing to deceive human observers. These methods typically rely on global information from contributing images, ignoring the detailed information from effective facial regions. To address the above issues, we propose a novel morphing attack method to improve the quality of morphed images and better preserve the contributing identities. Our proposed method leverages the hierarchical generative network to capture both local detailed and global consistency information. Additionally, a mask-guided image blending module is dedicated to removing artifacts from areas outside the face to improve the image's visual quality. The proposed attack method is compared to state-of-the-art methods on three public datasets in terms of FRSs' vulnerability, attack detectability, and image quality. The results show our method's potential threat of deceiving FRSs while being capable of passing multiple morphing attack detection (MAD) scenarios.

* Accepted by FG2024

Via

Access Paper or Ask Questions

Explicit Motion Handling and Interactive Prompting for Video Camouflaged Object Detection

Mar 04, 2024

Xin Zhang, Tao Xiao, Gepeng Ji, Xuan Wu, Keren Fu, Qijun Zhao

Abstract:Camouflage poses challenges in distinguishing a static target, whereas any movement of the target can break this disguise. Existing video camouflaged object detection (VCOD) approaches take noisy motion estimation as input or model motion implicitly, restricting detection performance in complex dynamic scenes. In this paper, we propose a novel Explicit Motion handling and Interactive Prompting framework for VCOD, dubbed EMIP, which handles motion cues explicitly using a frozen pre-trained optical flow fundamental model. EMIP is characterized by a two-stream architecture for simultaneously conducting camouflaged segmentation and optical flow estimation. Interactions across the dual streams are realized in an interactive prompting way that is inspired by emerging visual prompt learning. Two learnable modules, i.e. the camouflaged feeder and motion collector, are designed to incorporate segmentation-to-motion and motion-to-segmentation prompts, respectively, and enhance outputs of the both streams. The prompt fed to the motion stream is learned by supervising optical flow in a self-supervised manner. Furthermore, we show that long-term historical information can also be incorporated as a prompt into EMIP and achieve more robust results with temporal consistency. Experimental results demonstrate that our EMIP achieves new state-of-the-art records on popular VCOD benchmarks. The code will be publicly available.

* 9 pages, 6 figures

Via

Access Paper or Ask Questions

Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification

Feb 16, 2024

Xin Zhang, Keren Fu, Qijun Zhao

Figure 1 for Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification

Figure 2 for Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification

Figure 3 for Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification

Figure 4 for Dynamic Patch-aware Enrichment Transformer for Occluded Person Re-Identification

Abstract:Person re-identification (re-ID) continues to pose a significant challenge, particularly in scenarios involving occlusions. Prior approaches aimed at tackling occlusions have predominantly focused on aligning physical body features through the utilization of external semantic cues. However, these methods tend to be intricate and susceptible to noise. To address the aforementioned challenges, we present an innovative end-to-end solution known as the Dynamic Patch-aware Enrichment Transformer (DPEFormer). This model effectively distinguishes human body information from occlusions automatically and dynamically, eliminating the need for external detectors or precise image alignment. Specifically, we introduce a dynamic patch token selection module (DPSM). DPSM utilizes a label-guided proxy token as an intermediary to identify informative occlusion-free tokens. These tokens are then selected for deriving subsequent local part features. To facilitate the seamless integration of global classification features with the finely detailed local features selected by DPSM, we introduce a novel feature blending module (FBM). FBM enhances feature representation through the complementary nature of information and the exploitation of part diversity. Furthermore, to ensure that DPSM and the entire DPEFormer can effectively learn with only identity labels, we also propose a Realistic Occlusion Augmentation (ROA) strategy. This strategy leverages the recent advances in the Segment Anything Model (SAM). As a result, it generates occlusion images that closely resemble real-world occlusions, greatly enhancing the subsequent contrastive learning process. Experiments on occluded and holistic re-ID benchmarks signify a substantial advancement of DPEFormer over existing state-of-the-art approaches. The code will be made publicly available.

* 12 pages, 6 figures

Via

Access Paper or Ask Questions

Optimal-Landmark-Guided Image Blending for Face Morphing Attacks

Jan 30, 2024

Qiaoyun He, Zongyong Deng, Zuyuan He, Qijun Zhao

Figure 1 for Optimal-Landmark-Guided Image Blending for Face Morphing Attacks

Figure 2 for Optimal-Landmark-Guided Image Blending for Face Morphing Attacks

Figure 3 for Optimal-Landmark-Guided Image Blending for Face Morphing Attacks

Figure 4 for Optimal-Landmark-Guided Image Blending for Face Morphing Attacks

Abstract:In this paper, we propose a novel approach for conducting face morphing attacks, which utilizes optimal-landmark-guided image blending. Current face morphing attacks can be categorized into landmark-based and generation-based approaches. Landmark-based methods use geometric transformations to warp facial regions according to averaged landmarks but often produce morphed images with poor visual quality. Generation-based methods, which employ generation models to blend multiple face images, can achieve better visual quality but are often unsuccessful in generating morphed images that can effectively evade state-of-the-art face recognition systems~(FRSs). Our proposed method overcomes the limitations of previous approaches by optimizing the morphing landmarks and using Graph Convolutional Networks (GCNs) to combine landmark and appearance features. We model facial landmarks as nodes in a bipartite graph that is fully connected and utilize GCNs to simulate their spatial and structural relationships. The aim is to capture variations in facial shape and enable accurate manipulation of facial appearance features during the warping process, resulting in morphed facial images that are highly realistic and visually faithful. Experiments on two public datasets prove that our method inherits the advantages of previous landmark-based and generation-based methods and generates morphed images with higher quality, posing a more significant threat to state-of-the-art FRSs.

* Accepted by IJCB2023

Via

Access Paper or Ask Questions

Promoting Segment Anything Model towards Highly Accurate Dichotomous Image Segmentation

Dec 30, 2023

Xianjie Liu, Keren Fu, Qijun Zhao

Figure 1 for Promoting Segment Anything Model towards Highly Accurate Dichotomous Image Segmentation

Figure 2 for Promoting Segment Anything Model towards Highly Accurate Dichotomous Image Segmentation

Figure 3 for Promoting Segment Anything Model towards Highly Accurate Dichotomous Image Segmentation

Figure 4 for Promoting Segment Anything Model towards Highly Accurate Dichotomous Image Segmentation

Abstract:Segmenting any object represents a crucial step towards achieving artificial general intelligence, and the "Segment Anything Model" (SAM) has significantly advanced the development of foundational models in computer vision. We have high expectations regarding whether SAM can enhance highly accurate dichotomous image segmentation. In fact, the evidence presented in this article demonstrates that by inputting SAM with simple prompt boxes and utilizing the results output by SAM as input for IS5Net, we can greatly improve the effectiveness of highly accurate dichotomous image segmentation.

Via

Access Paper or Ask Questions