
Yi Jin


FreeDrag: Point Tracking is Not What You Need for Interactive Point-based Image Editing

Jul 29, 2023
Pengyang Ling, Lin Chen, Pan Zhang, Huaian Chen, Yi Jin

Figures 1-4 for FreeDrag: Point Tracking is Not What You Need for Interactive Point-based Image Editing

To serve the intricate and varied demands of image editing, precise and flexible manipulation of image content is indispensable. Recently, DragGAN has achieved impressive editing results through point-based manipulation. However, we observe that DragGAN struggles with miss tracking, where it fails to track the desired handle points effectively, and ambiguous tracking, where the tracked points drift into other regions that resemble the handle points. To address these issues, we propose FreeDrag, which adopts a feature-oriented approach that lifts the burden of point tracking from DragGAN's point-oriented methodology. FreeDrag incorporates adaptive template features, line search, and fuzzy localization to perform stable and efficient point-based image editing. Extensive experiments demonstrate that our method outperforms DragGAN and enables stable point-based editing in challenging scenarios involving similar structures, fine details, or multiple target points.
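The feature-oriented loop can be sketched roughly as follows: a single hypothetical update step combining line search, fuzzy localization, and adaptive template blending. All names, thresholds, and the moving-average schedule here are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def freedrag_step(feat_map, template, cur_pt, target_pt, lam=0.7, step=2.0, tol=1.5):
    """One illustrative FreeDrag-style update (hypothetical simplification).

    feat_map : (H, W, C) feature map from the generator
    template : (C,) adaptive template feature for the handle point
    cur_pt   : current (row, col) location of the handle point
    target_pt: desired (row, col) location
    """
    # Line search: propose the next location a fixed step along the
    # straight line from the current point toward the target.
    direction = np.asarray(target_pt, float) - np.asarray(cur_pt, float)
    dist = np.linalg.norm(direction)
    if dist < 1e-6:
        return tuple(cur_pt), template
    next_pt = np.asarray(cur_pt, float) + direction / dist * min(step, dist)
    r, c = np.clip(next_pt, 0, np.array(feat_map.shape[:2]) - 1).astype(int)

    # Fuzzy localization: only commit the move when the feature at the
    # proposed location is close enough to the template.
    feat = feat_map[r, c]
    if np.linalg.norm(feat - template) > tol:
        return tuple(cur_pt), template  # stay put; the motion was unreliable

    # Adaptive template: blend in the newly reached feature so the template
    # drifts with appearance changes instead of being re-tracked each step.
    new_template = lam * template + (1 - lam) * feat
    return (r, c), new_template
```

Iterating this step moves the handle toward the target without ever searching for the handle point's new location, which is the sense in which point tracking is avoided.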

* 8 pages, 7 figures 

Disentangle then Parse: Night-time Semantic Segmentation with Illumination Disentanglement

Jul 19, 2023
Zhixiang Wei, Lin Chen, Tao Tu, Huaian Chen, Pengyang Ling, Yi Jin

Figures 1-4 for Disentangle then Parse: Night-time Semantic Segmentation with Illumination Disentanglement

Most prior semantic segmentation methods were developed for day-time scenes and typically underperform in night-time scenes because of insufficient and complicated lighting conditions. In this work, we tackle this challenge by proposing a novel night-time semantic segmentation paradigm, i.e., disentangle then parse (DTP). DTP explicitly disentangles night-time images into light-invariant reflectance and light-specific illumination components and then recognizes semantics based on their adaptive fusion. Concretely, the proposed DTP comprises two key components: 1) Instead of processing lighting-entangled features as in prior works, our Semantic-Oriented Disentanglement (SOD) framework extracts the reflectance component without being impeded by lighting, allowing the network to recognize semantics consistently under varying and complicated lighting conditions. 2) Based on the observation that the illumination component can serve as a cue for some semantically confused regions, we further introduce an Illumination-Aware Parser (IAParser) to explicitly learn the correlation between semantics and lighting and to aggregate illumination features for more precise predictions. Extensive experiments on the night-time segmentation task under various settings demonstrate that DTP significantly outperforms state-of-the-art methods. Furthermore, with negligible additional parameters, DTP can be directly applied to existing day-time methods to benefit night-time segmentation.
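As a rough illustration of the disentangle-then-parse idea, the sketch below approximates the learned SOD disentanglement with a Retinex-style split and the IAParser with a simple illumination-weighted fusion; both simplifications, and all names, are expository assumptions rather than the paper's method:

```python
import numpy as np

def disentangle(img, eps=1e-6):
    """Retinex-style split of an image into illumination and reflectance.

    Illustrative stand-in for the learned SOD framework: illumination is
    approximated by per-pixel luminance, and reflectance is the image
    normalized by it (roughly light-invariant).
    img: (H, W, 3) float array in [0, 1].
    """
    illumination = img.mean(axis=2, keepdims=True)  # (H, W, 1) lighting cue
    reflectance = img / (illumination + eps)        # lighting divided out
    return reflectance, illumination

def fuse_logits(refl_logits, illum_logits, illumination):
    """Adaptive fusion: weight the illumination branch more where the
    scene is dark, i.e., where lighting itself is the informative cue."""
    w = 1.0 - illumination                          # darker cell -> higher weight
    return refl_logits + w * illum_logits
```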

* Accepted by ICCV2023 

Bridging the Gap: Multi-Level Cross-Modality Joint Alignment for Visible-Infrared Person Re-Identification

Jul 17, 2023
Tengfei Liang, Yi Jin, Wu Liu, Tao Wang, Songhe Feng, Yidong Li

Figures 1-4 for Bridging the Gap: Multi-Level Cross-Modality Joint Alignment for Visible-Infrared Person Re-Identification

Visible-Infrared person Re-IDentification (VI-ReID) is a challenging cross-modality image retrieval task that aims to match pedestrian images across visible and infrared cameras. To close the modality gap, existing mainstream methods adopt a learning paradigm that converts the image retrieval task into an image classification task with a cross-entropy loss and auxiliary metric learning losses. These losses follow the strategy of adjusting the distribution of extracted embeddings to reduce the intra-class distance and increase the inter-class distance. However, such objectives do not precisely correspond to the final test setting of the retrieval task, resulting in a new gap at the optimization level. By rethinking these key aspects of VI-ReID, we propose a simple and effective method, Multi-level Cross-modality Joint Alignment (MCJA), which bridges both the modality-level and objective-level gaps. For the former, we design Modality Alignment Augmentation, which comprises three novel strategies, weighted grayscale, cross-channel cutmix, and spectrum jitter augmentation, that effectively reduce the modality discrepancy in image space. For the latter, we introduce a new Cross-Modality Retrieval loss; it is the first work to impose constraints from the perspective of the ranking list, aligning with the goal of the testing stage. Moreover, using only global features, our method achieves good performance and can serve as a strong baseline for the VI-ReID community.
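The three Modality Alignment Augmentation strategies can be approximated in image space along the following lines; this is a minimal NumPy sketch, and the sampling ranges and mixing rules are assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_grayscale(img):
    """Randomly weighted channel mix, replicated to 3 channels, so a
    visible image looks more like a single-spectrum infrared one."""
    w = rng.dirichlet(np.ones(3))                 # random convex weights
    gray = np.tensordot(img, w, axes=([2], [0]))  # (H, W)
    return np.repeat(gray[..., None], 3, axis=2)

def cross_channel_cutmix(img):
    """Paste a random rectangle of one channel into another channel."""
    out = img.copy()
    h, w, _ = img.shape
    src, dst = rng.choice(3, size=2, replace=False)
    y, x = rng.integers(0, h // 2), rng.integers(0, w // 2)
    out[y:y + h // 2, x:x + w // 2, dst] = img[y:y + h // 2, x:x + w // 2, src]
    return out

def spectrum_jitter(img, strength=0.5):
    """Randomly rescale each channel to simulate spectrum variation."""
    scale = 1.0 + strength * (rng.random(3) - 0.5)
    return np.clip(img * scale, 0.0, 1.0)
```

All three operate purely in image space, so they can be dropped into a standard augmentation pipeline before the network sees the batch.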

* 10 pages, 4 figures, 5 tables 
Viaarxiv icon


NPS: A Framework for Accurate Program Sampling Using Graph Neural Network

Apr 18, 2023
Yuanwei Fang, Zihao Liu, Yanheng Lu, Jiawei Liu, Jiajie Li, Yi Jin, Jian Chen, Yenkuang Chen, Hongzhong Zheng, Yuan Xie

Figures 1-4 for NPS: A Framework for Accurate Program Sampling Using Graph Neural Network

With the end of Moore's Law, there is a growing demand for rapid architectural innovations in modern processors, such as RISC-V custom extensions, to continue performance scaling. Program sampling is a crucial step in microprocessor design, as it selects representative simulation points for workload simulation. While SimPoint has been the de facto approach for decades, its limited expressiveness with Basic Block Vectors (BBVs) requires time-consuming human tuning, often taking months, which impedes fast innovation and agile hardware development. This paper introduces Neural Program Sampling (NPS), a novel framework that learns execution embeddings using dynamic snapshots of a Graph Neural Network. NPS deploys AssemblyNet for embedding generation, leveraging an application's code structures and runtime states. AssemblyNet serves as NPS's graph model and neural architecture, capturing a program's behavior in aspects such as data computation, code paths, and data flow. AssemblyNet is trained with a data prefetch task that predicts consecutive memory addresses. In the experiments, NPS outperforms SimPoint by up to 63%, reducing the average error by 38%. Additionally, NPS demonstrates strong robustness with increased accuracy, reducing the expensive accuracy-tuning overhead. Furthermore, NPS shows higher accuracy and generality than the state-of-the-art GNN approach in code behavior learning, enabling the generation of high-quality execution embeddings.
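Downstream of the learned embeddings, the sampling step itself can be illustrated with SimPoint-style clustering over embedding vectors. The sketch below is a generic k-means medoid selection under assumed interfaces, not the NPS pipeline itself:

```python
import numpy as np

def pick_simulation_points(embeddings, k, iters=20, seed=0):
    """Select k representative intervals by k-means over per-interval
    execution embeddings; the interval nearest each centroid becomes a
    simulation point, weighted by its cluster's share of intervals (the
    SimPoint-style final step, here fed with learned embeddings instead
    of BBVs)."""
    rng = np.random.default_rng(seed)
    n = len(embeddings)
    centroids = embeddings[rng.choice(n, k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(embeddings[:, None] - centroids[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = embeddings[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    d = np.linalg.norm(embeddings[:, None] - centroids[None], axis=2)
    assign = d.argmin(axis=1)
    points, weights = [], []
    for j in range(k):
        idx = np.where(assign == j)[0]
        if len(idx) == 0:
            continue  # empty cluster contributes no simulation point
        best = idx[d[idx, j].argmin()]  # medoid: interval nearest the centroid
        points.append(int(best))
        weights.append(len(idx) / n)
    return points, weights
```

A whole-program estimate is then the weight-averaged result of simulating only the selected intervals, which is where embedding quality directly drives sampling accuracy.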


The Cascaded Forward Algorithm for Neural Network Training

Mar 24, 2023
Gongpei Zhao, Tao Wang, Yidong Li, Yi Jin, Congyan Lang, Haibin Ling

Figures 1-4 for The Cascaded Forward Algorithm for Neural Network Training

The backpropagation (BP) algorithm has been the mainstream learning procedure for neural networks over the past decade and has played a significant role in the development of deep learning. However, it has some limitations, such as getting stuck in local minima and suffering from vanishing/exploding gradients, which have raised questions about its biological plausibility. To address these limitations, alternatives to backpropagation have been preliminarily explored, with the Forward-Forward (FF) algorithm being one of the best known. In this paper, we propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like FF, does not rely on BP optimization. Unlike FF, our framework directly outputs label distributions at each cascaded block, which removes the need to generate additional negative samples and thus leads to a more efficient process at both training and test time. Moreover, each block in our framework can be trained independently, so it can be easily deployed on parallel acceleration systems. The proposed method is evaluated on four public image classification benchmarks, and the experimental results show significant improvements in prediction accuracy over the baseline.
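A toy version of the cascaded-forward idea might look like the following: each block holds a fixed random feature map plus a locally trained linear classifier that outputs a label distribution, no error signal flows between blocks, and the final prediction averages the per-block distributions. The architecture and training details here are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class CaFoBlock:
    """One cascaded block: fixed random ReLU features followed by a local
    linear classifier trained with its own cross-entropy gradient only."""
    def __init__(self, d_in, d_hid, n_cls, rng):
        self.P = rng.standard_normal((d_in, d_hid)) / np.sqrt(d_in)
        self.W = np.zeros((d_hid, n_cls))

    def features(self, x):
        return np.maximum(x @ self.P, 0.0)

    def fit(self, x, y_onehot, lr=0.5, steps=200):
        h = self.features(x)
        for _ in range(steps):  # local cross-entropy gradient descent
            p = softmax(h @ self.W)
            self.W -= lr * h.T @ (p - y_onehot) / len(x)

    def predict(self, x):
        return softmax(self.features(x) @ self.W)

def cafo_train_predict(x_tr, y_onehot, x_te, n_blocks=3, seed=0):
    rng = np.random.default_rng(seed)
    dists, h_tr, h_te = [], x_tr, x_te
    for _ in range(n_blocks):  # each block is trained independently
        blk = CaFoBlock(h_tr.shape[1], 16, y_onehot.shape[1], rng)
        blk.fit(h_tr, y_onehot)
        dists.append(blk.predict(h_te))
        h_tr, h_te = blk.features(h_tr), blk.features(h_te)
    return np.mean(dists, axis=0)  # aggregate per-block label distributions
```

Because each `fit` call touches only its own block, the blocks could in principle be trained on separate devices once the previous block's features are materialized.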


Collaborative Perception in Autonomous Driving: Methods, Datasets and Challenges

Jan 16, 2023
Yushan Han, Hui Zhang, Huifang Li, Yi Jin, Congyan Lang, Yidong Li

Figures 1-4 for Collaborative Perception in Autonomous Driving: Methods, Datasets and Challenges

Collaborative perception is essential for addressing occlusion and sensor-failure issues in autonomous driving. In recent years, research on deep learning for collaborative perception has thrived, and numerous methods have been proposed. Although some works have reviewed and analyzed the basic architectures and key components in this field, there is still a lack of systematic reviews of the collaboration modules in perception networks and of large-scale collaborative perception datasets. The primary goal of this work is to address these issues and provide a comprehensive review of recent achievements in the field. First, we introduce fundamental technologies and collaboration schemes. Following that, we provide an overview of practical collaborative perception methods and systematically summarize the collaboration modules in networks that improve collaboration efficiency and performance while also ensuring collaboration robustness and safety. Then, we present large-scale public datasets and summarize quantitative results on these benchmarks. Finally, we discuss the remaining challenges and promising future research directions.
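As one concrete example of the kind of collaboration module surveyed here, intermediate (feature-level) fusion with confidence weighting might look like this minimal sketch; the interfaces and the weighting rule are assumptions, and the shared maps are assumed already warped into the ego frame:

```python
import numpy as np

def fuse_agent_features(features, confidences):
    """Confidence-weighted intermediate fusion: each agent shares a BEV
    feature map plus a per-cell confidence map, and the ego vehicle fuses
    them by per-cell weighted averaging."""
    features = np.stack(features)            # (N, H, W, C)
    conf = np.stack(confidences)[..., None]  # (N, H, W, 1)
    w = conf / (conf.sum(axis=0, keepdims=True) + 1e-6)
    return (w * features).sum(axis=0)        # (H, W, C) fused map
```

Intermediate fusion of this sort trades bandwidth against accuracy: it transmits compact features rather than raw sensor data, while still letting the ego network see through occlusions covered by other agents.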

* 19 pages, 8 figures 

Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation

Sep 16, 2022
Lin Chen, Zhixiang Wei, Xin Jin, Huaian Chen, Miao Zheng, Kai Chen, Yi Jin

Figures 1-4 for Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation

In unsupervised domain adaptation (UDA), adapting directly from the source to the target domain usually suffers from significant discrepancies and leads to insufficient alignment. Thus, many UDA works attempt to close the domain gap gradually and softly via various intermediate spaces, dubbed domain bridging (DB). However, for dense prediction tasks such as domain adaptive semantic segmentation (DASS), existing solutions have mostly relied on rough style transfer, and how to bridge domains elegantly remains under-explored. In this work, we resort to data mixing to establish deliberated domain bridging (DDB) for DASS, through which the joint distributions of the source and target domains are aligned and interact with each other in the intermediate space. At the heart of DDB lie a dual-path domain bridging step, which generates two intermediate domains using coarse-level and fine-level data mixing techniques, and a cross-path knowledge distillation step, which takes the two complementary models trained on the generated intermediate samples as 'teachers' to develop a superior 'student' in a multi-teacher distillation manner. These two optimization steps alternate and reinforce each other, giving rise to DDB with strong adaptation power. Extensive experiments on adaptive segmentation tasks with different settings demonstrate that DDB significantly outperforms state-of-the-art methods. Code is available at https://github.com/xiaoachen98/DDB.git.
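The two bridging paths can be caricatured with standard mixing operations, e.g. CutMix-style region pasting for the coarse path and ClassMix-style class pasting for the fine path. The sketch below assumes those stand-ins and simple 2-D label maps; it is not the paper's exact mixing scheme:

```python
import numpy as np

def coarse_mix(src_img, src_lbl, tgt_img, tgt_lbl, rng):
    """Region-level (CutMix-style) bridging: paste a source rectangle
    into the target image and its (pseudo-)label map."""
    h, w = src_lbl.shape
    y, x = rng.integers(0, h // 2), rng.integers(0, w // 2)
    img, lbl = tgt_img.copy(), tgt_lbl.copy()
    img[y:y + h // 2, x:x + w // 2] = src_img[y:y + h // 2, x:x + w // 2]
    lbl[y:y + h // 2, x:x + w // 2] = src_lbl[y:y + h // 2, x:x + w // 2]
    return img, lbl

def fine_mix(src_img, src_lbl, tgt_img, tgt_lbl, rng):
    """Class-level (ClassMix-style) bridging: paste all pixels of half
    the source classes into the target image."""
    classes = np.unique(src_lbl)
    chosen = rng.choice(classes, size=max(1, len(classes) // 2), replace=False)
    mask = np.isin(src_lbl, chosen)
    img, lbl = tgt_img.copy(), tgt_lbl.copy()
    img[mask] = src_img[mask]
    lbl[mask] = src_lbl[mask]
    return img, lbl
```

Training one model on each intermediate domain yields the two complementary teachers that the cross-path distillation step then merges into a single student.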

* Accepted at NeurIPS2022 

Reusing the Task-specific Classifier as a Discriminator: Discriminator-free Adversarial Domain Adaptation

Apr 08, 2022
Lin Chen, Huaian Chen, Zhixiang Wei, Xin Jin, Xiao Tan, Yi Jin, Enhong Chen

Figures 1-4 for Reusing the Task-specific Classifier as a Discriminator: Discriminator-free Adversarial Domain Adaptation

Adversarial learning has achieved remarkable performance in unsupervised domain adaptation (UDA). Existing adversarial UDA methods typically adopt an additional discriminator to play a min-max game with the feature extractor. However, most of these methods fail to leverage the predicted discriminative information effectively, causing mode collapse of the generator. In this work, we address this problem from a different perspective and design a simple yet effective adversarial paradigm in the form of a discriminator-free adversarial learning network (DALN), in which the category classifier is reused as the discriminator. DALN achieves explicit domain alignment and category discrimination through a unified objective, enabling it to leverage the predicted discriminative information for sufficient feature alignment. Specifically, we introduce the Nuclear-norm Wasserstein discrepancy (NWD), which provides definite guidance for performing discrimination. The NWD can be coupled with the classifier to serve as a discriminator satisfying the K-Lipschitz constraint without requiring additional weight clipping or a gradient penalty strategy. Without bells and whistles, DALN compares favorably against existing state-of-the-art (SOTA) methods on a variety of public datasets. Moreover, as a plug-and-play technique, the NWD can be used directly as a generic regularizer to benefit existing UDA algorithms. Code is available at https://github.com/xiaoachen98/DALN.
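The NWD itself is straightforward to sketch as the difference between the nuclear norms (sums of singular values) of the classifier's softmax prediction matrices on source and target batches; a confident, class-diverse batch has a high nuclear norm. The batch-size normalization below is an assumption for exposition:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nuclear_norm_wasserstein_discrepancy(logits_src, logits_tgt):
    """Difference of batch-averaged nuclear norms of the classifier's
    prediction matrices on source and target batches. Played adversarially
    (the feature extractor pushes the target term up, the reused classifier
    pushes the gap apart), this drives confident, diverse target predictions
    without a separate discriminator network."""
    p_s = softmax(logits_src)  # (B_s, n_classes) prediction matrix
    p_t = softmax(logits_tgt)  # (B_t, n_classes)
    nn_s = np.linalg.svd(p_s, compute_uv=False).sum()  # nuclear norm
    nn_t = np.linalg.svd(p_t, compute_uv=False).sum()
    return nn_s / len(p_s) - nn_t / len(p_t)
```

Near-one-hot, class-diverse predictions give a prediction matrix close to a permutation of the identity (nuclear norm near the batch size), while uniform predictions give a rank-one matrix with a much smaller nuclear norm, which is what makes the gap a usable discrepancy signal.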

* Accepted by CVPR2022 