Wei Dong

Towards High-quality HDR Deghosting with Conditional Diffusion Models

Nov 02, 2023
Qingsen Yan, Tao Hu, Yuan Sun, Hao Tang, Yu Zhu, Wei Dong, Luc Van Gool, Yanning Zhang

High Dynamic Range (HDR) images can be recovered from several Low Dynamic Range (LDR) images by existing Deep Neural Network (DNN) techniques. Despite the remarkable progress, DNN-based methods still generate ghosting artifacts when LDR images exhibit saturation and large motion, which hinders potential applications in real-world scenarios. To address this challenge, we formulate HDR deghosting as an image generation problem that leverages LDR features as the diffusion model's condition, consisting of a feature condition generator and a noise predictor. The feature condition generator employs attention and a Domain Feature Alignment (DFA) layer to transform the intermediate features and avoid ghosting artifacts. With the learned features as conditions, the noise predictor leverages the stochastic iterative denoising process of diffusion models to generate an HDR image by steering the sampling process. Furthermore, to mitigate semantic confusion caused by the saturation problem of LDR images, we design a sliding window noise estimator to sample smooth noise in a patch-based manner. In addition, an image space loss is proposed to avoid color distortion in the estimated HDR results. We empirically evaluate our model on benchmark datasets for HDR imaging. The results demonstrate that our approach achieves state-of-the-art performance and generalizes well to real-world images.
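
To make the sliding-window idea concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' code) of patch-based noise estimation for a conditional sampler; noise_predictor, the patch size, and the stride are all assumptions, and overlapping estimates are simply averaged to keep the noise field smooth across window seams.

import torch

def sliding_window_noise(noise_predictor, x_t, t, cond, patch=128, stride=64):
    # Assumes H and W are at least `patch` and align with the stride.
    B, C, H, W = x_t.shape
    out = torch.zeros_like(x_t)
    weight = torch.zeros(B, 1, H, W, device=x_t.device)
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            # Predict noise for one window, conditioned on the matching LDR-feature patch.
            eps = noise_predictor(x_t[:, :, y:y + patch, x:x + patch], t,
                                  cond[:, :, y:y + patch, x:x + patch])
            out[:, :, y:y + patch, x:x + patch] += eps
            weight[:, :, y:y + patch, x:x + patch] += 1.0
    return out / weight.clamp(min=1.0)  # average the overlapping estimates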

* Accepted by IEEE TCSVT 

Edge Cloud Collaborative Stream Computing for Real-Time Structural Health Monitoring

Oct 11, 2023
Wenzhao Zhang, Cheng Guo, Yi Gao, Wei Dong

Structural Health Monitoring (SHM) is crucial for the safety and maintenance of various infrastructures. Due to the large amount of data generated by numerous sensors and the stringent real-time requirements of many applications, SHM poses significant challenges. Although the cloud-centric stream computing paradigm opens new opportunities for real-time data processing, it consumes too much network bandwidth. In this paper, we propose ECStream, an edge-cloud collaborative fine-grained stream operator scheduling framework for SHM. We jointly consider atomic and composite operators together with their iterative computability to model and formalize the problem of minimizing bandwidth usage and end-to-end operator processing latency. Preliminary evaluation results show that ECStream can effectively balance bandwidth usage and end-to-end operator computation latency, reducing bandwidth usage by 73.01% and latency by 34.08% on average compared to the cloud-centric approach.
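
As a rough illustration of the edge-cloud placement trade-off involved, the following hypothetical Python sketch picks the split point in a chain of stream operators that minimizes uplink bandwidth under an edge compute budget; the operator names, selectivities, and costs are made up and do not reflect ECStream's actual scheduling model.

from dataclasses import dataclass

@dataclass
class Op:
    name: str
    selectivity: float  # output data rate / input data rate
    edge_cost: float    # CPU cost if the operator runs on the edge

def best_split(ops, input_rate, edge_budget):
    """Return index k such that ops[:k] run on the edge and ops[k:] in the cloud,
    minimizing the data rate that must cross the uplink."""
    best_k, best_bw = 0, input_rate
    rate, cost = input_rate, 0.0
    for k, op in enumerate(ops, start=1):
        cost += op.edge_cost
        if cost > edge_budget:          # edge can no longer host this prefix
            break
        rate *= op.selectivity
        if rate < best_bw:
            best_k, best_bw = k, rate
    return best_k, best_bw

ops = [Op("filter", 0.4, 1.0), Op("fft", 1.0, 3.0), Op("aggregate", 0.1, 1.5)]
print(best_split(ops, input_rate=100.0, edge_budget=6.0))  # -> (3, 4.0)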


Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing

Oct 10, 2023
Wei Dong, Dawei Yan, Zhijun Lin, Peng Wang

The advent of high-capacity pre-trained models has revolutionized problem-solving in computer vision, shifting the focus from training task-specific models to adapting pre-trained ones. Consequently, effectively and efficiently adapting large pre-trained models to downstream tasks has become a prominent research area. Existing solutions primarily concentrate on designing lightweight adapters and their interaction with pre-trained models, with the goal of minimizing the number of parameters requiring updates. In this study, we propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation from a fresh perspective. Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme. Specifically, we leverage symmetric down-/up-projections to construct bottleneck operations, which are shared across layers. By learning low-dimensional re-scaling coefficients, we can effectively re-compose layer-adaptive adapters. This parameter-sharing strategy in adapter design allows us to significantly reduce the number of new parameters while maintaining satisfactory performance, offering a promising way to compress the adaptation cost. We conduct experiments on 24 downstream image classification tasks using various Vision Transformer variants to evaluate our method. The results demonstrate that our approach achieves compelling transfer learning performance with a reduced parameter count. Our code is available at https://github.com/DavidYanAnDe/ARC.
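
The parameter-sharing idea can be illustrated with a small, hypothetical PyTorch sketch (not the released ARC code): one down-/up-projection pair is shared by all layers, and each layer learns only low-dimensional re-scaling coefficients that "re-compose" its own adapter.

import torch
import torch.nn as nn

class SharedProjections(nn.Module):
    """Bottleneck projections shared across all transformer layers."""
    def __init__(self, dim, bottleneck):
        super().__init__()
        self.down = nn.Parameter(torch.randn(dim, bottleneck) * 0.02)
        self.up = nn.Parameter(torch.zeros(bottleneck, dim))

class ARCAdapter(nn.Module):
    """Per-layer adapter: only two small re-scaling vectors are layer-specific."""
    def __init__(self, shared, bottleneck):
        super().__init__()
        self.shared = shared
        self.scale_down = nn.Parameter(torch.ones(bottleneck))
        self.scale_up = nn.Parameter(torch.ones(bottleneck))

    def forward(self, x):
        h = torch.relu(x @ self.shared.down * self.scale_down)
        return x + (h * self.scale_up) @ self.shared.up   # residual adapter

shared = SharedProjections(dim=768, bottleneck=32)
adapters = nn.ModuleList([ARCAdapter(shared, 32) for _ in range(12)])
y = adapters[0](torch.randn(4, 197, 768))   # ViT-like token tensor, shape preserved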

* Accepted to NeurIPS 2023 

FlexDelta: A flexure-based fully decoupled parallel $xyz$ positioning stage with long stroke

Jul 19, 2023
Qianjun Zhang, Wei Dong, Qingsong Xu, Bimal J. Goteea, Yongzhuo Gao

Decoupled parallel $xyz$ positioning stages with large stroke are desired in high-speed and precise positioning fields. However, existing stages are either short in stroke or fall short in parasitic motion and coupling rate. This paper proposes a novel flexure-based decoupled parallel $xyz$ positioning stage (FlexDelta) and presents its conceptual design, modeling, and experimental study. Firstly, the working principle of FlexDelta is introduced, followed by its mechanism design with flexures. Secondly, the stiffness model of the flexure is established via the matrix-based form of Castigliano's second theorem, and the influence of its lateral stiffness on the stiffness model of FlexDelta is comprehensively investigated and then optimized. Finally, an experimental study was carried out on the fabricated prototype. The results reveal that the positioning stage features a centimeter-scale stroke in all three axes, with a coupling rate of less than 0.53% and parasitic motion of less than 1.72 mrad over the full range. Its natural frequencies are 20.8 Hz, 20.8 Hz, and 22.4 Hz for the $x$, $y$, and $z$ axes, respectively. Multi-axis path-tracking tests were also carried out, validating its dynamic performance with micrometer-level error.
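
As a back-of-the-envelope illustration of the stiffness bookkeeping mentioned above, the following Python sketch composes made-up flexure compliances in series and parallel and checks the resulting natural frequency; the topology and every number are illustrative assumptions, not FlexDelta's actual parameters.

import numpy as np

def series(*compliances):
    """Flexure elements loaded in series: compliances add."""
    return sum(compliances)

def parallel(*stiffnesses):
    """Limbs acting in parallel on one axis: stiffnesses add."""
    return sum(stiffnesses)

# Hypothetical 1-DoF reduction: each limb is two identical leaf flexures in series.
c_leaf = 1.0e-4                       # m/N, assumed compliance of one leaf flexure
k_limb = 1.0 / series(c_leaf, c_leaf)
k_axis = parallel(k_limb, k_limb, k_limb)   # three limbs supporting one axis

m_moving = 0.9                        # kg, assumed moving mass of the stage
f_n = np.sqrt(k_axis / m_moving) / (2 * np.pi)
print(f"k = {k_axis:.0f} N/m, natural frequency ~ {f_n:.1f} Hz")   # ~20.5 Hz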


Fast Monocular Scene Reconstruction with Global-Sparse Local-Dense Grids

May 22, 2023
Wei Dong, Chris Choy, Charles Loop, Or Litany, Yuke Zhu, Anima Anandkumar

Indoor scene reconstruction from monocular images has long been sought after by augmented reality and robotics developers. Recent advances in neural field representations and monocular priors have led to remarkable results in scene-level surface reconstruction. The reliance on Multilayer Perceptrons (MLPs), however, significantly limits speed in training and rendering. In this work, we propose to directly use a signed distance function (SDF) in sparse voxel block grids for fast and accurate scene reconstruction without MLPs. Our globally sparse and locally dense data structure exploits surfaces' spatial sparsity, enables cache-friendly queries, and allows direct extensions to multi-modal data such as color and semantic labels. To apply this representation to monocular scene reconstruction, we develop a scale calibration algorithm for fast geometric initialization from monocular depth priors. We apply differentiable volume rendering from this initialization to refine details with fast convergence. We also introduce efficient high-dimensional Continuous Random Fields (CRFs) to further exploit the semantic-geometry consistency between scene objects. Experiments show that our approach is 10x faster in training and 100x faster in rendering while achieving comparable accuracy to state-of-the-art neural implicit methods.
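
A toy Python sketch of the globally sparse, locally dense layout described above (hypothetical; the block size, voxel size, and function names are illustrative, not the paper's code): allocated voxel blocks live in a hash map, each storing a small dense SDF patch, and a world-space query hashes to a block before indexing into it.

import numpy as np

BLOCK = 8          # each allocated block stores an 8x8x8 dense SDF patch (assumed)
VOXEL = 0.04       # metres per voxel (assumed)
blocks = {}        # (bx, by, bz) -> float32 array of shape (BLOCK, BLOCK, BLOCK)

def voxel_sdf(ix, iy, iz):
    """Fetch one voxel: hash to its block, then index the dense local grid."""
    block = blocks.get((ix // BLOCK, iy // BLOCK, iz // BLOCK))
    if block is None:
        return 1.0                      # unallocated space treated as free
    return block[ix % BLOCK, iy % BLOCK, iz % BLOCK]

def query_sdf(p):
    """Trilinear SDF interpolation at a world-space point p = (x, y, z)."""
    g = np.asarray(p) / VOXEL
    base = np.floor(g).astype(int)
    f = g - base
    val = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (f[0] if dx else 1 - f[0]) * (f[1] if dy else 1 - f[1]) * (f[2] if dz else 1 - f[2])
                val += w * voxel_sdf(base[0] + dx, base[1] + dy, base[2] + dz)
    return val

blocks[(0, 0, 0)] = np.full((BLOCK, BLOCK, BLOCK), 0.1, dtype=np.float32)
print(query_sdf([0.10, 0.10, 0.10]))    # query falls inside the allocated block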

* CVPR 2023 

Breaking Through the Haze: An Advanced Non-Homogeneous Dehazing Method based on Fast Fourier Convolution and ConvNeXt

May 08, 2023
Han Zhou, Wei Dong, Yangyi Liu, Jun Chen

Haze usually leads to deteriorated images with low contrast, color shift, and structural distortion. We observe that many deep learning based models exhibit exceptional performance on removing homogeneous haze, but they usually fail to address the challenge of non-homogeneous dehazing. Two main factors account for this. Firstly, due to the intricate and non-uniform distribution of dense haze, recovering structural and chromatic features with high fidelity is challenging, particularly in regions with heavy haze. Secondly, the existing small-scale datasets for non-homogeneous dehazing are inadequate to support reliable learning of feature mappings between hazy images and their haze-free counterparts by convolutional neural network (CNN)-based models. To tackle these two challenges, we propose a novel two-branch network that leverages the 2D discrete wavelet transform (DWT), fast Fourier convolution (FFC) residual blocks, and a pretrained ConvNeXt model. Specifically, in the DWT-FFC frequency branch, our model exploits DWT to capture more high-frequency features. Moreover, by taking advantage of the large receptive field provided by FFC residual blocks, our model is able to effectively explore global contextual information and produce images with better perceptual quality. In the prior knowledge branch, an ImageNet-pretrained ConvNeXt, as opposed to Res2Net, is adopted. This enables our model to learn more supplementary information and acquire a stronger generalization ability. The feasibility and effectiveness of the proposed method are demonstrated via extensive experiments and ablation studies. The code is available at https://github.com/zhouh115/DWT-FFC.
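
A condensed, hypothetical PyTorch sketch of a Fourier-convolution residual block in the spirit of the FFC branch described above (not the released code): applying a 1x1 convolution in the frequency domain gives every output pixel a global receptive field, which is then fused with a local spatial branch.

import torch
import torch.nn as nn

class SpectralTransform(nn.Module):
    """1x1 convolution applied to the image spectrum -> global receptive field."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch * 2, ch * 2, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        freq = torch.fft.rfft2(x, norm="ortho")             # (b, c, h, w//2+1), complex
        z = torch.cat([freq.real, freq.imag], dim=1)
        z = torch.relu(self.conv(z))
        real, imag = z.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")

class FFCResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.local = nn.Conv2d(ch, ch, kernel_size=3, padding=1)   # local branch
        self.spectral = SpectralTransform(ch)                      # global branch
        self.fuse = nn.Conv2d(ch * 2, ch, kernel_size=1)

    def forward(self, x):
        y = torch.cat([torch.relu(self.local(x)), self.spectral(x)], dim=1)
        return x + self.fuse(y)                                    # residual connection

out = FFCResBlock(32)(torch.randn(1, 32, 64, 64))   # shape preserved: (1, 32, 64, 64)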

* Accepted by CVPRW 2023 

One Adapter for All Programming Languages? Adapter Tuning for Code Search and Summarization

Mar 28, 2023
Deze Wang, Boxing Chen, Shanshan Li, Wei Luo, Shaoliang Peng, Wei Dong, Xiangke Liao

As pre-trained models automate many code intelligence tasks, a widely used paradigm is to fine-tune a model on the task dataset for each programming language. A recent study reported that multilingual fine-tuning benefits a range of tasks and models. However, we find that multilingual fine-tuning leads to performance degradation on the recent models UniXcoder and CodeT5. To alleviate the potentially catastrophic forgetting issue in multilingual models, we fix all pre-trained model parameters, insert the parameter-efficient adapter structure, and fine-tune only the adapters. Updating only 0.6% of the overall parameters compared to full-model fine-tuning for each programming language, adapter tuning yields consistent improvements on code search and summarization tasks, achieving state-of-the-art results. In addition, we experimentally show its effectiveness in cross-lingual and low-resource scenarios. Multilingual fine-tuning with 200 samples per programming language approaches the results of fine-tuning with the entire dataset on code summarization. Our experiments on three probing tasks show that adapter tuning significantly outperforms full-model fine-tuning and effectively overcomes catastrophic forgetting.
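
A generic, hypothetical PyTorch sketch of the tuning recipe described above (not the paper's code): every pre-trained weight is frozen, and only small bottleneck adapters wrapped around each layer are updated, so the trainable fraction stays small.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))   # residual bottleneck

class AdaptedLayer(nn.Module):
    """Wrap a frozen pre-trained layer with a trainable adapter."""
    def __init__(self, pretrained_layer, dim):
        super().__init__()
        self.layer = pretrained_layer
        self.adapter = Adapter(dim)

    def forward(self, x):
        return self.adapter(self.layer(x))

backbone = nn.Sequential(*[nn.Linear(768, 768) for _ in range(12)])  # stand-in for a pre-trained model
for p in backbone.parameters():
    p.requires_grad = False                                          # freeze the backbone

model = nn.Sequential(*[AdaptedLayer(layer, 768) for layer in backbone])
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable fraction: {trainable / total:.1%}")   # only the adapters receive gradients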

* Accepted to the 45th International Conference on Software Engineering (ICSE 2023) 

Self-Supervised Node Representation Learning via Node-to-Neighbourhood Alignment

Feb 10, 2023
Wei Dong, Dawei Yan, Peng Wang

Self-supervised node representation learning aims to learn node representations from unlabelled graphs that rival their supervised counterparts. The key to learning informative node representations lies in effectively gaining contextual information from the graph structure. In this work, we present a simple yet effective self-supervised node representation learning method that aligns the hidden representations of nodes and their neighbourhoods. Our first idea achieves such node-to-neighbourhood alignment by directly maximizing the mutual information between their representations, which, we prove theoretically, plays the role of graph smoothing. Our framework is optimized via a surrogate contrastive loss, and a Topology-Aware Positive Sampling (TAPS) strategy is proposed to sample positives by considering the structural dependencies between nodes, which enables offline positive selection. Considering the excessive memory overhead of contrastive learning, we further propose a negative-free solution, whose main contribution is a Graph Signal Decorrelation (GSD) constraint to avoid representation collapse and over-smoothing. The GSD constraint unifies some of the existing constraints and can be used to derive new implementations to combat representation collapse. By applying our methods on top of simple MLP-based node representation encoders, we learn node representations that achieve promising node classification performance on a set of graph-structured datasets from small to large scale.
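
A compact, hypothetical PyTorch sketch of the node-to-neighbourhood alignment objective, written as an InfoNCE-style surrogate loss over an MLP encoder's outputs (simplified from the description above; the TAPS sampling strategy and the GSD constraint are omitted).

import torch
import torch.nn.functional as F

def node_to_neighbourhood_loss(z, adj, tau=0.5):
    """z: (N, d) node embeddings from an MLP encoder.
    adj: (N, N) binary adjacency matrix without self-loops.
    Each node is pulled towards the mean embedding of its own neighbourhood
    and pushed away from the neighbourhoods of all other nodes."""
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
    nbh = (adj @ z) / deg                       # neighbourhood (positive) views
    z = F.normalize(z, dim=1)
    nbh = F.normalize(nbh, dim=1)
    logits = z @ nbh.t() / tau                  # (N, N) similarity matrix
    labels = torch.arange(z.size(0), device=z.device)
    return F.cross_entropy(logits, labels)      # diagonal entries are the positives

z = torch.randn(5, 16, requires_grad=True)
adj = (torch.rand(5, 5) > 0.5).float().fill_diagonal_(0)
node_to_neighbourhood_loss(z, adj).backward()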

* arXiv admin note: substantial text overlap with arXiv:2203.12265 

Perching on Moving Inclined Surfaces using Uncertainty Tolerant Planner and Thrust Regulation

Dec 21, 2022
Sensen Liu, Wenkang Hu, Zhaoying Wang, Wei Dong, Xinjun Sheng

Quadrotors with the ability to perch on moving inclined surfaces can save energy and extend their travel distance by leveraging ground vehicles. Achieving dynamic perching places high demands on trajectory-planning performance and terminal-state accuracy in SE(3). However, during the perching process, uncertainties in target surface prediction, tracking control, and external disturbances may cause trajectory planning to fail or lead to unacceptable terminal errors. To address these challenges, we first propose a trajectory planner that adapts to uncertainties in target prediction and tracking control. To facilitate this, the reachable set of the quadrotor's states is first analyzed, and the states whose reachable sets possess the largest coverage probability for uncertain targets are defined as optimal waypoints. Subsequently, an approach to seek locally optimal waypoints for static and moving uncertain targets is proposed, and a real-time trajectory planner based on the optimized waypoints is developed accordingly. Secondly, thrust regulation is implemented in the terminal attitude-tracking stage to handle external disturbances. When the quadrotor's attitude is commanded to align with the target surface, the thrust is optimized to minimize terminal errors, so that the terminal position and velocity are controlled in a closed-loop manner, improving disturbance resistance and terminal accuracy. Extensive simulation experiments demonstrate that our methods improve the accuracy of terminal states under uncertainties, increasing the success rate by approximately 50% compared to the two-end planner without thrust regulation. Perching on the rear window of a car is also achieved outdoors using our proposed heterogeneous cooperation system, validating the feasibility and practicality of our methods.
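
A toy Monte Carlo sketch in Python of the waypoint-selection idea above: among candidate waypoints, pick the one whose reachable set (grossly simplified here to a sphere of fixed radius) covers the uncertain target state with the highest probability; every model and number below is an assumption for illustration, not the paper's formulation.

import numpy as np

rng = np.random.default_rng(0)

def coverage_probability(waypoint, reach_radius, target_mean, target_std, n=10000):
    """Fraction of sampled target states that fall inside the waypoint's
    (simplified, spherical) reachable set."""
    samples = rng.normal(target_mean, target_std, size=(n, len(target_mean)))
    dist = np.linalg.norm(samples - waypoint, axis=1)
    return float(np.mean(dist <= reach_radius))

candidates = [np.array([0.0, 0.0, 1.0]), np.array([0.3, 0.0, 1.2]), np.array([0.5, 0.2, 1.4])]
target_mean = np.array([0.4, 0.1, 1.3])     # predicted perch point on the surface (assumed)
target_std = np.array([0.15, 0.15, 0.10])   # prediction + tracking uncertainty (assumed)

best = max(candidates, key=lambda w: coverage_probability(w, 0.35, target_mean, target_std))
print("selected waypoint:", best)           # the candidate with the highest coverage probability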
