Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wen Yang

CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving

Sep 29, 2023

Chi Zhang, Xiang Zhang, Mingyuan Lin, Cheng Li, Chu He, Wen Yang, Gui-Song Xia, Lei Yu

Figure 1 for CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving

Figure 2 for CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving

Figure 3 for CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving

Figure 4 for CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving

Abstract:Even though the collaboration between traditional and neuromorphic event cameras brings prosperity to frame-event based vision applications, the performance is still confined by the resolution gap crossing two modalities in both spatial and temporal domains. This paper is devoted to bridging the gap by increasing the temporal resolution for images, i.e., motion deblurring, and the spatial resolution for events, i.e., event super-resolving, respectively. To this end, we introduce CrossZoom, a novel unified neural Network (CZ-Net) to jointly recover sharp latent sequences within the exposure period of a blurry input and the corresponding High-Resolution (HR) events. Specifically, we present a multi-scale blur-event fusion architecture that leverages the scale-variant properties and effectively fuses cross-modality information to achieve cross-enhancement. Attention-based adaptive enhancement and cross-interaction prediction modules are devised to alleviate the distortions inherent in Low-Resolution (LR) events and enhance the final results through the prior blur-event complementary information. Furthermore, we propose a new dataset containing HR sharp-blurry images and the corresponding HR-LR event streams to facilitate future research. Extensive qualitative and quantitative experiments on synthetic and real-world datasets demonstrate the effectiveness and robustness of the proposed method. Codes and datasets are released at https://bestrivenzc.github.io/CZ-Net/.

Via

Access Paper or Ask Questions

QuadricsNet: Learning Concise Representation for Geometric Primitives in Point Clouds

Sep 25, 2023

Ji Wu, Huai Yu, Wen Yang, Gui-Song Xia

Abstract:This paper presents a novel framework to learn a concise geometric primitive representation for 3D point clouds. Different from representing each type of primitive individually, we focus on the challenging problem of how to achieve a concise and uniform representation robustly. We employ quadrics to represent diverse primitives with only 10 parameters and propose the first end-to-end learning-based framework, namely QuadricsNet, to parse quadrics in point clouds. The relationships between quadrics mathematical formulation and geometric attributes, including the type, scale and pose, are insightfully integrated for effective supervision of QuaidricsNet. Besides, a novel pattern-comprehensive dataset with quadrics segments and objects is collected for training and evaluation. Experiments demonstrate the effectiveness of our concise representation and the robustness of QuadricsNet. Our code is available at \url{https://github.com/MichaelWu99-lab/QuadricsNet}

* Submitted to ICRA 2024. 7 pages

Via

Access Paper or Ask Questions

2D-3D Pose Tracking with Multi-View Constraints

Sep 20, 2023

Huai Yu, Kuangyi Chen, Wen Yang, Sebastian Scherer, Gui-Song Xia

Abstract:Camera localization in 3D LiDAR maps has gained increasing attention due to its promising ability to handle complex scenarios, surpassing the limitations of visual-only localization methods. However, existing methods mostly focus on addressing the cross-modal gaps, estimating camera poses frame by frame without considering the relationship between adjacent frames, which makes the pose tracking unstable. To alleviate this, we propose to couple the 2D-3D correspondences between adjacent frames using the 2D-2D feature matching, establishing the multi-view geometrical constraints for simultaneously estimating multiple camera poses. Specifically, we propose a new 2D-3D pose tracking framework, which consists: a front-end hybrid flow estimation network for consecutive frames and a back-end pose optimization module. We further design a cross-modal consistency-based loss to incorporate the multi-view constraints during the training and inference process. We evaluate our proposed framework on the KITTI and Argoverse datasets. Experimental results demonstrate its superior performance compared to existing frame-by-frame 2D-3D pose tracking methods and state-of-the-art vision-only pose tracking algorithms. More online pose tracking videos are available at \url{https://youtu.be/yfBRdg7gw5M}

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

Generalizing Event-Based Motion Deblurring in Real-World Scenarios

Aug 11, 2023

Xiang Zhang, Lei Yu, Wen Yang, Jianzhuang Liu, Gui-Song Xia

Figure 1 for Generalizing Event-Based Motion Deblurring in Real-World Scenarios

Figure 2 for Generalizing Event-Based Motion Deblurring in Real-World Scenarios

Figure 3 for Generalizing Event-Based Motion Deblurring in Real-World Scenarios

Figure 4 for Generalizing Event-Based Motion Deblurring in Real-World Scenarios

Abstract:Event-based motion deblurring has shown promising results by exploiting low-latency events. However, current approaches are limited in their practical usage, as they assume the same spatial resolution of inputs and specific blurriness distributions. This work addresses these limitations and aims to generalize the performance of event-based deblurring in real-world scenarios. We propose a scale-aware network that allows flexible input spatial scales and enables learning from different temporal scales of motion blur. A two-stage self-supervised learning scheme is then developed to fit real-world data distribution. By utilizing the relativity of blurriness, our approach efficiently ensures the restored brightness and structure of latent images and further generalizes deblurring performance to handle varying spatial and temporal scales of motion blur in a self-distillation manner. Our method is extensively evaluated, demonstrating remarkable performance, and we also introduce a real-world dataset consisting of multi-scale blurry frames and events to facilitate research in event-based deblurring.

* Accepted by ICCV 2023

Via

Access Paper or Ask Questions

BigTrans: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages

May 29, 2023

Wen Yang, Chong Li, Jiajun Zhang, Chengqing Zong

Abstract:Large language models (LLMs) demonstrate promising translation performance among various natural languages. However, many LLMs especially the open-sourced ones, such as BLOOM and LLaMA, are English-dominant and support only dozens of natural languages, making the potential of LLMs on language translation less explored. In this work, we present BigTrans which adapts LLaMA that covers only 20 languages and enhances it with multilingual translation capability on more than 100 languages. BigTrans is built upon LLaMA-13B and it is optimized in three steps. First, we continue training LLaMA with massive Chinese monolingual data. Second, we continue training the model with a large-scale parallel dataset that covers 102 natural languages. Third, we instruct-tune the foundation model with multilingual translation instructions, leading to our BigTrans model. The preliminary experiments on multilingual translation show that BigTrans performs comparably with ChatGPT and Google Translate in many languages and even outperforms ChatGPT in 8 language pairs. We release the BigTrans model and hope it can advance the research progress.

* 12 pages, 4 figures. Our model is available at https://github.com/ZNLP/BigTrans

Via

Access Paper or Ask Questions

Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events

Apr 19, 2023

Yangguang Wang, Xiang Zhang, Mingyuan Lin, Lei Yu, Boxin Shi, Wen Yang, Gui-Song Xia

Figure 1 for Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events

Figure 2 for Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events

Figure 3 for Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events

Figure 4 for Self-Supervised Scene Dynamic Recovery from Rolling Shutter Images and Events

Abstract:Scene Dynamic Recovery (SDR) by inverting distorted Rolling Shutter (RS) images to an undistorted high frame-rate Global Shutter (GS) video is a severely ill-posed problem due to the missing temporal dynamic information in both RS intra-frame scanlines and inter-frame exposures, particularly when prior knowledge about camera/object motions is unavailable. Commonly used artificial assumptions on scenes/motions and data-specific characteristics are prone to producing sub-optimal solutions in real-world scenarios. To address this challenge, we propose an event-based SDR network within a self-supervised learning paradigm, i.e., SelfUnroll. We leverage the extremely high temporal resolution of event cameras to provide accurate inter/intra-frame dynamic information. Specifically, an Event-based Inter/intra-frame Compensator (E-IC) is proposed to predict the per-pixel dynamic between arbitrary time intervals, including the temporal transition and spatial translation. Exploring connections in terms of RS-RS, RS-GS, and GS-RS, we explicitly formulate mutual constraints with the proposed E-IC, resulting in supervisions without ground-truth GS images. Extensive evaluations over synthetic and real datasets demonstrate that the proposed method achieves state-of-the-art and shows remarkable performance for event-based RS2GS inversion in real-world scenarios. The dataset and code are available at https://w3un.github.io/selfunroll/.

Via

Access Paper or Ask Questions

Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection

Apr 18, 2023

Chang Xu, Jian Ding, Jinwang Wang, Wen Yang, Huai Yu, Lei Yu, Gui-Song Xia

Figure 1 for Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection

Figure 2 for Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection

Figure 3 for Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection

Figure 4 for Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection

Abstract:Detecting arbitrarily oriented tiny objects poses intense challenges to existing detectors, especially for label assignment. Despite the exploration of adaptive label assignment in recent oriented object detectors, the extreme geometry shape and limited feature of oriented tiny objects still induce severe mismatch and imbalance issues. Specifically, the position prior, positive sample feature, and instance are mismatched, and the learning of extreme-shaped objects is biased and unbalanced due to little proper feature supervision. To tackle these issues, we propose a dynamic prior along with the coarse-to-fine assigner, dubbed DCFL. For one thing, we model the prior, label assignment, and object representation all in a dynamic manner to alleviate the mismatch issue. For another, we leverage the coarse prior matching and finer posterior constraint to dynamically assign labels, providing appropriate and relatively balanced supervision for diverse instances. Extensive experiments on six datasets show substantial improvements to the baseline. Notably, we obtain the state-of-the-art performance for one-stage detectors on the DOTA-v1.5, DOTA-v2.0, and DIOR-R datasets under single-scale training and testing. Codes are available at https://github.com/Chasel-Tsui/mmrotate-dcfl.

* Accepted by CVPR2023

Via

Access Paper or Ask Questions

Recovering Continuous Scene Dynamics from A Single Blurry Image with Events

Apr 05, 2023

Zhangyi Cheng, Xiang Zhang, Lei Yu, Jianzhuang Liu, Wen Yang, Gui-Song Xia

Abstract:This paper aims at demystifying a single motion-blurred image with events and revealing temporally continuous scene dynamics encrypted behind motion blurs. To achieve this end, an Implicit Video Function (IVF) is learned to represent a single motion blurred image with concurrent events, enabling the latent sharp image restoration of arbitrary timestamps in the range of imaging exposures. Specifically, a dual attention transformer is proposed to efficiently leverage merits from both modalities, i.e., the high temporal resolution of event features and the smoothness of image features, alleviating temporal ambiguities while suppressing the event noise. The proposed network is trained only with the supervision of ground-truth images of limited referenced timestamps. Motion- and texture-guided supervisions are employed simultaneously to enhance restorations of the non-referenced timestamps and improve the overall sharpness. Experiments on synthetic, semi-synthetic, and real-world datasets demonstrate that our proposed method outperforms state-of-the-art methods by a large margin in terms of both objective PSNR and SSIM measurements and subjective evaluations.

Via

Access Paper or Ask Questions

Learning to Super-Resolve Blurry Images with Events

Feb 27, 2023

Lei Yu, Bishan Wang, Xiang Zhang, Haijian Zhang, Wen Yang, Jianzhuang Liu, Gui-Song Xia

Abstract:Super-Resolution from a single motion Blurred image (SRB) is a severely ill-posed problem due to the joint degradation of motion blurs and low spatial resolution. In this paper, we employ events to alleviate the burden of SRB and propose an Event-enhanced SRB (E-SRB) algorithm, which can generate a sequence of sharp and clear images with High Resolution (HR) from a single blurry image with Low Resolution (LR). To achieve this end, we formulate an event-enhanced degeneration model to consider the low spatial resolution, motion blurs, and event noises simultaneously. We then build an event-enhanced Sparse Learning Network (eSL-Net++) upon a dual sparse learning scheme where both events and intensity frames are modeled with sparse representations. Furthermore, we propose an event shuffle-and-merge scheme to extend the single-frame SRB to the sequence-frame SRB without any additional training process. Experimental results on synthetic and real-world datasets show that the proposed eSL-Net++ outperforms state-of-the-art methods by a large margin. Datasets, codes, and more results are available at https://github.com/ShinyWang33/eSL-Net-Plusplus.

* Accepted by IEEE TPAMI

Via

Access Paper or Ask Questions

High-Resolution Cloud Removal with Multi-Modal and Multi-Resolution Data Fusion: A New Baseline and Benchmark

Jan 09, 2023

Fang Xu, Yilei Shi, Patrick Ebel, Wen Yang, Xiao Xiang Zhu

Figure 1 for High-Resolution Cloud Removal with Multi-Modal and Multi-Resolution Data Fusion: A New Baseline and Benchmark

Figure 2 for High-Resolution Cloud Removal with Multi-Modal and Multi-Resolution Data Fusion: A New Baseline and Benchmark

Figure 3 for High-Resolution Cloud Removal with Multi-Modal and Multi-Resolution Data Fusion: A New Baseline and Benchmark

Figure 4 for High-Resolution Cloud Removal with Multi-Modal and Multi-Resolution Data Fusion: A New Baseline and Benchmark

Abstract:In this paper, we introduce Planet-CR, a benchmark dataset for high-resolution cloud removal with multi-modal and multi-resolution data fusion. Planet-CR is the first public dataset for cloud removal to feature globally sampled high resolution optical observations, in combination with paired radar measurements as well as pixel-level land cover annotations. It provides solid basis for exhaustive evaluation in terms of generating visually pleasing textures and semantically meaningful structures. With this dataset, we consider the problem of cloud removal in high resolution optical remote sensing imagery by integrating multi-modal and multi-resolution information. Existing multi-modal data fusion based methods, which assume the image pairs are aligned pixel-to-pixel, are hence not appropriate for this problem. To this end, we design a new baseline named Align-CR to perform the low-resolution SAR image guided high-resolution optical image cloud removal. It implicitly aligns the multi-modal and multi-resolution data during the reconstruction process to promote the cloud removal performance. The experimental results demonstrate that the proposed Align-CR method gives the best performance in both visual recovery quality and semantic recovery quality. The project is available at https://github.com/zhu-xlab/Planet-CR, and hope this will inspire future research.

Via

Access Paper or Ask Questions