Ran Cheng

Seed Feature Maps-based CNN Models for LEO Satellite Remote Sensing Services

Aug 12, 2023
Zhichao Lu, Chuntao Ding, Shangguang Wang, Ran Cheng, Felix Juefei-Xu, Vishnu Naresh Boddeti

Deploying high-performance convolutional neural network (CNN) models on low-earth orbit (LEO) satellites for rapid remote sensing image processing has attracted significant interest from industry and academia. However, the limited resources available on LEO satellites contrast with the demands of resource-intensive CNN models, necessitating the adoption of ground-station server assistance for training and updating these models. Existing approaches often require a large number of floating-point operations (FLOPs) and substantial model parameter transmissions, presenting considerable challenges. To address these issues, this paper introduces a ground-station server-assisted framework. With the proposed framework, each layer of the CNN model contains only one learnable feature map (called the seed feature map) from which other feature maps are generated based on specific rules. The hyperparameters of these rules are randomly generated instead of being trained, thus enabling the generation of multiple feature maps from the seed feature map and significantly reducing FLOPs. Furthermore, since the random hyperparameters can be saved using a few random seeds, the ground-station server can efficiently assist in updating the CNN model deployed on the LEO satellite. Experimental results on the ISPRS Vaihingen, ISPRS Potsdam, UAVid, and LoveDA datasets for semantic segmentation services demonstrate that the proposed framework outperforms existing state-of-the-art approaches. In particular, the SineFM-based model achieves a higher mIoU than the UNetFormer on the UAVid dataset, with 3.3x fewer parameters and 2.2x fewer FLOPs.
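
As a rough illustration of the seed-feature-map idea described above, the sketch below generates a layer's remaining feature maps from a single learnable seed map using fixed sine transforms whose random frequencies and phases are reproducible from an integer seed. The class name SeedFeatureLayer, the sine form of the generation rule, and all hyperparameters are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SeedFeatureLayer(nn.Module):
    """Hypothetical sketch: one learnable seed feature map per layer; the remaining
    maps are generated by fixed sine transforms whose hyperparameters come from a
    reproducible random seed (so only the seed needs to be transmitted)."""

    def __init__(self, in_channels, out_channels, seed=0):
        super().__init__()
        # Single learnable convolution producing the one-channel seed feature map.
        self.seed_conv = nn.Conv2d(in_channels, 1, kernel_size=3, padding=1)
        # Random, non-learnable hyperparameters of the generation rule.
        g = torch.Generator().manual_seed(seed)
        self.register_buffer("freq", 0.5 + 4.0 * torch.rand(out_channels - 1, generator=g))
        self.register_buffer("phase", 3.1416 * torch.rand(out_channels - 1, generator=g))

    def forward(self, x):
        s = self.seed_conv(x)                                   # (B, 1, H, W)
        generated = torch.sin(self.freq.view(1, -1, 1, 1) * s   # (B, C-1, H, W)
                              + self.phase.view(1, -1, 1, 1))
        return torch.cat([s, generated], dim=1)                 # (B, C, H, W)

# 64 output maps from a single learnable map; only `seed` is needed to rebuild the rule.
layer = SeedFeatureLayer(in_channels=3, out_channels=64, seed=42)
print(layer(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```

Under this sketch, a ground station would only need to transmit the learnable seed-map weights and the integer seed, rather than the full set of feature-map parameters.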

* 11 pages 

A Comparative Visual Analytics Framework for Evaluating Evolutionary Processes in Multi-objective Optimization

Aug 10, 2023
Yansong Huang, Zherui Zhang, Ao Jiao, Yuxin Ma, Ran Cheng

Evolutionary multi-objective optimization (EMO) algorithms have been demonstrated to be effective in solving multi-criteria decision-making problems. In real-world applications, analysts often employ several algorithms concurrently and compare their solution sets to gain insight into the characteristics of different algorithms and explore a broader range of feasible solutions. However, EMO algorithms are typically treated as black boxes, leading to difficulties in performing detailed analysis and comparisons between the internal evolutionary processes. Inspired by the successful application of visual analytics tools in explainable AI, we argue that interactive visualization can significantly enhance the comparative analysis between multiple EMO algorithms. In this paper, we present a visual analytics framework that enables the exploration and comparison of evolutionary processes in EMO algorithms. Guided by a literature review and expert interviews, the proposed framework addresses various analytical tasks and establishes a multi-faceted visualization design to support the comparative analysis of intermediate generations in the evolution as well as solution sets. We demonstrate the effectiveness of our framework through case studies on benchmarking and real-world multi-objective optimization problems to elucidate how analysts can leverage our framework to inspect and compare diverse algorithms.

* Accepted by IEEE VIS 2023 (will appear in IEEE TVCG) 

Mitigating Task Interference in Multi-Task Learning via Explicit Task Routing with Non-Learnable Primitives

Aug 03, 2023
Chuntao Ding, Zhichao Lu, Shangguang Wang, Ran Cheng, Vishnu Naresh Boddeti

Multi-task learning (MTL) seeks to learn a single model to accomplish multiple tasks by leveraging shared information among the tasks. Existing MTL models, however, have been known to suffer from negative interference among tasks. Efforts to mitigate task interference have focused on either loss/gradient balancing or implicit parameter partitioning with partial overlaps among the tasks. In this paper, we propose ETR-NLP to mitigate task interference through a synergistic combination of non-learnable primitives (NLPs) and explicit task routing (ETR). Our key idea is to employ non-learnable primitives to extract a diverse set of task-agnostic features and recombine them into a shared branch common to all tasks and explicit task-specific branches reserved for each task. The non-learnable primitives and the explicit decoupling of learnable parameters into shared and task-specific ones afford the flexibility needed for minimizing task interference. We evaluate the efficacy of ETR-NLP networks for both image-level classification and pixel-level dense prediction MTL problems. Experimental results indicate that ETR-NLP significantly outperforms state-of-the-art baselines with fewer learnable parameters and similar FLOPs across all datasets. Code is available at https://github.com/zhichao-lu/etr-nlp-mtl.
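
The sketch below illustrates, under stated assumptions, how non-learnable primitives and explicit task routing could be combined in a single block: frozen random convolution filters produce task-agnostic features, which learnable 1x1 convolutions recombine into one shared branch plus one branch per task. The block name ETRNLPBlock and the specific choice of primitives are hypothetical and not taken from the released code.

```python
import torch
import torch.nn as nn

class ETRNLPBlock(nn.Module):
    """Illustrative sketch (not the released code): frozen, non-learnable conv
    primitives produce task-agnostic features, which learnable 1x1 convolutions
    recombine into a shared branch plus one explicit branch per task."""

    def __init__(self, in_ch, prim_ch, out_ch, num_tasks, seed=0):
        super().__init__()
        torch.manual_seed(seed)
        # Non-learnable primitives: random filters, excluded from optimization.
        self.primitives = nn.Conv2d(in_ch, prim_ch, kernel_size=3, padding=1, bias=False)
        self.primitives.weight.requires_grad_(False)
        # Learnable recombinations: one shared, one per task (explicit routing).
        self.shared = nn.Conv2d(prim_ch, out_ch, kernel_size=1)
        self.task_specific = nn.ModuleList(
            [nn.Conv2d(prim_ch, out_ch, kernel_size=1) for _ in range(num_tasks)]
        )

    def forward(self, x):
        feats = torch.relu(self.primitives(x))     # diverse task-agnostic features
        shared = self.shared(feats)                # branch common to all tasks
        # Each task output combines the shared branch with its own routed branch.
        return [shared + branch(feats) for branch in self.task_specific]

block = ETRNLPBlock(in_ch=3, prim_ch=32, out_ch=16, num_tasks=2)
outs = block(torch.randn(1, 3, 64, 64))
print(len(outs), outs[0].shape)  # 2 torch.Size([1, 16, 64, 64])
```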

* CVPR 2023 

Automotive Object Detection via Learning Sparse Events by Temporal Dynamics of Spiking Neurons

Jul 24, 2023
Hu Zhang, Luziwei Leng, Kaiwei Che, Qian Liu, Jie Cheng, Qinghai Guo, Jiangxing Liao, Ran Cheng

Event-based sensors, with their high temporal resolution (1 µs) and dynamic range (120 dB), have the potential to be deployed in high-speed platforms such as vehicles and drones. However, the highly sparse and fluctuating nature of events poses challenges for conventional object detection techniques based on Artificial Neural Networks (ANNs). In contrast, Spiking Neural Networks (SNNs) are well-suited for representing event-based data due to their inherent temporal dynamics. In particular, we demonstrate that the membrane potential dynamics can modulate network activity upon fluctuating events and strengthen features of sparse input. In addition, the spike-triggered adaptive threshold can stabilize training, which further improves network performance. Based on this, we develop an efficient spiking feature pyramid network for event-based object detection. Our proposed SNN outperforms previous SNNs and sophisticated ANNs with attention mechanisms, achieving a mean average precision (mAP50) of 47.7% on the Gen1 benchmark dataset. This result significantly surpasses the previous best SNN by 9.7% and demonstrates the potential of SNNs for event-based vision. Our model has a concise architecture and maintains high accuracy at a much lower computation cost owing to sparse computation. Our code will be publicly available.
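
The membrane potential dynamics and spike-triggered adaptive threshold mentioned above can be illustrated with a toy leaky integrate-and-fire neuron: the potential leaks and integrates sparse event input, and the firing threshold rises after each spike before decaying back. All parameter values below are made up for illustration and do not reflect the paper's configuration.

```python
import torch

def adaptive_lif(inputs, tau_mem=0.9, tau_adapt=0.95, beta=0.2, v_th0=1.0):
    """Toy leaky integrate-and-fire neuron with a spike-triggered adaptive threshold.
    inputs: (T, N) event-driven input over T time steps; all constants are made up."""
    T, N = inputs.shape
    v = torch.zeros(N)            # membrane potential
    a = torch.zeros(N)            # adaptive component of the threshold
    spikes = []
    for t in range(T):
        v = tau_mem * v + inputs[t]        # leaky integration of sparse events
        s = (v >= v_th0 + a).float()       # spike when the adaptive threshold is crossed
        v = v * (1.0 - s)                  # hard reset after a spike
        a = tau_adapt * a + beta * s       # threshold rises after spikes, then decays
        spikes.append(s)
    return torch.stack(spikes)             # (T, N) binary spike trains

events = (torch.rand(20, 8) < 0.3).float() # sparse binary event stream
print(adaptive_lif(events).sum(dim=0))     # spike count per neuron
```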

Scale jump-aware pose graph relaxation for monocular SLAM with re-initializations

Jul 23, 2023
Runze Yuan, Ran Cheng, Lige Liu, Tao Sun, Laurent Kneip

Pose graph relaxation has become an indispensable addition to SLAM, enabling efficient global registration of sensor reference frames under the objective of satisfying pair-wise relative transformation constraints. These constraints may be given by incremental motion estimation or global place recognition. While place recognition enables loop closures and drift compensation, care has to be taken in the monocular case, in which local estimates of structure and displacements can differ from reality not only in terms of noise but also by a scale factor. Owing to the accumulation of scale propagation errors, this scale factor drifts over time, hence scale-drift-aware pose graph relaxation has been introduced. We extend this idea to cases in which the relative scale between subsequent sensor frames is unknown, a situation that can easily occur if monocular SLAM enters re-initialization and no reliable overlap between successive local maps can be identified. The approach is realized by a hybrid pose graph formulation that combines the regular similarity consistency terms with novel, scale-blind constraints. We apply the technique to the practically relevant case of small indoor service robots capable of effectuating purely rotational displacements, a condition that can easily cause tracking failures. We demonstrate that globally consistent trajectories can be recovered even if multiple re-initializations occur along the loop, and present an in-depth study of success and failure cases.
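
To make the contrast between regular and scale-blind constraints concrete, the sketch below (a toy formulation, not the paper's exact residuals) compares a standard relative-translation residual with a scale-blind one that penalizes only the translation direction, so an unknown scale jump introduced by a re-initialization incurs no penalty.

```python
import numpy as np

def relative_translation_residual(t_meas, T_i, T_j):
    """Standard pairwise term (toy form): the translation of frame j expressed in
    frame i should match the measured t_meas, including its metric length."""
    R_i, t_i = T_i
    _, t_j = T_j
    return R_i.T @ (t_j - t_i) - t_meas

def scale_blind_residual(t_meas, T_i, T_j, eps=1e-9):
    """Scale-blind term (toy form): compare translation *directions* only, so an
    unknown relative scale after a re-initialization is tolerated."""
    R_i, t_i = T_i
    _, t_j = T_j
    pred = R_i.T @ (t_j - t_i)
    return pred / (np.linalg.norm(pred) + eps) - t_meas / (np.linalg.norm(t_meas) + eps)

T_i = (np.eye(3), np.zeros(3))
T_j = (np.eye(3), np.array([0.0, 0.0, 2.0]))   # estimate is twice the measured length
t_meas = np.array([0.0, 0.0, 1.0])
print(relative_translation_residual(t_meas, T_i, T_j))  # [0. 0. 1.] -> penalized
print(scale_blind_residual(t_meas, T_i, T_j))           # ~[0. 0. 0.] -> accepted
```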

* 8 pages, 23 figures, International Conference on Intelligent Robots and Systems 2023 

Efficient Deep Spiking Multi-Layer Perceptrons with Multiplication-Free Inference

Jun 21, 2023
Boyan Li, Luziwei Leng, Ran Cheng, Shuaijie Shen, Kaixuan Zhang, Jianguo Zhang, Jianxing Liao

Advancements in adapting deep convolutional architectures for Spiking Neural Networks (SNNs) have significantly enhanced image classification performance and reduced computational burdens. However, the inability of Multiplication-Free Inference (MFI) to harmonize with attention and transformer mechanisms, which are critical to superior performance on high-resolution vision tasks, limits these gains. To address this, our research explores a new pathway, drawing inspiration from the progress made in Multi-Layer Perceptrons (MLPs). We propose an innovative spiking MLP architecture that uses batch normalization to retain MFI compatibility and introduces a spiking patch encoding layer to reinforce local feature extraction capabilities. As a result, we establish an efficient multi-stage spiking MLP network that effectively blends global receptive fields with local feature extraction for comprehensive spike-based computation. Without relying on pre-training or sophisticated SNN training techniques, our network secures a top-1 accuracy of 66.39% on the ImageNet-1K dataset, surpassing the directly trained spiking ResNet-34 by 2.67%. Furthermore, we curtail computational costs, model capacity, and simulation steps. An expanded version of our network challenges the performance of the spiking VGG-16 network with a 71.64% top-1 accuracy, all while operating with a model capacity 2.1 times smaller. Our findings accentuate the potential of our deep SNN architecture in seamlessly integrating global and local learning abilities. Interestingly, the trained receptive field in our network mirrors the activity patterns of cortical cells.
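
A minimal sketch of an MFI-friendly spiking MLP block, assuming a common surrogate-gradient training setup: Linear layers followed by batch normalization (which can be folded into the weights at inference) and a binary spike activation, so inference reduces to additions over {0,1} inputs. The layer layout and class names are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SpikeFn(torch.autograd.Function):
    """Heaviside spike with a rectangular surrogate gradient (a common SNN trick)."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v >= 1.0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        return grad_out * ((v - 1.0).abs() < 0.5).float()

class SpikingMLPBlock(nn.Module):
    """Illustrative MFI-friendly block: Linear + BatchNorm (foldable into the weights
    at inference) + binary spikes, so inference needs only additions on {0,1} inputs."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.fc1, self.bn1 = nn.Linear(dim, hidden), nn.BatchNorm1d(hidden)
        self.fc2, self.bn2 = nn.Linear(hidden, dim), nn.BatchNorm1d(dim)

    def forward(self, x):                          # x: (B, tokens, dim) binary spikes
        B, N, _ = x.shape
        h = SpikeFn.apply(self.bn1(self.fc1(x).reshape(B * N, -1)).reshape(B, N, -1))
        return SpikeFn.apply(self.bn2(self.fc2(h).reshape(B * N, -1)).reshape(B, N, -1))

block = SpikingMLPBlock(dim=64, hidden=256)
spikes = (torch.rand(2, 49, 64) < 0.2).float()     # 49 patch tokens per image
print(block(spikes).shape)                         # torch.Size([2, 49, 64])
```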

* 11 pages, 6 figures 

Rethinking Population-assisted Off-policy Reinforcement Learning

May 04, 2023
Bowen Zheng, Ran Cheng

While off-policy reinforcement learning (RL) algorithms are sample-efficient due to gradient-based updates and data reuse in the replay buffer, they are prone to converging to local optima due to limited exploration. On the other hand, population-based algorithms offer a natural exploration strategy, but their heuristic black-box operators are inefficient. Recent algorithms have integrated these two methods, connecting them through a shared replay buffer. However, the effect of using diverse data from population optimization iterations on off-policy RL algorithms has not been thoroughly investigated. In this paper, we first analyze the use of off-policy RL algorithms in combination with population-based algorithms, showing that the use of population data could introduce an overlooked error and harm performance. To test this, we propose a uniform and scalable training design and conduct experiments on our tailored framework in robot locomotion tasks from the OpenAI Gym. Our results substantiate that using population data in off-policy RL can cause instability during training and even degrade performance. To remedy this issue, we further propose a double replay buffer design that provides more on-policy data and show its effectiveness through experiments. Our results offer practical insights for training these hybrid methods.
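
A conceptual sketch of the double replay buffer idea mentioned above: transitions collected by the RL agent itself are stored separately from those produced by the population, and mini-batches are sampled with a bias toward the agent's own (closer to on-policy) data. The class name, capacity, and mixing ratio are illustrative assumptions, not the paper's settings.

```python
import random
from collections import deque

class DoubleReplayBuffer:
    """Conceptual sketch: keep population-generated transitions apart from the RL
    agent's own transitions and bias sampling toward the latter (closer to on-policy).
    Capacity and mixing ratio are illustrative."""

    def __init__(self, capacity=100_000, agent_fraction=0.75):
        self.agent_buf = deque(maxlen=capacity)
        self.pop_buf = deque(maxlen=capacity)
        self.agent_fraction = agent_fraction

    def add(self, transition, from_agent):
        (self.agent_buf if from_agent else self.pop_buf).append(transition)

    def sample(self, batch_size):
        n_agent = min(int(batch_size * self.agent_fraction), len(self.agent_buf))
        n_pop = min(batch_size - n_agent, len(self.pop_buf))
        return (random.sample(list(self.agent_buf), n_agent)
                + random.sample(list(self.pop_buf), n_pop))

buf = DoubleReplayBuffer()
for i in range(1000):                                  # fake rollouts
    buf.add(("s", "a", 0.0, "s_next", False), from_agent=(i % 2 == 0))
print(len(buf.sample(64)))                             # 64 (48 agent + 16 population)
```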

* Genetic and Evolutionary Computation Conference (GECCO '23) 

Accurate and Efficient Event-based Semantic Segmentation Using Adaptive Spiking Encoder-Decoder Network

Apr 24, 2023
Rui Zhang, Luziwei Leng, Kaiwei Che, Hu Zhang, Jie Cheng, Qinghai Guo, Jiangxing Liao, Ran Cheng

Low-power event-driven computation and inherent temporal dynamics render spiking neural networks (SNNs) ideal candidates for processing highly dynamic and asynchronous signals from event-based sensors. However, due to the challenges in training and architectural design constraints, there is a scarcity of competitive demonstrations of SNNs in event-based dense prediction compared to artificial neural networks (ANNs). In this work, we construct an efficient spiking encoder-decoder network for large-scale event-based semantic segmentation tasks, optimizing the encoder with hierarchical search. To improve learning from highly dynamic event streams, we exploit the intrinsic adaptive threshold of spiking neurons to modulate network activation. Additionally, we develop a dual-path spiking spatially-adaptive modulation (SSAM) block to enhance the representation of sparse events, significantly improving network performance. Our network achieves 72.57% mean intersection over union (MIoU) on the DDD17 dataset and 57.22% MIoU on the newly proposed, larger DSEC-Semantic dataset, surpassing the current best ANNs by 4% while incurring much lower computation costs. To the best of our knowledge, this is the first instance of SNNs outperforming ANNs on challenging event-based semantic segmentation tasks, demonstrating their immense potential in event-based vision. Our code will be publicly available.
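
The dual-path modulation idea could look roughly like the sketch below: a side path derives a per-pixel gate from a measure of local event density and rescales the main path's features, amplifying regions with informative events. This is a speculative illustration; the paper's SSAM block is not reproduced here, and the class and argument names are made up.

```python
import torch
import torch.nn as nn

class DualPathModulationBlock(nn.Module):
    """Speculative sketch of dual-path spatially-adaptive modulation: a side path
    turns local event density into a per-pixel gate that rescales the main features."""
    def __init__(self, channels):
        super().__init__()
        self.main = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, feats, event_density):
        # feats: (B, C, H, W) features; event_density: (B, C, H, W) side input
        g = self.gate(event_density)           # spatially-adaptive gain in (0, 1)
        return self.main(feats) * (1.0 + g)    # amplify regions with informative events

block = DualPathModulationBlock(channels=16)
out = block(torch.randn(1, 16, 32, 32), torch.rand(1, 16, 32, 32))
print(out.shape)  # torch.Size([1, 16, 32, 32])
```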

Accelerating Vision-Language Pretraining with Free Language Modeling

Mar 24, 2023
Teng Wang, Yixiao Ge, Feng Zheng, Ran Cheng, Ying Shan, Xiaohu Qie, Ping Luo

The state of the art in vision-language pretraining (VLP) achieves exemplary performance but suffers from high training costs resulting from slow convergence and long training time, especially on large-scale web datasets. An essential obstacle to training efficiency lies in the entangled prediction rate (percentage of tokens for reconstruction) and corruption rate (percentage of corrupted tokens) in masked language modeling (MLM); that is, a proper corruption rate is achieved at the cost of a large portion of output tokens being excluded from the prediction loss. To accelerate the convergence of VLP, we propose a new pretraining task, namely free language modeling (FLM), that enables a 100% prediction rate with arbitrary corruption rates. FLM successfully frees the prediction rate from its tie-up with the corruption rate while allowing the corruption spans to be customized for each token to be predicted. FLM-trained models are encouraged to learn better and faster given the same GPU time by exploiting bidirectional contexts more flexibly. Extensive experiments show that FLM achieves an impressive 2.5x pretraining time reduction compared to MLM-based methods, while keeping competitive performance on both vision-language understanding and generation tasks. Code will be made public at https://github.com/TencentARC/FLM.
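
A toy sketch of the decoupling that FLM targets, assuming corruption is realized per target by masking the visible context: every position is a prediction target (100% prediction rate), and each target is paired with its own corrupted context, independent of the corruption rate. The function below only builds per-target corruption masks; it is not the paper's implementation, and the span scheme is an assumption.

```python
import torch

def flm_corruption_masks(seq_len, corruption_rate=0.25, span=2, generator=None):
    """Toy sketch of decoupled prediction/corruption: every position is a prediction
    target, and each target gets its own corrupted context. Returns a boolean matrix
    where entry [i, j] is True if token j is hidden when predicting token i."""
    masks = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    n_spans = max(1, int(corruption_rate * seq_len) // span)
    for i in range(seq_len):
        masks[i, i] = True                       # a target never sees itself
        starts = torch.randint(0, seq_len, (n_spans,), generator=generator)
        for s in starts.tolist():                # corrupt a few random spans per target
            masks[i, s : s + span] = True
    return masks

m = flm_corruption_masks(seq_len=8)
print(m.int())                                   # one customized corruption row per target
print("prediction rate: 100% (all", m.shape[0], "positions are predicted)")
```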

* To appear in CVPR 2023 

Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos

Mar 11, 2023
Teng Wang, Jinrui Zhang, Feng Zheng, Wenhao Jiang, Ran Cheng, Ping Luo

Joint video-language learning has received increasing attention in recent years. However, existing works mainly focus on single or multiple trimmed video clips (events), which makes human-annotated event boundaries necessary during inference. To break away from this constraint, we propose a grounded vision-language learning framework for untrimmed videos, which automatically detects informative events and effectively mines the alignments between multi-sentence descriptions and corresponding event segments. Instead of coarse-level video-language alignments, we present two dual pretext tasks to encourage fine-grained segment-level alignments, i.e., text-to-event grounding (TEG) and event-to-text generation (ETG). TEG learns to adaptively ground possible event proposals given a set of sentences by estimating the cross-modal distance in a joint semantic space. Meanwhile, ETG aims to reconstruct (generate) the matched texts given event proposals, encouraging the event representation to retain meaningful semantic information. To encourage accurate label assignment between the event set and the text set, we propose a novel semantic-aware cost to mitigate the sub-optimal matching results caused by ambiguous boundary annotations. Our framework is easily extensible to tasks covering visually-grounded language understanding and generation. We achieve state-of-the-art dense video captioning performance on ActivityNet Captions, YouCook2, and YouMakeup, and competitive performance on several other language generation and understanding tasks. Our method also achieved 1st place in both the MTVG and MDVC tasks of the PIC 4th Challenge.
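
The semantic-aware assignment between event proposals and sentences could be sketched as below, assuming a Hungarian matching over a cost that mixes temporal overlap with cross-modal similarity. The weighting `alpha` and the exact cost form are assumptions rather than the paper's definition; scipy is used only for the matching step.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def semantic_aware_assignment(temporal_iou, text_sim, alpha=0.5):
    """Illustrative sketch: match event proposals to sentences with a cost mixing
    boundary overlap (temporal IoU) and cross-modal similarity, so ambiguous boundary
    annotations do not dominate. `alpha` and the cost form are assumptions."""
    cost = -(alpha * temporal_iou + (1.0 - alpha) * text_sim)
    rows, cols = linear_sum_assignment(cost)          # Hungarian matching
    return list(zip(rows.tolist(), cols.tolist()))

# 3 event proposals x 2 sentences (toy numbers)
iou = np.array([[0.6, 0.1], [0.5, 0.2], [0.1, 0.7]])
sim = np.array([[0.2, 0.9], [0.8, 0.3], [0.1, 0.6]])
print(semantic_aware_assignment(iou, sim))            # [(1, 0), (2, 1)]
```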
