Chao Lu

MultiEditor: Controllable Multimodal Object Editing for Driving Scenarios Using 3D Gaussian Splatting Priors

Jul 30, 2025

Shouyi Lu, Zihan Lin, Chao Lu, Huanran Wang, Guirong Zhuo, Lianqing Zheng

Abstract:Autonomous driving systems rely heavily on multimodal perception data to understand complex environments. However, the long-tailed distribution of real-world data hinders generalization, especially for rare but safety-critical vehicle categories. To address this challenge, we propose MultiEditor, a dual-branch latent diffusion framework designed to edit images and LiDAR point clouds in driving scenarios jointly. At the core of our approach is introducing 3D Gaussian Splatting (3DGS) as a structural and appearance prior for target objects. Leveraging this prior, we design a multi-level appearance control mechanism--comprising pixel-level pasting, semantic-level guidance, and multi-branch refinement--to achieve high-fidelity reconstruction across modalities. We further propose a depth-guided deformable cross-modality condition module that adaptively enables mutual guidance between modalities using 3DGS-rendered depth, significantly enhancing cross-modality consistency. Extensive experiments demonstrate that MultiEditor achieves superior performance in visual and geometric fidelity, editing controllability, and cross-modality consistency. Furthermore, generating rare-category vehicle data with MultiEditor substantially enhances the detection accuracy of perception models on underrepresented classes.

The Fourth Monocular Depth Estimation Challenge

Apr 24, 2025

Anton Obukhov, Matteo Poggi, Fabio Tosi, Ripudaman Singh Arora, Jaime Spencer, Chris Russell, Simon Hadfield, Richard Bowden, Shuaihang Wang, Zhenxin Ma(+47 more)

Abstract:This paper presents the results of the fourth edition of the Monocular Depth Estimation Challenge (MDEC), which focuses on zero-shot generalization to the SYNS-Patches benchmark, a dataset featuring challenging environments in both natural and indoor settings. In this edition, we revised the evaluation protocol to use least-squares alignment with two degrees of freedom to support disparity and affine-invariant predictions. We also revised the baselines and included popular off-the-shelf methods: Depth Anything v2 and Marigold. The challenge received a total of 24 submissions that outperformed the baselines on the test set; 10 of these included a report describing their approach, with most leading methods relying on affine-invariant predictions. The challenge winners improved the 3D F-Score over the previous edition's best result, raising it from 22.58% to 23.05%.

* To appear in CVPRW2025
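
The two-degree-of-freedom least-squares alignment mentioned in the abstract can be illustrated with a short sketch. It fits a scale and a shift so that an affine-invariant prediction best matches the ground truth before metrics are computed. The function name `align_scale_shift`, the optional `mask` argument, and the choice to fit directly on the supplied arrays (rather than in a particular depth or disparity space) are assumptions made for illustration; the challenge's actual evaluation code may differ.

```python
import numpy as np

def align_scale_shift(pred, target, mask=None):
    """Fit scale s and shift t minimizing ||s * pred + t - target||^2.

    Hypothetical helper: only pixels where mask is True enter the fit.
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    if mask is None:
        mask = np.ones(pred.shape, dtype=bool)
    else:
        mask = np.asarray(mask, dtype=bool).ravel()
    # Design matrix [pred, 1]: one column per degree of freedom.
    A = np.stack([pred[mask], np.ones(mask.sum())], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, target[mask], rcond=None)
    return s, t

# Usage: a prediction that is correct up to an unknown scale and shift.
pred = np.array([1.0, 2.0, 3.0, 4.0])
target = 2.0 * pred + 0.5
s, t = align_scale_shift(pred, target)
aligned = s * pred + t  # compare this to target when computing metrics
```

After alignment, any error that remains reflects the structure of the prediction rather than its arbitrary scale and offset.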

Cross-cultural Deployment of Autonomous Vehicles Using Data-light Inverse Reinforcement Learning

Apr 15, 2025

Hongliang Lu, Shuqi Shen, Junjie Yang, Chao Lu, Xinhu Zheng, Hai Yang

Abstract:More than the adherence to specific traffic regulations, driving culture touches upon a more implicit part - an informal, conventional, collective behavioral pattern followed by drivers - that varies across countries, regions, and even cities. Such cultural divergence has become one of the biggest challenges in deploying autonomous vehicles (AVs) across diverse regions today. The current emergence of data-driven methods has shown a potential solution to enable culture-compatible driving through learning from data, but what if some underdeveloped regions cannot provide sufficient local data to inform driving culture? This issue is particularly significant for a broader global AV market. Here, we propose a cross-cultural deployment scheme for AVs, called data-light inverse reinforcement learning, designed to re-calibrate culture-specific AVs and assimilate them into other cultures. First, we report the divergence in driving cultures through a comprehensive comparative analysis of naturalistic driving datasets on highways from three countries: Germany, China, and the USA. Then, we demonstrate the effectiveness of our scheme by testing the expeditious cross-cultural deployment across these three countries, with cumulative testing mileage of over 56084 km. The performance is particularly advantageous when cross-cultural deployment is carried out without affluent local data. Results show that we can reduce the dependence on local data by a margin of 98.67% at best. This study is expected to bring a broader, fairer AV global market, particularly in those regions that lack enough local data to develop culture-compatible AVs.

Targetless Intrinsics and Extrinsic Calibration of Multiple LiDARs and Cameras with IMU using Continuous-Time Estimation

Jan 06, 2025

Yuezhang Lv, Yunzhou Zhang, Chao Lu, Jiajun Zhu, Song Wu

Abstract:Accurate spatiotemporal calibration is a prerequisite for multisensor fusion. However, sensors are typically asynchronous, and there is no overlap between the fields of view of cameras and LiDARs, posing challenges for intrinsic and extrinsic parameter calibration. To address this, we propose a calibration pipeline based on continuous-time and bundle adjustment (BA) capable of simultaneous intrinsic and extrinsic calibration (6 DOF transformation and time offset). We do not require overlapping fields of view or any calibration board. Firstly, we establish data associations between cameras using Structure from Motion (SFM) and perform self-calibration of camera intrinsics. Then, we establish data associations between LiDARs through adaptive voxel map construction, optimizing for extrinsic calibration within the map. Finally, by matching features between the intensity projection of LiDAR maps and camera images, we conduct joint optimization for intrinsic and extrinsic parameters. This pipeline functions in texture-rich structured environments, allowing simultaneous calibration of any number of cameras and LiDARs without the need for intricate sensor synchronization triggers. Experimental results demonstrate our method's ability to fulfill co-visibility and motion constraints between sensors without accumulating errors.

Adaptive Knowledge-based Multi-Objective Evolutionary Algorithm for Hybrid Flow Shop Scheduling Problems with Multiple Parallel Batch Processing Stages

Sep 27, 2024

Feige Liu, Xin Li, Chao Lu, Wenying Gong

Abstract:Parallel batch processing machines have extensive applications in the semiconductor manufacturing process. However, the problem models in previous studies regard parallel batch processing as a fixed processing stage in the machining process. This study generalizes the problem model, in which users can arbitrarily set certain stages as parallel batch processing stages according to their needs. A Hybrid Flow Shop Scheduling Problem with Parallel Batch Processing Machines (PBHFSP) is solved in this paper. Furthermore, an Adaptive Knowledge-based Multi-Objective Evolutionary Algorithm (AMOEA/D) is designed to simultaneously optimize both makespan and Total Energy Consumption (TEC). Firstly, a hybrid initialization strategy with heuristic rules based on knowledge of PBHFSP is proposed to generate promising solutions. Secondly, the disjunctive graph model has been established based on the knowledge to find the critical-path of PBHFS. Then, a critical-path based neighborhood search is proposed to enhance the exploitation ability of AMOEA/D. Moreover, the search time is adaptively adjusted based on learning experience from Q-learning and Decay Law. Afterward, to enhance the exploration capability of the algorithm, AMOEA/D designs an improved population updating strategy with a weight vector updating strategy. These strategies rematch individuals with weight vectors, thereby maintaining the diversity of the population. Finally, the proposed algorithm is compared with state-of-the-art algorithms. The experimental results show that the AMOEA/D is superior to the comparison algorithms in solving the PBHFSP.

* 12 pages

A History-Guided Regional Partitioning Evolutionary Optimization for Solving the Flexible Job Shop Problem with Limited Multi-load Automated Guided Vehicles

Sep 27, 2024

Feige Liu, Chao Lu, Xin Li

Abstract:In a flexible job shop environment, using Automated Guided Vehicles (AGVs) to transport jobs and process materials is an important way to promote the intelligence of the workshop. Compared with single-load AGVs, multi-load AGVs can improve AGV utilization, reduce path conflicts, etc. Therefore, this study proposes a history-guided regional partitioning algorithm (HRPEO) for the flexible job shop scheduling problem with limited multi-load AGVs (FJSPMA). First, the encoding and decoding rules are designed according to the characteristics of multi-load AGVs, and then the initialization rule based on the branch and bound method is used to generate the initial population. Second, to prevent the algorithm from falling into a local optimum, the algorithm adopts a regional partitioning strategy. This strategy divides the solution space into multiple regions and measures the potential of the regions. After that, the regions are grouped into multiple clusters in each iteration, and individuals are selected for evolutionary search based on the set of clusters. Third, a local search strategy is designed to improve the exploitation ability of the algorithm, which uses a greedy approach to optimize machine selection and transportation sequence according to the characteristics of FJSPMA. Finally, a large number of experiments are carried out on the benchmarks to test the performance of the algorithm. Compared with multiple advanced algorithms, the results show that HRPEO holds an advantage in solving FJSPMA.

* 14 pages

Simplified Self-homodyne Coherent System Based on Alamouti Coding and Digital Subcarrier Multiplexing

Mar 18, 2024

Wei Wang, Dongdong Zou, Zhenpeng Wu, Qi Sui, Xingwen Yi, Fan Li, Chao Lu, Zhaohui Li

Abstract:Coherent technology inherent with more available degrees of freedom is deemed a competitive solution for next-generation ultra-high-speed short-reach optical interconnects. However, the fatal barriers to implementing the conventional coherent system in short-reach optical interconnect are the cost, footprint, and power consumption. Self-homodyne coherent system exhibits its potential to reduce the power consumption of the receiver-side digital signal processing (Rx-DSP) by delivering the local oscillator (LO) from the transmitter. However, an automatic polarization controller (APC) is inevitable in the remote LO link to avoid polarization fading, resulting in additional costs. To address the polarization fading issue, a simplified self-homodyne coherent system is proposed enabled by Alamouti coding in this paper. Benefiting from the Alamouti coding between two polarizations, a polarization-insensitive receiver only including a 3dB coupler, a 90° hybrid, and two balanced photodiodes (BPDs) is sufficient for reception. Meanwhile, the APC in the LO link is needless, simplifying the receiver structure significantly. Besides, the digital subcarrier multiplexing (DSCM) technique is also adopted to relax the computational complexity of the chromatic dispersion compensation (CDC), which is one of the dominant power consumption modules in Rx-DSP. The transmission performance of 50Gbaud 4-subcarrier 16/32QAM (4SC-16/32QAM) DSCM signal based on the proposed simplified self-homodyne coherent system is investigated experimentally. The results show that the bit-error-ratio (BER) performance degradation caused by CD can be solved by increasing 4 taps in the equalizer for 80km single mode fiber (SMF) transmission without individual CDC, which operates in a low-complexity manner.
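
For readers unfamiliar with Alamouti coding, the following is a minimal sketch of the textbook two-branch Alamouti block code in a noise-free setting, with the two transmit branches standing in for the two polarizations. It is not the paper's transmitter or receiver: the function names and the complex gains `h1` and `h2` (a hypothetical model of how each branch couples into the single received signal) are assumptions for illustration only.

```python
import numpy as np

def alamouti_encode(symbols):
    """Map symbol pairs (s1, s2) onto two branches over two time slots.

    Branch A sends s1, then -conj(s2); branch B sends s2, then conj(s1).
    """
    s = np.asarray(symbols, dtype=complex).reshape(-1, 2)
    s1, s2 = s[:, 0], s[:, 1]
    branch_a = np.stack([s1, -np.conj(s2)], axis=1).ravel()
    branch_b = np.stack([s2, np.conj(s1)], axis=1).ravel()
    return branch_a, branch_b

def alamouti_decode(received, h1, h2):
    """Linear combining that recovers (s1, s2) from two received slots."""
    r = np.asarray(received, dtype=complex).reshape(-1, 2)
    r1, r2 = r[:, 0], r[:, 1]
    gain = abs(h1) ** 2 + abs(h2) ** 2  # nonzero unless both branches fade
    s1 = (np.conj(h1) * r1 + h2 * np.conj(r2)) / gain
    s2 = (np.conj(h2) * r1 - h1 * np.conj(r2)) / gain
    return np.stack([s1, s2], axis=1).ravel()

# Usage: QPSK-like symbols through assumed branch gains h1 and h2.
symbols = np.array([1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j])
a, b = alamouti_encode(symbols)
h1, h2 = 0.3 + 0.4j, -0.8 + 0.1j
received = h1 * a + h2 * b
recovered = alamouti_decode(received, h1, h2)
```

Because the combined gain is `|h1|^2 + |h2|^2`, the symbols remain recoverable even when one of the two gains drops to zero, which is the intuition behind tolerating polarization fading without an APC.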

Compressed domain vibration detection and classification for distributed acoustic sensing

Dec 27, 2022

Xingliang Shen, Huan Wu, Kun Zhu, Yujia Li, Hua Zheng, Jialong Li, Liyang Shao, Perry Ping Shum, Chao Lu

Abstract:Distributed acoustic sensing (DAS) is a novel enabling technology that can turn existing fibre optic networks into distributed acoustic sensors. However, it faces the challenges of transmitting, storing, and processing massive streams of data which are orders of magnitude larger than those collected from point sensors. The gap between the intensive data generated by DAS and modern computing systems with limited reading/writing speed and storage capacity imposes restrictions on many applications. Compressive sensing (CS) is a revolutionary signal acquisition method that allows a signal to be acquired and reconstructed with significantly fewer samples than required by the Nyquist-Shannon theorem. Though the data size is greatly reduced in the sampling stage, the reconstruction of the compressed data is, however, time- and computation-intensive. To address this challenge, we propose to map the feature extractor from the Nyquist domain to the compressed domain, so that vibration detection and classification can be implemented directly in the compressed domain. The measured results show that our framework can be used to reduce the transmitted data size by 70% while achieving a 99.4% true positive rate (TPR) and a 0.04% false positive rate (FPR) along a 5 km sensing fibre, and 95.05% classification accuracy on a 5-class classification task.
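
To make the data-reduction figure concrete, here is a minimal sketch of a compressive measurement step: each frame of n samples is projected onto m = 0.3 n random measurements, so 70% fewer values need to be transmitted. The Gaussian measurement matrix, the function name `compress`, and the `keep_ratio` and `seed` parameters are assumptions for illustration; the abstract does not specify the paper's sensing matrix or its compressed-domain feature extractor.

```python
import numpy as np

def compress(frame, keep_ratio=0.3, seed=0):
    """Project the last axis of `frame` onto m = keep_ratio * n measurements.

    Hypothetical sketch: uses a random Gaussian matrix scaled by 1/sqrt(m),
    which approximately preserves signal energy for large m.
    """
    frame = np.asarray(frame, dtype=np.float64)
    n = frame.shape[-1]
    m = int(round(keep_ratio * n))
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal((m, n)) / np.sqrt(m)
    return frame @ phi.T

# Usage: four frames of 1000 samples each become four frames of 300 values.
frames = np.random.default_rng(1).standard_normal((4, 1000))
compressed = compress(frames)
reduction = 1 - compressed.shape[-1] / frames.shape[-1]
```

A detector or classifier operating on `compressed` directly would avoid the reconstruction step that the abstract identifies as the costly part of conventional compressive sensing.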

Continual Interactive Behavior Learning With Traffic Divergence Measurement: A Dynamic Gradient Scenario Memory Approach

Dec 21, 2022

Yunlong Lin, Zirui Li, Cheng Gong, Chao Lu, Xinwei Wang, Jianwei Gong

Abstract:Developing autonomous vehicles (AVs) helps improve the road safety and traffic efficiency of intelligent transportation systems (ITS). Accurately predicting the trajectories of traffic participants is essential to the decision-making and motion planning of AVs in interactive scenarios. Recently, learning-based trajectory predictors have shown state-of-the-art performance in highway or urban areas. However, most existing learning-based models trained with fixed datasets may perform poorly in continuously changing scenarios. Specifically, they may not perform well in learned scenarios after learning the new one. This phenomenon is called "catastrophic forgetting". Few studies investigate trajectory predictions in continuous scenarios, where catastrophic forgetting may happen. To handle this problem, first, a novel continual learning (CL) approach for vehicle trajectory prediction is proposed in this paper. Then, inspired by brain science, a dynamic memory mechanism is developed by utilizing the measurement of traffic divergence between scenarios, which balances the performance and training efficiency of the proposed CL approach. Finally, datasets collected from different locations are used to design continual training and testing methods in experiments. Experimental results show that the proposed approach achieves consistently high prediction accuracy in continuous scenarios without re-training, which mitigates catastrophic forgetting compared to non-CL approaches. The implementation of the proposed approach is publicly available at https://github.com/BIT-Jack/D-GSM

Leveraging Multi-stream Information Fusion for Trajectory Prediction in Low-illumination Scenarios: A Multi-channel Graph Convolutional Approach

Nov 18, 2022

Hailong Gong, Zirui Li, Chao Lu, Guodong Du, Jianwei Gong

Abstract:Trajectory prediction is a fundamental problem and challenge for autonomous vehicles. Early works mainly focused on designing complicated architectures for deep-learning-based prediction models in normal-illumination environments, which fail to deal with low-light conditions. This paper proposes a novel approach for trajectory prediction in low-illumination scenarios by leveraging multi-stream information fusion, which flexibly integrates image, optical flow, and object trajectory information. The image channel employs Convolutional Neural Network (CNN) and Long Short-term Memory (LSTM) networks to extract temporal information from the camera. The optical flow channel is applied to capture the pattern of relative motion between adjacent camera frames and is modelled by a Spatial-Temporal Graph Convolutional Network (ST-GCN). The trajectory channel is used to recognize high-level interactions between vehicles. Finally, information from all three channels is fused in the prediction module to generate future trajectories of surrounding vehicles in low-illumination conditions. The proposed multi-channel graph convolutional approach is validated on HEV-I and the newly generated Dark-HEV-I, egocentric vision datasets that primarily focus on urban intersection scenarios. The results demonstrate that our method outperforms the baselines in both standard and low-illumination scenarios. Additionally, our approach is generic and applicable to scenarios with different types of perception data. The source code of the proposed approach is available at https://github.com/TommyGong08/MSIF.
