Xiaoming Chen

Institute of Marine Biology and Pharmacology, Ocean College, Zhejiang University

FracEvent: Event-Camera Simulation via Fractional-Relaxation Pixel Dynamics

Jun 25, 2026

Langyi Chen, Chuanzhi Xu, Haoxian Zhou, Pengfei Ye, Ziyu Luo, Haodong Chen, Qiang Qu, Xiaoming Chen, Weidong Cai

Abstract:Event cameras asynchronously report brightness changes with microsecond-level temporal resolution, but real event data remain difficult to collect at scale because specialized sensors, careful synchronization, and task-specific annotations are required. Event-camera simulation is therefore important to event-based vision tasks. Most practical simulators build on contrast-threshold event generation, some with additional filtering, stochastic noise, or hand-tuned sensor parameters. While effective, such formulations often simplify the temporal structure produced by the lifecycle of each pixel, which can distort event timing and weaken downstream transfer. We introduce FracEvent, an event simulator that models this pixel-level lifecycle with fractional-relaxation voltage dynamics. Given a log-intensity trajectory, FracEvent drives a compact stack of relaxation modes, combines their responses into a voltage state, emits ON/OFF events by localizing threshold crossings on the continuous voltage trajectory, and updates the reference while retaining the underlying memory modes. This retained state links residual voltage response to later event timing. We evaluate FracEvent through event-stream comparison and downstream transfer on image reconstruction and optical flow estimation. Across multiple datasets, FracEvent improves the temporal structure of generated events and achieves stronger downstream-transfer results than competing simulator baselines, showing its practical value for event-camera simulation.
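For orientation, the contrast-threshold event generation that the abstract says most practical simulators build on can be sketched for a single pixel as below. This is a minimal illustration of that baseline only, not FracEvent's fractional-relaxation dynamics; the function name, the linear interpolation of crossing times, and the default threshold value are assumptions made for the example.

```python
from typing import List, Tuple

Event = Tuple[float, int]  # (timestamp, polarity): +1 for ON, -1 for OFF


def contrast_threshold_events(
    times: List[float], log_intensity: List[float], threshold: float = 0.25
) -> List[Event]:
    """Emit events for one pixel from sampled log intensity (hypothetical helper).

    An event fires whenever the signal moves `threshold` away from the
    reference level. The crossing time is linearly interpolated between
    samples, and the reference then steps by one threshold.
    """
    events: List[Event] = []
    ref = log_intensity[0]
    for k in range(1, len(times)):
        t0, t1 = times[k - 1], times[k]
        v0, v1 = log_intensity[k - 1], log_intensity[k]
        if v1 == v0:
            continue
        polarity = 1 if v1 > v0 else -1
        # Several crossings can fall inside one sample interval.
        while polarity * (v1 - ref) >= threshold:
            level = ref + polarity * threshold
            frac = (level - v0) / (v1 - v0)
            events.append((t0 + frac * (t1 - t0), polarity))
            ref = level
    return events


# Toy trajectory: brightness rises, then falls back.
events = contrast_threshold_events([0.0, 1.0, 2.0], [0.0, 0.5, 0.0])
```

Note that the reference here is the only state carried between events. The abstract describes FracEvent as additionally retaining memory modes across the reference update, which this baseline does not attempt to model.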

DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72

Apr 02, 2026

Wanqian Li, Jintao Peng, Zongfei Jing, Tianyu Zhang, Ze Long, Xianjie Qiao, Xiaoming Chen, Dongxu Yang, Kefeng Duan, June Yang

Abstract:Large language model (LLM) inference increasingly depends on multi-GPU execution, yet existing inference parallelization strategies require layer-wise inter-rank synchronization, making end-to-end performance sensitive to workload imbalance. We present DWDP (Distributed Weight Data Parallelism), an inference parallelization strategy that preserves data-parallel execution while offloading MoE weights across peer GPUs and fetching missing experts on demand. By removing collective inter-rank synchronization, DWDP allows each GPU to progress independently. We further address the practical overheads of this design with two optimizations for split-weight management and asynchronous remote-weight prefetch. Implemented in TensorRT-LLM and evaluated with DeepSeek-R1 on GB200 NVL72, DWDP improves end-to-end output TPS/GPU by 8.8% at comparable TPS/user in the 20-100 TPS/user serving range under 8K input sequence length and 1K output sequence length.

* Technical Report. 17 pages. 8 figures

SketchPlay: Intuitive Creation of Physically Realistic VR Content with Gesture-Driven Sketching

Dec 26, 2025

Xiangwen Zhang, Xiaowei Dai, Runnan Chen, Xiaoming Chen, Zeke Zexi Hu

Abstract:Creating physically realistic content in VR often requires complex modeling tools or predefined 3D models, textures, and animations, which present significant barriers for non-expert users. In this paper, we propose SketchPlay, a novel VR interaction framework that transforms users' air-drawn sketches and gestures into dynamic, physically realistic scenes, making content creation as intuitive and playful as drawing. Specifically, sketches capture the structure and spatial arrangement of objects and scenes, while gestures convey physical cues such as velocity, direction, and force that define movement and behavior. By combining these complementary forms of input, SketchPlay captures both the structure and dynamics of user-created content, enabling the generation of a wide range of complex physical phenomena, such as rigid body motion, elastic deformation, and cloth dynamics. Experimental results demonstrate that, compared to traditional text-driven methods, SketchPlay offers significant advantages in expressiveness and user experience. By providing an intuitive and engaging creation process, SketchPlay lowers the entry barrier for non-expert users and shows strong potential for applications in education, art, and immersive storytelling.

Integrated Communication and Remote Sensing in LEO Satellite Systems: Protocol, Architecture and Prototype

Aug 14, 2025

Yichao Xu, Xiaoming Chen, Ming Ying, Zhaoyang Zhang

Abstract:In this paper, we explore the integration of communication and synthetic aperture radar (SAR)-based remote sensing in low Earth orbit (LEO) satellite systems to provide real-time SAR imaging and information transmission. Considering the high-mobility characteristics of satellite channels and limited processing capabilities of satellite payloads, we propose an integrated communication and remote sensing architecture based on an orthogonal delay-Doppler division multiplexing (ODDM) signal waveform. Both communication and SAR imaging functionalities are achieved with an integrated transceiver onboard the LEO satellite, utilizing the same waveform and radio frequency (RF) front-end. Based on such an architecture, we propose a transmission protocol compatible with the 5G NR standard using downlink pilots for joint channel estimation and SAR imaging. Furthermore, we design a unified signal processing framework for the integrated satellite receiver to simultaneously achieve high-performance channel sensing, low-complexity channel equalization and interference-free SAR imaging. Finally, the performance of the proposed integrated system is demonstrated through comprehensive analysis and extensive simulations in the sub-6 GHz band. Moreover, a software-defined radio (SDR) prototype is presented to validate its effectiveness for real-time SAR imaging and information transmission in satellite direct-connect user equipment (UE) scenarios within the millimeter-wave (mmWave) band.

* IEEE Transactions on Wireless Communications, 2025

Rethinking Multi-User Communication in Semantic Domain: Enhanced OMDMA by Shuffle-Based Orthogonalization and Diffusion Denoising

Jul 28, 2025

Maojun Zhang, Guangxu Zhu, Xiaoming Chen, Kaibin Huang, Zhaoyang Zhang

Abstract:Inter-user interference remains a critical bottleneck in wireless communication systems, particularly in the emerging paradigm of semantic communication (SemCom). Compared to traditional systems, inter-user interference in SemCom severely degrades key semantic information, often causing worse performance than Gaussian noise under the same power level. To address this challenge, inspired by the recently proposed concept of Orthogonal Model Division Multiple Access (OMDMA) that leverages semantic orthogonality rooted in the personalized joint source and channel coding (JSCC) models to distinguish users, we propose a novel, scalable framework that eliminates the need for the user-specific JSCC models required by the original OMDMA. Our key innovation lies in shuffle-based orthogonalization, where randomly permuting the positions of JSCC feature vectors transforms inter-user interference into Gaussian-like noise. By assigning each user a unique shuffling pattern, the interference is treated as channel noise, enabling effective mitigation using diffusion models (DMs). This approach not only simplifies system design by requiring a single universal JSCC model but also enhances privacy, as shuffling patterns act as implicit private keys. Additionally, we extend the framework to scenarios involving semantically correlated data. By grouping users based on semantic similarity, a cooperative beamforming strategy is introduced to exploit redundancy in correlated data, further improving system performance. Extensive simulations demonstrate that the proposed method outperforms state-of-the-art multi-user SemCom frameworks, achieving superior semantic fidelity, robustness to interference, and scalability, all without requiring additional training overhead.
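The shuffle step described in the abstract can be illustrated with a small sketch: each user permutes its feature positions with its own pattern, and a receiver that inverts only its own pattern sees the other user's features scattered across positions. The helper names, the seed-derived patterns, and the toy feature values are assumptions for illustration; the JSCC encoder and the diffusion-model denoiser from the paper are omitted entirely.

```python
import random
from typing import List, Sequence


def make_pattern(length: int, user_key: int) -> List[int]:
    """Derive a user-specific permutation of feature positions from a seed."""
    pattern = list(range(length))
    random.Random(user_key).shuffle(pattern)
    return pattern


def shuffle(features: Sequence[float], pattern: Sequence[int]) -> List[float]:
    """Transmit side: output position i carries input position pattern[i]."""
    return [features[p] for p in pattern]


def unshuffle(received: Sequence[float], pattern: Sequence[int]) -> List[float]:
    """Receive side: invert the permutation to restore the original order."""
    restored = [0.0] * len(received)
    for i, p in enumerate(pattern):
        restored[p] = received[i]
    return restored


# Two users superimpose their shuffled features on a shared channel.
n = 8
x_a = [float(i) for i in range(n)]   # user A features (toy values)
x_b = [10.0 * i for i in range(n)]   # user B features (toy values)
pat_a, pat_b = make_pattern(n, 1), make_pattern(n, 2)
channel = [a + b for a, b in zip(shuffle(x_a, pat_a), shuffle(x_b, pat_b))]

# Receiver A undoes its own pattern: x_a is back in order, while user B's
# contribution stays position-scrambled and acts as additive interference.
at_a = unshuffle(channel, pat_a)
interference = [y - a for y, a in zip(at_a, x_a)]
```

In this toy setting the interference is only a reordering of user B's values. Treating it as Gaussian-like noise, as the abstract does, depends on the statistics of real JSCC features, which the sketch does not reproduce.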

* 16 pages

Robust Deep Learning-Based Physical Layer Communications: Strategies and Approaches

May 02, 2025

Fenghao Zhu, Xinquan Wang, Chen Zhu, Tierui Gong, Zhaohui Yang, Chongwen Huang, Xiaoming Chen, Zhaoyang Zhang, Mérouane Debbah

Abstract:Deep learning (DL) has emerged as a transformative technology with immense potential to reshape the sixth-generation (6G) wireless communication network. By utilizing advanced algorithms for feature extraction and pattern recognition, DL provides unprecedented capabilities in optimizing network efficiency and performance, particularly in physical layer communications. Although DL technologies present great potential, they also face significant robustness challenges, which are expected to intensify in the complex and demanding 6G environment. Specifically, current DL models typically exhibit substantial performance degradation in dynamic environments with time-varying channels, noise interference, and changing scenarios, which affects their effectiveness in diverse real-world applications. This paper provides a comprehensive overview of strategies and approaches for robust DL-based methods in physical layer communications. First, we introduce the key challenges that current DL models face. Then, we examine in detail the DL approaches specifically tailored to enhance robustness in 6G, which are classified into data-driven and model-driven strategies. Finally, we verify the effectiveness of these methods through case studies and outline future research directions.

* 8 pages, 3 figures. Accepted by IEEE Network Magazine

Wireless Large AI Model: Shaping the AI-Native Future of 6G and Beyond

Apr 20, 2025

Fenghao Zhu, Xinquan Wang, Xinyi Li, Maojun Zhang, Yixuan Chen, Chongwen Huang, Zhaohui Yang, Xiaoming Chen, Zhaoyang Zhang, Richeng Jin (+13 more)

Abstract:The emergence of sixth-generation and beyond communication systems is expected to fundamentally transform digital experiences through introducing unparalleled levels of intelligence, efficiency, and connectivity. A promising technology poised to enable this revolutionary vision is the wireless large AI model (WLAM), characterized by its exceptional capabilities in data processing, inference, and decision-making. In light of these remarkable capabilities, this paper provides a comprehensive survey of WLAM, elucidating its fundamental principles, diverse applications, critical challenges, and future research opportunities. We begin by introducing the background of WLAM and analyzing the key synergies with wireless networks, emphasizing the mutual benefits. Subsequently, we explore the foundational characteristics of WLAM, delving into their unique relevance in wireless environments. Then, the role of WLAM in optimizing wireless communication systems across various use cases and the reciprocal benefits are systematically investigated. Furthermore, we discuss the integration of WLAM with emerging technologies, highlighting their potential to enable transformative capabilities and breakthroughs in wireless communication. Finally, we thoroughly examine the high-level challenges hindering the practical implementation of WLAM and discuss pivotal future research directions.

AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports

Mar 26, 2025

Xiangwen Zhang, Qian Zhang, Longfei Han, Qiang Qu, Xiaoming Chen

Abstract:Collecting real-world vehicle accident videos for autonomous driving research is challenging due to their rarity and complexity. While existing driving video generation methods may produce visually realistic videos, they often fail to deliver physically realistic simulations because they lack the capability to generate accurate post-collision trajectories. In this paper, we introduce AccidentSim, a novel framework that generates physically realistic vehicle collision videos by extracting and utilizing the physical clues and contextual information available in real-world vehicle accident reports. Specifically, AccidentSim leverages a reliable physical simulator to replicate post-collision vehicle trajectories from the physical and contextual information in the accident reports and to build a vehicle collision trajectory dataset. This dataset is then used to fine-tune a language model, enabling it to respond to user prompts and predict physically consistent post-collision trajectories across various driving scenarios based on user descriptions. Finally, we employ Neural Radiance Fields (NeRF) to render high-quality backgrounds, merging them with the foreground vehicles that exhibit physically realistic trajectories to generate vehicle collision videos. Experimental results demonstrate that the videos produced by AccidentSim excel in both visual and physical authenticity.

EvAnimate: Event-conditioned Image-to-Video Generation for Human Animation

Mar 24, 2025

Qiang Qu, Ming Li, Xiaoming Chen, Tongliang Liu

Abstract:Conditional human animation transforms a static reference image into a dynamic sequence by applying motion cues such as poses. These motion cues are typically derived from video data but are susceptible to limitations including low temporal resolution, motion blur, overexposure, and inaccuracies under low-light conditions. In contrast, event cameras provide data streams with exceptionally high temporal resolution, a wide dynamic range, and inherent resistance to motion blur and exposure issues. In this work, we propose EvAnimate, a framework that leverages event streams as motion cues to animate static human images. Our approach employs a specialized event representation that transforms asynchronous event streams into 3-channel slices with controllable slicing rates and appropriate slice density, ensuring compatibility with diffusion models. Subsequently, a dual-branch architecture generates high-quality videos by harnessing the inherent motion dynamics of the event streams, thereby enhancing both video quality and temporal consistency. Specialized data augmentation strategies further enhance cross-person generalization. Finally, we establish a new benchmark, including simulated event data for training and validation, and a real-world event dataset capturing human actions under normal and extreme scenarios. Experimental results demonstrate that EvAnimate achieves high temporal fidelity and robust performance in scenarios where traditional video-derived cues fall short.

LLM-EvRep: Learning an LLM-Compatible Event Representation Using a Self-Supervised Framework

Feb 20, 2025

Zongyou Yu, Qiang Qu, Qian Zhang, Nan Zhang, Xiaoming Chen

Abstract:Recent advancements in event-based recognition have demonstrated significant promise, yet most existing approaches rely on extensive training, limiting their adaptability for efficient processing of event-driven visual content. Meanwhile, large language models (LLMs) have exhibited remarkable zero-shot capabilities across diverse domains, but their application to event-based visual recognition remains largely unexplored. To bridge this gap, we propose LLM-EvGen, an event representation generator that produces LLM-compatible event representations, LLM-EvRep, thereby enhancing the performance of LLMs on event recognition tasks. The generator is trained using a self-supervised framework, aligning the generated representations with semantic consistency and structural fidelity. Comprehensive experiments were conducted on three datasets: N-ImageNet, N-Caltech101, and N-MNIST. The results demonstrate that our method, LLM-EvRep, outperforms the event-to-video method, E2VID, by 15.93%, 0.82%, and 50.21%, respectively, in recognition tasks when evaluated using GPT-4o.

* 6 pages, 2 figures, Companion Proceedings of the ACM Web Conference 2025 (WWW Companion '25)
