Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Junyang Zhang

MOM: Memory-Efficient Offloaded Mini-Sequence Inference for Long Context Language Models

Apr 16, 2025

Junyang Zhang, Tianyi Zhu, Cheng Luo, Anima Anandkumar

Abstract:Long-context language models exhibit impressive performance but remain challenging to deploy due to high GPU memory demands during inference. We propose Memory-efficient Offloaded Mini-sequence Inference (MOM), a method that partitions critical layers into smaller "mini-sequences" and integrates seamlessly with KV cache offloading. Experiments on various Llama, Qwen, and Mistral models demonstrate that MOM reduces peak memory usage by over 50\% on average. On Meta-Llama-3.2-8B, MOM extends the maximum context length from 155k to 455k tokens on a single A100 80GB GPU, while keeping outputs identical and not compromising accuracy. MOM also maintains highly competitive throughput due to minimal computational overhead and efficient last-layer processing. Compared to traditional chunked prefill methods, MOM achieves a 35\% greater context length extension. More importantly, our method drastically reduces prefill memory consumption, eliminating it as the longstanding dominant memory bottleneck during inference. This breakthrough fundamentally changes research priorities, redirecting future efforts from prefill-stage optimizations to improving decode-stage residual KV cache efficiency.

* Submitted to COLM

Via

Access Paper or Ask Questions

A PPO-based DRL Auto-Tuning Nonlinear PID Drone Controller for Robust Autonomous Flights

Mar 30, 2024

Junyang Zhang, Cristian Emanuel Ocampo Rivera, Kyle Tyni, Steven Nguyen

Figure 1 for A PPO-based DRL Auto-Tuning Nonlinear PID Drone Controller for Robust Autonomous Flights

Figure 2 for A PPO-based DRL Auto-Tuning Nonlinear PID Drone Controller for Robust Autonomous Flights

Figure 3 for A PPO-based DRL Auto-Tuning Nonlinear PID Drone Controller for Robust Autonomous Flights

Figure 4 for A PPO-based DRL Auto-Tuning Nonlinear PID Drone Controller for Robust Autonomous Flights

Abstract:This project aims to revolutionize drone flight control by implementing a nonlinear Deep Reinforcement Learning (DRL) agent as a replacement for traditional linear Proportional Integral Derivative (PID) controllers. The primary objective is to seamlessly transition drones between manual and autonomous modes, enhancing responsiveness and stability. We utilize the Proximal Policy Optimization (PPO) reinforcement learning strategy within the Gazebo simulator to train the DRL agent. Adding a $20,000 indoor Vicon tracking system offers <1mm positioning accuracy, which significantly improves autonomous flight precision. To navigate the drone in the shortest collision-free trajectory, we also build a 3 dimensional A* path planner and implement it into the real flight successfully.

* 9 pages, 12 figures

Via

Access Paper or Ask Questions

Towards a High-Performance Object Detector: Insights from Drone Detection Using ViT and CNN-based Deep Learning Models

Aug 19, 2023

Junyang Zhang

Figure 1 for Towards a High-Performance Object Detector: Insights from Drone Detection Using ViT and CNN-based Deep Learning Models

Figure 2 for Towards a High-Performance Object Detector: Insights from Drone Detection Using ViT and CNN-based Deep Learning Models

Figure 3 for Towards a High-Performance Object Detector: Insights from Drone Detection Using ViT and CNN-based Deep Learning Models

Figure 4 for Towards a High-Performance Object Detector: Insights from Drone Detection Using ViT and CNN-based Deep Learning Models

Abstract:Accurate drone detection is strongly desired in drone collision avoidance, drone defense and autonomous Unmanned Aerial Vehicle (UAV) self-landing. With the recent emergence of the Vision Transformer (ViT), this critical task is reassessed in this paper using a UAV dataset composed of 1359 drone photos. We construct various CNN and ViT-based models, demonstrating that for single-drone detection, a basic ViT can achieve performance 4.6 times more robust than our best CNN-based transfer learning models. By implementing the state-of-the-art You Only Look Once (YOLO v7, 200 epochs) and the experimental ViT-based You Only Look At One Sequence (YOLOS, 20 epochs) in multi-drone detection, we attain impressive 98% and 96% mAP values, respectively. We find that ViT outperforms CNN at the same epoch, but also requires more training data, computational power, and sophisticated, performance-oriented designs to fully surpass the capabilities of cutting-edge CNN detectors. We summarize the distinct characteristics of ViT and CNN models to aid future researchers in developing more efficient deep learning models.

* 7 pages, 23 figures, IEEE Xplore, 2023 International Conference on Computer Vision and Robotics Science

Via

Access Paper or Ask Questions

Digital twin brain: a bridge between biological intelligence and artificial intelligence

Aug 03, 2023

Hui Xiong, Congying Chu, Lingzhong Fan, Ming Song, Jiaqi Zhang, Yawei Ma, Ruonan Zheng, Junyang Zhang, Zhengyi Yang, Tianzi Jiang

Abstract:In recent years, advances in neuroscience and artificial intelligence have paved the way for unprecedented opportunities for understanding the complexity of the brain and its emulation by computational systems. Cutting-edge advancements in neuroscience research have revealed the intricate relationship between brain structure and function, while the success of artificial neural networks highlights the importance of network architecture. Now is the time to bring them together to better unravel how intelligence emerges from the brain's multiscale repositories. In this review, we propose the Digital Twin Brain (DTB) as a transformative platform that bridges the gap between biological and artificial intelligence. It consists of three core elements: the brain structure that is fundamental to the twinning process, bottom-layer models to generate brain functions, and its wide spectrum of applications. Crucially, brain atlases provide a vital constraint, preserving the brain's network organization within the DTB. Furthermore, we highlight open questions that invite joint efforts from interdisciplinary fields and emphasize the far-reaching implications of the DTB. The DTB can offer unprecedented insights into the emergence of intelligence and neurological disorders, which holds tremendous promise for advancing our understanding of both biological and artificial intelligence, and ultimately propelling the development of artificial general intelligence and facilitating precision mental healthcare.

Via

Access Paper or Ask Questions

Prioritized Trace Selection: Towards High-Performance DRL-based Network Controllers

Feb 24, 2023

Sagar Patel, Junyang Zhang, Sangeetha Abdu Jyothi, Nina Narodytska

Figure 1 for Prioritized Trace Selection: Towards High-Performance DRL-based Network Controllers

Figure 2 for Prioritized Trace Selection: Towards High-Performance DRL-based Network Controllers

Figure 3 for Prioritized Trace Selection: Towards High-Performance DRL-based Network Controllers

Figure 4 for Prioritized Trace Selection: Towards High-Performance DRL-based Network Controllers

Abstract:Deep Reinforcement Learning (DRL) based controllers offer high performance in a variety of network environments. However, simulator-based training of DRL controllers using highly skewed datasets of real-world traces often results in poor performance in the wild. In this paper, we put forward a generalizable solution for training high-performance DRL controllers in simulators -- Prioritized Trace Selection (PTS). PTS employs an automated three-stage process. First, we identify critical features that determine trace behavior. Second, we classify the traces into clusters. Finally, we dynamically identify and prioritize the salient clusters during training. PTS does not require any changes to the DRL workflow. It can work across both on-policy and off-policy DRL algorithms. We use Adaptive Bit Rate selection and Congestion Control as representative applications to show that PTS offers better performance in simulation and real-world, across multiple controllers and DRL algorithms. Our novel ABR controller, Gelato, trained with PTS outperforms state-of-the-art controllers on the real-world live-streaming platform, Puffer, reducing stalls by 59% and significantly improving average video quality.

Via

Access Paper or Ask Questions