Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Rui Zhang

Henry

Batch Speculative Decoding Done Right

Oct 26, 2025

Ranran Haoran Zhang, Soumik Dey, Ashirbad Mishra, Hansi Wu, Binbin Li, Rui Zhang

Figure 1 for Batch Speculative Decoding Done Right

Figure 2 for Batch Speculative Decoding Done Right

Figure 3 for Batch Speculative Decoding Done Right

Figure 4 for Batch Speculative Decoding Done Right

Abstract:Speculative decoding speeds up LLM inference by using a small draft model to propose multiple tokens that a target model verifies in parallel. Extending this idea to batches is essential for production serving, but it introduces the ragged tensor problem: sequences in the same batch accept different numbers of draft tokens, breaking right-alignment and corrupting position IDs, attention masks, and KV-cache state. We show that several existing batch implementations violate output equivalence-the fundamental requirement that speculative decoding must produce identical token sequences to standard autoregressive generation. These violations occur precisely due to improper handling of the ragged tensor problem. In response, we (1) characterize the synchronization requirements that guarantee correctness, (2) present a correctness-first batch speculative decoding EQSPEC that exposes realignment as consuming 40% of overhead, and (3) introduce EXSPEC, which maintains a sliding pool of sequences and dynamically forms same-length groups, to reduce the realignment overhead while preserving per-sequence speculative speedups. On the SpecBench dataset, across Vicuna-7B/68M, Qwen3-8B/0.6B, and GLM-4-9B/0.6B target/draft pairs, our approach achieves up to 3$\times$ throughput improvement at batch size 8 compared to batch size 1, with efficient scaling through batch size 8, while maintaining 95% output equivalence. Our method requires no custom kernels and integrates cleanly with existing inference stacks. Our code is available at https://github.com/eBay/spec_dec.

Via

Access Paper or Ask Questions

QiMeng-SALV: Signal-Aware Learning for Verilog Code Generation

Oct 22, 2025

Yang Zhang, Rui Zhang, Jiaming Guo, Lei Huang, Di Huang, Yunpu Zhao, Shuyao Cheng, Pengwei Jin, Chongxiao Li, Zidong Du(+3 more)

Abstract:The remarkable progress of Large Language Models (LLMs) presents promising opportunities for Verilog code generation which is significantly important for automated circuit design. The lacking of meaningful functional rewards hinders the preference optimization based on Reinforcement Learning (RL) for producing functionally correct Verilog code. In this paper, we propose Signal-Aware Learning for Verilog code generation (QiMeng-SALV) by leveraging code segments of functionally correct output signal to optimize RL training. Considering Verilog code specifies the structural interconnection of hardware gates and wires so that different output signals are independent, the key insight of QiMeng-SALV is to extract verified signal-aware implementations in partially incorrect modules, so as to enhance the extraction of meaningful functional rewards. Roughly, we verify the functional correctness of signals in generated module by comparing with that of reference module in the training data. Then abstract syntax tree (AST) is employed to identify signal-aware code segments which can provide meaningful functional rewards from erroneous modules. Finally, we introduce signal-aware DPO which is optimized on the correct signal-level code segments, thereby preventing noise and interference from incorrect signals. The proposed QiMeng-SALV underscores the paradigm shift from conventional module-level to fine-grained signal-level optimization in Verilog code generation, addressing the issue of insufficient functional rewards. Experiments demonstrate that our method achieves state-of-the-art performance on VerilogEval and RTLLM, with a 7B parameter model matching the performance of the DeepSeek v3 671B model and significantly outperforming the leading open-source model CodeV trained on the same dataset. Our code is available at https://github.com/zy1xxx/SALV.

* Accepted to NeurIPS 2025

Via

Access Paper or Ask Questions

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Sep 26, 2025

Junbo Niu, Zheng Liu, Zhuangcheng Gu, Bin Wang, Linke Ouyang, Zhiyuan Zhao, Tao Chu, Tianyao He, Fan Wu, Qintong Zhang(+51 more)

Abstract:We introduce MinerU2.5, a 1.2B-parameter document parsing vision-language model that achieves state-of-the-art recognition accuracy while maintaining exceptional computational efficiency. Our approach employs a coarse-to-fine, two-stage parsing strategy that decouples global layout analysis from local content recognition. In the first stage, the model performs efficient layout analysis on downsampled images to identify structural elements, circumventing the computational overhead of processing high-resolution inputs. In the second stage, guided by the global layout, it performs targeted content recognition on native-resolution crops extracted from the original image, preserving fine-grained details in dense text, complex formulas, and tables. To support this strategy, we developed a comprehensive data engine that generates diverse, large-scale training corpora for both pretraining and fine-tuning. Ultimately, MinerU2.5 demonstrates strong document parsing ability, achieving state-of-the-art performance on multiple benchmarks, surpassing both general-purpose and domain-specific models across various recognition tasks, while maintaining significantly lower computational overhead.

* Technical Report; GitHub Repo: https://github.com/opendatalab/MinerU; Hugging Face Model: https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B; Hugging Face Demo: https://huggingface.co/spaces/opendatalab/MinerU

Via

Access Paper or Ask Questions

Movable-Antenna Trajectory Optimization for Wireless Sensing: CRB Scaling Laws over Time and Space

Sep 18, 2025

Wenyan Ma, Lipeng Zhu, Rui Zhang

Figure 1 for Movable-Antenna Trajectory Optimization for Wireless Sensing: CRB Scaling Laws over Time and Space

Figure 2 for Movable-Antenna Trajectory Optimization for Wireless Sensing: CRB Scaling Laws over Time and Space

Figure 3 for Movable-Antenna Trajectory Optimization for Wireless Sensing: CRB Scaling Laws over Time and Space

Figure 4 for Movable-Antenna Trajectory Optimization for Wireless Sensing: CRB Scaling Laws over Time and Space

Abstract:In this paper, we present a new wireless sensing system utilizing a movable antenna (MA) that continuously moves and receives sensing signals to enhance sensing performance over the conventional fixed-position antenna (FPA) sensing. We show that the angle estimation performance is fundamentally determined by the MA trajectory, and derive the Cramer-Rao bound (CRB) of the mean square error (MSE) for angle-of-arrival (AoA) estimation as a function of the trajectory for both one-dimensional (1D) and two-dimensional (2D) antenna movement. For the 1D case, a globally optimal trajectory that minimizes the CRB is derived in closed form. Notably, the resulting CRB decreases cubically with sensing time in the time-constrained regime, whereas it decreases linearly with sensing time and quadratically with the movement line segment's length in the space-constrained regime. For the 2D case, we aim to achieve the minimum of maximum (min-max) CRBs of estimation MSE for the two AoAs with respect to the horizontal and vertical axes. To this end, we design an efficient alternating optimization algorithm that iteratively updates the MA's horizontal or vertical coordinates with the other being fixed, yielding a locally optimal trajectory. Numerical results show that the proposed 1D/2D MA-based sensing schemes significantly reduce both the CRB and actual AoA estimation MSE compared to conventional FPA-based sensing with uniform linear/planar arrays (ULAs/UPAs) as well as various benchmark MA trajectories. Moreover, it is revealed that the steering vectors of our designed 1D/2D MA trajectories have low correlation in the angular domain, thereby effectively increasing the angular resolution for achieving higher AoA estimation accuracy.

Via

Access Paper or Ask Questions

Two-Stage Decoupling Framework for Variable-Length Glaucoma Prognosis

Sep 15, 2025

Yiran Song, Yikai Zhang, Silvia Orengo-Nania, Nian Wang, Fenglong Ma, Rui Zhang, Yifan Peng, Mingquan Lin

Abstract:Glaucoma is one of the leading causes of irreversible blindness worldwide. Glaucoma prognosis is essential for identifying at-risk patients and enabling timely intervention to prevent blindness. Many existing approaches rely on historical sequential data but are constrained by fixed-length inputs, limiting their flexibility. Additionally, traditional glaucoma prognosis methods often employ end-to-end models, which struggle with the limited size of glaucoma datasets. To address these challenges, we propose a Two-Stage Decoupling Framework (TSDF) for variable-length glaucoma prognosis. In the first stage, we employ a feature representation module that leverages self-supervised learning to aggregate multiple glaucoma datasets for training, disregarding differences in their supervisory information. This approach enables datasets of varying sizes to learn better feature representations. In the second stage, we introduce a temporal aggregation module that incorporates an attention-based mechanism to process sequential inputs of varying lengths, ensuring flexible and efficient utilization of all available data. This design significantly enhances model performance while maintaining a compact parameter size. Extensive experiments on two benchmark glaucoma datasets:the Ocular Hypertension Treatment Study (OHTS) and the Glaucoma Real-world Appraisal Progression Ensemble (GRAPE),which differ significantly in scale and clinical settings,demonstrate the effectiveness and robustness of our approach.

* 11 pages.2 figures, 4 tables

Via

Access Paper or Ask Questions

Amulet: a Python Library for Assessing Interactions Among ML Defenses and Risks

Sep 15, 2025

Asim Waheed, Vasisht Duddu, Rui Zhang, Sebastian Szyller, N. Asokan

Figure 1 for Amulet: a Python Library for Assessing Interactions Among ML Defenses and Risks

Figure 2 for Amulet: a Python Library for Assessing Interactions Among ML Defenses and Risks

Figure 3 for Amulet: a Python Library for Assessing Interactions Among ML Defenses and Risks

Figure 4 for Amulet: a Python Library for Assessing Interactions Among ML Defenses and Risks

Abstract:ML models are susceptible to risks to security, privacy, and fairness. Several defenses are designed to protect against their intended risks, but can inadvertently affect susceptibility to other unrelated risks, known as unintended interactions. Several jurisdictions are preparing ML regulatory frameworks that require ML practitioners to assess the susceptibility of ML models to different risks. A library for valuating unintended interactions that can be used by (a) practitioners to evaluate unintended interactions at scale prior to model deployment and (b) researchers to design defenses which do not suffer from an unintended increase in unrelated risks. Ideally, such a library should be i) comprehensive by including representative attacks, defenses and metrics for different risks, ii) extensible to new modules due to its modular design, iii) consistent with a user-friendly API template for inputs and outputs, iv) applicable to evaluate previously unexplored unintended interactions. We present AMULET, a Python library that covers risks to security, privacy, and fairness, which satisfies all these requirements. AMULET can be used to evaluate unexplored unintended interactions, compare effectiveness between defenses or attacks, and include new attacks and defenses.

* 12 pages, 4 figures

Via

Access Paper or Ask Questions

Joint Antenna Positioning and Beamforming for Movable Antenna Array Aided Ground Station in Low-Earth Orbit Satellite Communication

Sep 09, 2025

Jinming Wang, Lipeng Zhu, Shuai Han, He Sun, Rui Zhang

Figure 1 for Joint Antenna Positioning and Beamforming for Movable Antenna Array Aided Ground Station in Low-Earth Orbit Satellite Communication

Figure 2 for Joint Antenna Positioning and Beamforming for Movable Antenna Array Aided Ground Station in Low-Earth Orbit Satellite Communication

Figure 3 for Joint Antenna Positioning and Beamforming for Movable Antenna Array Aided Ground Station in Low-Earth Orbit Satellite Communication

Figure 4 for Joint Antenna Positioning and Beamforming for Movable Antenna Array Aided Ground Station in Low-Earth Orbit Satellite Communication

Abstract:This paper proposes a new architecture for the low-earth orbit (LEO) satellite ground station aided by movable antenna (MA) array. Unlike conventional fixed-position antenna (FPA), the MA array can flexibly adjust antenna positions to reconfigure array geometry, for more effectively mitigating interference and improving communication performance in ultra-dense LEO satellite networks. To reduce movement overhead, we configure antenna positions at the antenna initialization stage, which remain unchanged during the whole communication period of the ground station. To this end, an optimization problem is formulated to maximize the average achievable rate of the ground station by jointly optimizing its antenna position vector (APV) and time-varying beamforming weights, i.e., antenna weight vectors (AWVs). To solve the resulting non-convex optimization problem, we adopt the Lagrangian dual transformation and quadratic transformation to reformulate the objective function into a more tractable form. Then, we develop an efficient block coordinate descent-based iterative algorithm that alternately optimizes the APV and AWVs until convergence is reached. Simulation results demonstrate that our proposed MA scheme significantly outperforms traditional FPA by increasing the achievable rate at ground stations under various system setups, thus providing an efficient solution for interference mitigation in future ultra-dense LEO satellite communication networks.

Via

Access Paper or Ask Questions

Environment-Aware IRS Deployment via Channel Knowledge Map: Joint Sensing-Communications Coverage Optimization

Sep 05, 2025

Yilong Chen, Zixiang Ren, Jie Xu, Rui Zhang

Abstract:This paper studies the intelligent reflecting surface (IRS) deployment optimization problem for IRS-enabled integrated sensing and communications (ISAC) systems, in which multiple IRSs are strategically deployed at candidate locations to assist a base station (BS) to enhance the coverage of both sensing and communications. We present an environment-aware IRS deployment design via exploiting the channel knowledge map (CKM), which provides the channel state information (CSI) between each candidate IRS location and BS or targeted sensing/communication points. Based on the obtained CSI from CKM, we optimize the deployment of IRSs, jointly with the BS's transmit beamforming and IRSs' reflective beamforming during operation, with the objective of minimizing the system cost, while guaranteeing the minimum illumination power requirements at sensing areas and the minimum signal-to-noise ratio (SNR) requirements at communication areas. In particular, we consider two cases when the IRSs' reflective beamforming optimization can be implemented dynamically in real time and quasi-stationarily over the whole operation period, respectively. For both cases, the joint IRS deployment and transmit/reflective beamforming designs are formulated as mixed-integer non-convex optimization problems, which are solved via the successive convex approximation (SCA)-based relax-and-bound method. Specifically, we first relax the binary IRS deployment indicators into continuous variables, then find converged solutions via SCA, and finally round relaxed indicators back to binary values. Numerical results demonstrate the effectiveness of our proposed algorithms in reducing the system cost while meeting the sensing and communication requirements.

* 13 pages, 11 figures

Via

Access Paper or Ask Questions

WATCH: World-aware Allied Trajectory and pose reconstruction for Camera and Human

Sep 04, 2025

Qijun Ying, Zhongyuan Hu, Rui Zhang, Ronghui Li, Yu Lu, Zijiao Zeng

Figure 1 for WATCH: World-aware Allied Trajectory and pose reconstruction for Camera and Human

Figure 2 for WATCH: World-aware Allied Trajectory and pose reconstruction for Camera and Human

Figure 3 for WATCH: World-aware Allied Trajectory and pose reconstruction for Camera and Human

Figure 4 for WATCH: World-aware Allied Trajectory and pose reconstruction for Camera and Human

Abstract:Global human motion reconstruction from in-the-wild monocular videos is increasingly demanded across VR, graphics, and robotics applications, yet requires accurate mapping of human poses from camera to world coordinates-a task challenged by depth ambiguity, motion ambiguity, and the entanglement between camera and human movements. While human-motion-centric approaches excel in preserving motion details and physical plausibility, they suffer from two critical limitations: insufficient exploitation of camera orientation information and ineffective integration of camera translation cues. We present WATCH (World-aware Allied Trajectory and pose reconstruction for Camera and Human), a unified framework addressing both challenges. Our approach introduces an analytical heading angle decomposition technique that offers superior efficiency and extensibility compared to existing geometric methods. Additionally, we design a camera trajectory integration mechanism inspired by world models, providing an effective pathway for leveraging camera translation information beyond naive hard-decoding approaches. Through experiments on in-the-wild benchmarks, WATCH achieves state-of-the-art performance in end-to-end trajectory reconstruction. Our work demonstrates the effectiveness of jointly modeling camera-human motion relationships and offers new insights for addressing the long-standing challenge of camera translation integration in global human motion reconstruction. The code will be available publicly.

Via

Access Paper or Ask Questions

Hidden Tail: Adversarial Image Causing Stealthy Resource Consumption in Vision-Language Models

Aug 26, 2025

Rui Zhang, Zihan Wang, Tianli Yang, Hongwei Li, Wenbo Jiang, Qingchuan Zhao, Yang Liu, Guowen Xu

Abstract:Vision-Language Models (VLMs) are increasingly deployed in real-world applications, but their high inference cost makes them vulnerable to resource consumption attacks. Prior attacks attempt to extend VLM output sequences by optimizing adversarial images, thereby increasing inference costs. However, these extended outputs often introduce irrelevant abnormal content, compromising attack stealthiness. This trade-off between effectiveness and stealthiness poses a major limitation for existing attacks. To address this challenge, we propose \textit{Hidden Tail}, a stealthy resource consumption attack that crafts prompt-agnostic adversarial images, inducing VLMs to generate maximum-length outputs by appending special tokens invisible to users. Our method employs a composite loss function that balances semantic preservation, repetitive special token induction, and suppression of the end-of-sequence (EOS) token, optimized via a dynamic weighting strategy. Extensive experiments show that \textit{Hidden Tail} outperforms existing attacks, increasing output length by up to 19.2$\times$ and reaching the maximum token limit, while preserving attack stealthiness. These results highlight the urgent need to improve the robustness of VLMs against efficiency-oriented adversarial threats. Our code is available at https://github.com/zhangrui4041/Hidden_Tail.

Via

Access Paper or Ask Questions