Borui Zhang

CodeSwift: Accelerating LLM Inference for Efficient Code Generation

Feb 24, 2025

Qianhui Zhao, Li Zhang, Fang Liu, Xiaoli Lian, Qiaoyuanhe Meng, Ziqian Jiao, Zetong Zhou, Borui Zhang, Runlin Guo, Jia Li

Abstract: Code generation is a latency-sensitive task that demands high timeliness, but the autoregressive decoding mechanism of Large Language Models (LLMs) leads to poor inference efficiency. Existing LLM inference acceleration methods mainly focus on standalone functions using only built-in components. Moreover, they treat code like natural language sequences, ignoring its unique syntax and semantic characteristics. As a result, the effectiveness of these approaches in code generation tasks remains limited and fails to align with real-world programming scenarios. To alleviate this issue, we propose CodeSwift, a simple yet highly efficient inference acceleration approach specifically designed for code generation, without compromising the quality of the output. CodeSwift constructs a multi-source datastore, providing access to both general and project-specific knowledge, facilitating the retrieval of high-quality draft sequences. Moreover, CodeSwift reduces retrieval cost by controlling retrieval timing, and enhances efficiency through parallel retrieval and a context- and LLM preference-aware cache. Experimental results show that CodeSwift can reach up to 2.53x and 2.54x speedup compared to autoregressive decoding in repository-level and standalone code generation tasks, respectively, outperforming state-of-the-art inference acceleration approaches by up to 88%.
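
The general idea of retrieving a draft sequence and letting the model verify it can be illustrated with a toy example. The sketch below is not CodeSwift itself: it omits the multi-source datastore, retrieval timing control, parallel retrieval, and cache described in the abstract, and every name in it (`build_datastore`, `draft_and_verify`, `toy_model`, the tiny corpus) is a hypothetical stand-in. It only shows that accepting verified draft tokens leaves the greedy output unchanged while needing fewer decoding rounds.

```python
from typing import Callable, Dict, List, Tuple

Token = str
Store = Dict[Tuple[Token, ...], List[Token]]


def build_datastore(corpus: List[List[Token]], context_len: int = 2,
                    draft_len: int = 4) -> Store:
    """Map each context n-gram to the tokens that followed it in the corpus."""
    store: Store = {}
    for seq in corpus:
        for i in range(len(seq) - context_len):
            key = tuple(seq[i:i + context_len])
            # Keep the first continuation seen; a real system would rank candidates.
            store.setdefault(key, seq[i + context_len:i + context_len + draft_len])
    return store


def greedy_decode(prompt: List[Token], next_token: Callable[[List[Token]], Token],
                  max_new: int = 16, eos: Token = "<eos>") -> List[Token]:
    """Plain autoregressive decoding: one model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        tok = next_token(out)
        out.append(tok)
        if tok == eos:
            break
    return out


def draft_and_verify(prompt: List[Token], next_token: Callable[[List[Token]], Token],
                     store: Store, max_new: int = 16, context_len: int = 2,
                     eos: Token = "<eos>") -> Tuple[List[Token], int]:
    """Retrieve a draft, keep its longest verified prefix, then add one model token.

    A real LLM would score all draft positions in a single forward pass; the toy
    model is queried per position here, so `rounds` counts would-be forward passes.
    """
    out = list(prompt)
    limit = len(prompt) + max_new
    rounds = 0
    while len(out) < limit:
        rounds += 1
        draft = store.get(tuple(out[-context_len:]), [])
        for tok in draft:
            if len(out) >= limit or next_token(out) != tok:
                break  # first mismatch: discard the rest of the draft
            out.append(tok)
            if tok == eos:
                return out, rounds
        if len(out) < limit:
            tok = next_token(out)  # the model's own token is always kept
            out.append(tok)
            if tok == eos:
                return out, rounds
    return out, rounds


# Hypothetical stand-in for an LLM: deterministically replays a target program.
TARGET = "def add ( a , b ) : return a + b <eos>".split()


def toy_model(prefix: List[Token]) -> Token:
    return TARGET[min(len(prefix), len(TARGET) - 1)]


PROMPT = ["def", "add"]
store = build_datastore([["def", "sub", "(", "a", ",", "b", ")", ":",
                          "return", "a", "-", "b"]])
slow_out = greedy_decode(PROMPT, toy_model)
fast_out, rounds = draft_and_verify(PROMPT, toy_model, store)
baseline_steps = len(slow_out) - len(PROMPT)
print(fast_out == slow_out, rounds, baseline_steps)
```

Because each draft token is accepted only when it matches what the model would have produced, the output is identical to greedy decoding; the saving comes from rounds in which several draft tokens are accepted at once.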

Preventing Local Pitfalls in Vector Quantization via Optimal Transport

Dec 19, 2024

Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu

Abstract: Vector-quantized networks (VQNs) have exhibited remarkable performance across various tasks, yet they are prone to training instability, which complicates the training process due to the necessity for techniques such as subtle initialization and model distillation. In this study, we identify the local minima issue as the primary cause of this instability. To address this, we integrate an optimal transport method in place of the nearest neighbor search to achieve a more globally informed assignment. We introduce OptVQ, a novel vector quantization method that employs the Sinkhorn algorithm to solve the optimal transport problem, thereby enhancing the stability and efficiency of the training process. To mitigate the influence of diverse data distributions on the Sinkhorn algorithm, we implement a straightforward yet effective normalization strategy. Our comprehensive experiments on image reconstruction tasks demonstrate that OptVQ achieves 100% codebook utilization and surpasses current state-of-the-art VQNs in reconstruction quality.

* Code is available at https://github.com/zbr17/OptVQ
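
The contrast between nearest-neighbor assignment and a Sinkhorn-based assignment can be seen on a tiny example. The following is a minimal sketch under stated assumptions, not the OptVQ implementation from the repository above: the uniform marginals, the max-scaling of the cost matrix, the value of `eps`, and all function names are illustrative choices made here.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor_assign(points, codes):
    """Standard VQ assignment: each point picks its closest code independently."""
    return [min(range(len(codes)), key=lambda j: sq_dist(p, codes[j]))
            for p in points]


def sinkhorn_assign(points, codes, eps=0.1, n_iters=200):
    """Entropic optimal transport with uniform marginals, solved by Sinkhorn.

    Returns the per-point argmax of the transport plan and the plan itself.
    """
    n, k = len(points), len(codes)
    cost = [[sq_dist(p, c) for c in codes] for p in points]
    # Assumption: scale costs to [0, 1] so that eps has a consistent meaning.
    cmax = max(max(row) for row in cost) or 1.0
    kernel = [[math.exp(-(c / cmax) / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * k
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the target marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(k))
             for i in range(n)]
        v = [(1.0 / k) / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(k)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]
    assign = [max(range(k), key=lambda j: plan[i][j]) for i in range(n)]
    return assign, plan


# Four points clustered near the first code; the second code is far away.
points = [(0.0,), (0.1,), (0.2,), (0.3,)]
codes = [(0.1,), (2.0,)]

nn = nearest_neighbor_assign(points, codes)
ot, plan = sinkhorn_assign(points, codes)
print("nearest neighbor:", nn)
print("sinkhorn        :", ot)
```

In this toy setting every point is nearest to the first code, so nearest-neighbor search leaves the second code unused, while the balanced column marginal in the transport plan forces both codes to receive mass.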

Path Choice Matters for Clear Attribution in Path Methods

Jan 19, 2024

Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu

Abstract: Rigorousness and clarity are both essential for interpretations of DNNs to engender human trust. Path methods are commonly employed to generate rigorous attributions that satisfy three axioms. However, the meaning of attributions remains ambiguous due to distinct path choices. To address the ambiguity, we introduce the Concentration Principle, which centrally allocates high attributions to indispensable features, thereby endowing the attributions with aesthetics and sparsity. We then present SAMP, a model-agnostic interpreter, which efficiently searches for the near-optimal path from a pre-defined set of manipulation paths. Moreover, we propose the infinitesimal constraint (IC) and momentum strategy (MS) to improve the rigorousness and optimality. Visualizations show that SAMP can precisely reveal DNNs by pinpointing salient image pixels. We also perform quantitative experiments and observe that our method significantly outperforms the counterparts. Code: https://github.com/zbr17/SAMP.

* ICLR 2024 accepted
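
The ambiguity caused by path choice can be made concrete with a two-feature function. The sketch below is not SAMP; it is a generic path-integrated-gradient computation on a hypothetical function f(x) = x0 * x1 with an all-zero baseline, and the helper name `path_attribution` is made up for this illustration. Both paths satisfy completeness (the attributions sum to f(x) minus f(baseline)), yet they distribute the credit differently.

```python
def path_attribution(grad, waypoints, steps=1000):
    """Integrate grad_i(x) * dx_i along a piecewise-linear path (midpoint rule)."""
    dim = len(waypoints[0])
    attr = [0.0] * dim
    for start, end in zip(waypoints, waypoints[1:]):
        delta = [e - s for s, e in zip(start, end)]
        for k in range(steps):
            t = (k + 0.5) / steps
            x = [s + t * d for s, d in zip(start, delta)]
            g = grad(x)
            for i in range(dim):
                attr[i] += g[i] * delta[i] / steps
    return attr


def f(x):
    """Toy model with a pure interaction between its two features."""
    return x[0] * x[1]


def grad_f(x):
    """Analytic gradient of f."""
    return [x[1], x[0]]


baseline = [0.0, 0.0]
target = [1.0, 2.0]

# Path A: straight line from baseline to input (as in Integrated Gradients).
straight = path_attribution(grad_f, [baseline, target])
# Path B: switch on feature 0 first, then feature 1.
axis_aligned = path_attribution(grad_f, [baseline, [1.0, 0.0], target])

print("straight path    :", straight)
print("axis-aligned path:", axis_aligned)
print("f(x) - f(baseline):", f(target) - f(baseline))
```

The straight path splits the output evenly between the two features, whereas the axis-aligned path assigns all of it to the feature that is switched on last, which is the kind of path dependence the abstract refers to.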

SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction

Nov 29, 2023

Yuanhui Huang, Wenzhao Zheng, Borui Zhang, Jie Zhou, Jiwen Lu

Abstract: 3D occupancy prediction is an important task for the robustness of vision-centric autonomous driving, which aims to predict whether each point is occupied in the surrounding 3D space. Existing methods usually require 3D occupancy labels to produce meaningful results. However, it is very laborious to annotate the occupancy status of each voxel. In this paper, we propose SelfOcc to explore a self-supervised way to learn 3D occupancy using only video sequences. We first transform the images into the 3D space (e.g., bird's eye view) to obtain a 3D representation of the scene. We directly impose constraints on the 3D representations by treating them as signed distance fields. We can then render 2D images of previous and future frames as self-supervision signals to learn the 3D representations. We propose an MVS-embedded strategy to directly optimize the SDF-induced weights with multiple depth proposals. Our SelfOcc outperforms the previous best method SceneRF by 58.7% using a single frame as input on SemanticKITTI and is the first self-supervised work that produces reasonable 3D occupancy for surround cameras on nuScenes. SelfOcc produces high-quality depth and achieves state-of-the-art results on novel depth synthesis, monocular depth estimation, and surround-view depth estimation on SemanticKITTI, KITTI-2015, and nuScenes, respectively. Code: https://github.com/huang-yh/SelfOcc.

* Code is available at: https://github.com/huang-yh/SelfOcc

OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving

Nov 27, 2023

Wenzhao Zheng, Weiliang Chen, Yuanhui Huang, Borui Zhang, Yueqi Duan, Jiwen Lu

Abstract: Understanding how the 3D scene evolves is vital for making decisions in autonomous driving. Most existing methods achieve this by predicting the movements of object boxes, which cannot capture more fine-grained scene information. In this paper, we explore a new framework of learning a world model, OccWorld, in the 3D occupancy space to simultaneously predict the movement of the ego car and the evolution of the surrounding scenes. We propose to learn a world model based on 3D occupancy rather than 3D bounding boxes and segmentation maps for three reasons: 1) expressiveness: 3D occupancy can describe the more fine-grained 3D structure of the scene; 2) efficiency: 3D occupancy is more economical to obtain (e.g., from sparse LiDAR points); 3) versatility: 3D occupancy can adapt to both vision and LiDAR. To facilitate the modeling of the world evolution, we learn a reconstruction-based scene tokenizer on the 3D occupancy to obtain discrete scene tokens to describe the surrounding scenes. We then adopt a GPT-like spatial-temporal generative transformer to generate subsequent scene and ego tokens to decode the future occupancy and ego trajectory. Extensive experiments on the widely used nuScenes benchmark demonstrate the ability of OccWorld to effectively model the evolution of the driving scenes. OccWorld also produces competitive planning results without using instance and map supervision. Code: https://github.com/wzzheng/OccWorld.

* Code is available at: https://github.com/wzzheng/OccWorld

Exploring Unified Perspective For Fast Shapley Value Estimation

Nov 02, 2023

Borui Zhang, Baotong Tian, Wenzhao Zheng, Jie Zhou, Jiwen Lu

Abstract: Shapley values have emerged as a widely accepted and trustworthy tool, grounded in theoretical axioms, for addressing challenges posed by black-box models like deep neural networks. However, computing Shapley values encounters exponential complexity in the number of features. Various approaches, including ApproSemivalue, KernelSHAP, and FastSHAP, have been explored to expedite the computation. We analyze the consistency of existing works and conclude that stochastic estimators can be unified as the linear transformation of importance sampling of feature subsets. Based on this, we investigate the possibility of designing simple amortized estimators and propose a straightforward and efficient one, SimSHAP, by eliminating redundant techniques. Extensive experiments conducted on tabular and image datasets validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
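
The exponential cost of exact Shapley values, and the appeal of stochastic estimators, can be illustrated on a three-player toy game. This sketch is not SimSHAP or any of the estimators named in the abstract; it contrasts exact subset enumeration with classical permutation sampling, and the value function, weights, and function names are hypothetical choices for illustration.

```python
import itertools
import math
import random

WEIGHTS = [1.0, 2.0, 3.0]


def value(coalition):
    """Toy value function: additive weights plus a bonus when 0 and 1 cooperate."""
    total = sum(WEIGHTS[i] for i in coalition)
    if 0 in coalition and 1 in coalition:
        total += 4.0
    return total


def exact_shapley(value_fn, n):
    """Exact Shapley values by enumerating all subsets (cost grows as 2^n)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            # Weight of a subset of size r that excludes player i.
            w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
            for subset in itertools.combinations(others, r):
                s = frozenset(subset)
                phi[i] += w * (value_fn(s | {i}) - value_fn(s))
    return phi


def sampled_shapley(value_fn, n, num_perms=2000, seed=0):
    """Monte Carlo estimate: average marginal contributions over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n
    for _ in range(num_perms):
        order = list(range(n))
        rng.shuffle(order)
        s = frozenset()
        prev = value_fn(s)
        for i in order:
            s = s | {i}
            cur = value_fn(s)
            phi[i] += (cur - prev) / num_perms
            prev = cur
    return phi


exact = exact_shapley(value, 3)
approx = sampled_shapley(value, 3)
print("exact  :", exact)
print("sampled:", approx)
```

The sampled estimate needs only `num_perms * n` evaluations of the value function regardless of how many subsets exist, and its attributions still sum to the value of the full coalition because each permutation's marginal contributions telescope.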

Bort: Towards Explainable Neural Networks with Bounded Orthogonal Constraint

Dec 18, 2022

Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu

Abstract: Deep learning has revolutionized human society, yet the black-box nature of deep neural networks hinders further application to reliability-critical industries. In the attempt to unpack them, many works observe or impact internal variables to improve the model's comprehensibility and transparency. However, existing methods rely on intuitive assumptions and lack mathematical guarantees. To bridge this gap, we introduce Bort, an optimizer for improving model explainability with boundedness and orthogonality constraints on model parameters, derived from the sufficient conditions of model comprehensibility and transparency. We perform reconstruction and backtracking on the model representations optimized by Bort and observe an evident improvement in model explainability. Based on Bort, we are able to synthesize explainable adversarial samples without additional parameters and training. Surprisingly, we find Bort consistently improves the classification accuracy of various architectures including ResNet and DeiT on MNIST, CIFAR-10, and ImageNet.

A Roadmap for Big Model

Apr 02, 2022

Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He (+90 more)

Abstract: With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm. Researchers have achieved various outcomes in the construction of BMs and the BM application in many fields. At present, there is a lack of research work that sorts out the overall progress of BMs and guides the follow-up research. In this paper, we cover not only the BM technologies themselves but also the prerequisites for BM training and applications with BMs, dividing the BM review into four parts: Resource, Models, Key Technologies and Application. We introduce 16 specific BM-related topics in those four parts: Data, Knowledge, Computing System, Parallel Training System, Language Model, Vision Model, Multi-modal Model, Theory & Interpretability, Commonsense Reasoning, Reliability & Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue, and Protein Research. In each topic, we clearly summarize the current studies and propose some future research directions. At the end of this paper, we discuss the further development of BMs from a more general view.

* arXiv admin note: text overlap with arXiv:2107.06499 by other authors

Attributable Visual Similarity Learning

Mar 28, 2022

Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu

Abstract: This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images. Most existing similarity learning methods exacerbate the unexplainability by mapping each sample to a single point in the embedding space with a distance metric (e.g., Mahalanobis distance, Euclidean distance). Motivated by the human semantic similarity cognition, we propose a generalized similarity learning paradigm to represent the similarity between two images with a graph and then infer the overall similarity accordingly. Furthermore, we establish a bottom-up similarity construction and top-down similarity inference framework to infer the similarity based on semantic hierarchy consistency. We first identify unreliable higher-level similarity nodes and then correct them using the most coherent adjacent lower-level similarity nodes, which simultaneously preserve traces for similarity attribution. Extensive experiments on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate significant improvements over existing deep similarity learning methods and verify the interpretability of our framework. Code is available at https://github.com/zbr17/AVSL.

* Accepted to CVPR 2022. Source code available at https://github.com/zbr17/AVSL

Deep Relational Metric Learning

Aug 23, 2021

Wenzhao Zheng, Borui Zhang, Jiwen Lu, Jie Zhou

Abstract: This paper presents a deep relational metric learning (DRML) framework for image clustering and retrieval. Most existing deep metric learning methods learn an embedding space with a general objective of increasing interclass distances and decreasing intraclass distances. However, the conventional losses of metric learning usually suppress intraclass variations which might be helpful to identify samples of unseen classes. To address this problem, we propose to adaptively learn an ensemble of features that characterizes an image from different aspects to model both interclass and intraclass distributions. We further employ a relational module to capture the correlations among the features in the ensemble and construct a graph to represent an image. We then perform relational inference on the graph to integrate the ensemble and obtain a relation-aware embedding to measure the similarities. Extensive experiments on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate that our framework improves existing deep metric learning methods and achieves very competitive results.

* Accepted to ICCV 2021. Source code available at https://github.com/zbr17/DRML
