Chun Jason Xue

Beyond Semantic Similarity: Reducing Unnecessary API Calls via Behavior-Aligned Retriever

Aug 20, 2025

Lossless Compression of Large Language Model-Generated Text via Next-Token Prediction

May 07, 2025

Easz: An Agile Transformer-based Image Compression Framework for Resource-constrained IoTs

May 03, 2025

VLM-C4L: Continual Core Dataset Learning with Corner Case Optimization via Vision-Language Models for Autonomous Driving

Mar 29, 2025

FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference

Mar 04, 2025

CoT-VLM4Tar: Chain-of-Thought Guided Vision-Language Models for Traffic Anomaly Resolution

Mar 03, 2025

When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models

Feb 21, 2025

EvoP: Robust LLM Inference via Evolutionary Pruning

Feb 19, 2025

A$^2$ATS: Retrieval-Based KV Cache Reduction via Windowed Rotary Position Embedding and Query-Aware Vector Quantization

Feb 18, 2025

RALAD: Bridging the Real-to-Sim Domain Gap in Autonomous Driving with Retrieval-Augmented Learning

Jan 21, 2025