Zhibin Wang

DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference

Jan 27, 2026
Via arXiv

OrchANN: A Unified I/O Orchestration Framework for Skewed Out-of-Core Vector Search

Dec 28, 2025
Via arXiv

StreamKV: Streaming Video Question-Answering with Segment-based KV Cache Retrieval and Compression

Nov 10, 2025
Via arXiv

MMG-Vid: Maximizing Marginal Gains at Segment-level and Token-level for Efficient Video LLMs

Aug 28, 2025
Via arXiv

Chordless Structure: A Pathway to Simple and Expressive GNNs

May 25, 2025
Via arXiv

FlashForge: Ultra-Efficient Prefix-Aware Attention for LLM Decoding

May 23, 2025
Via arXiv

Ultra-Low Complexity On-Orbit Compression for Remote Sensing Imagery via Block Modulated Imaging

Dec 24, 2024
Via arXiv

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

Nov 04, 2024
Via arXiv

Revisiting SLO and Goodput Metrics in LLM Serving

Oct 18, 2024
Via arXiv

Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval

Sep 30, 2024
Via arXiv