Ce Zheng

Fast Collaborative Inference via Distributed Speculative Decoding

Dec 18, 2025

Communication-Efficient Collaborative LLM Inference via Distributed Speculative Decoding

Sep 04, 2025

DSSD: Efficient Edge-Device Deployment and Collaborative Inference via Distributed Split Speculative Decoding

Jul 16, 2025

EdgePrompt: A Distributed Key-Value Inference Framework for LLMs in 6G Networks

Apr 16, 2025

MAVERIX: Multimodal Audio-Visual Evaluation Reasoning IndeX

Mar 27, 2025

Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation

Dec 29, 2024

INT-FlashAttention: Enabling Flash Attention for INT8 Quantization

Sep 26, 2024

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Sep 04, 2024

Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer

Jul 17, 2024

LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback

Jun 30, 2024