Li Zhang

Shammie

Language as Prior, Vision as Calibration: Metric Scale Recovery for Monocular Depth Estimation

Jan 07, 2026
Via arXiv

Empowering Small Language Models with Factual Hallucination-Aware Reasoning for Financial Classification

Jan 04, 2026
Via arXiv

GeoTeacher: Geometry-Guided Semi-Supervised 3D Object Detection

Dec 29, 2025
Via arXiv

A Tri-Dynamic Preprocessing Framework for UGC Video Compression

Dec 18, 2025
Via arXiv

Audio-Visual Cross-Modal Compression for Generative Face Video Coding

Dec 17, 2025
Via arXiv

Generative Preprocessing for Image Compression with Pre-trained Diffusion Models

Dec 17, 2025
Via arXiv

A Preprocessing Framework for Video Machine Vision under Compression

Dec 17, 2025
Via arXiv

Is Your VLM for Autonomous Driving Safety-Ready? A Comprehensive Benchmark for Evaluating External and In-Cabin Risks

Nov 19, 2025
Via arXiv

NeuroCLIP: Brain-Inspired Prompt Tuning for EEG-to-Image Multimodal Contrastive Learning

Nov 12, 2025
Via arXiv

From Noise to Latent: Generating Gaussian Latents for INR-Based Image Compression

Nov 11, 2025
Via arXiv