
Zheda Mai

Lessons and Open Questions from a Unified Study of Camera-Trap Species Recognition Over Time

Mar 20, 2026

Revisiting Model Stitching In the Foundation Model Era

Mar 16, 2026

Continual Unlearning for Text-to-Image Diffusion Models: A Regularization Perspective

Nov 11, 2025

AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models

Jun 10, 2025

BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning

May 29, 2025

Revisiting semi-supervised learning in the era of foundation models

Mar 12, 2025

MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference

Feb 24, 2025

Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation

Jan 20, 2025

Prompt-CAM: A Simpler Interpretable Transformer for Fine-Grained Analysis

Jan 16, 2025

Fine-Tuning is Fine, if Calibrated

Sep 24, 2024