Fanjing Kong

FG-CLIP: Fine-Grained Visual and Textual Alignment

May 08, 2025

Zero and R2D2: A Large-scale Chinese Cross-modal Benchmark and A Vision-Language Framework

May 08, 2022