Zequn Jie

3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance

Jul 13, 2024
Via arXiv

Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization

Jul 11, 2024
Via arXiv

OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

Jul 10, 2024
Via arXiv

MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis

Jul 03, 2024
Via arXiv

Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models

Jun 12, 2024
Via arXiv

AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning

Jun 01, 2024
Via arXiv

Matten: Video Generation with Mamba-Attention

May 05, 2024
Via arXiv

Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models

Mar 12, 2024
Via arXiv

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

Feb 20, 2024
Via arXiv

LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs

Jan 30, 2024
Via arXiv