Ji Zhang

MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model

Aug 26, 2024

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

Aug 09, 2024

ProFuser: Progressive Fusion of Large Language Models

Aug 09, 2024

MIBench: Evaluating Multimodal Large Language Models over Multiple Images

Jul 21, 2024

Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models

Jul 19, 2024

DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation

Jul 18, 2024

Modeling Comparative Logical Relation with Contrastive Learning for Text Generation

Jun 13, 2024

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

Jun 03, 2024

Neural Dynamic Data Valuation

Apr 30, 2024

TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

Apr 25, 2024