Haozhe Zhao

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

Jul 07, 2024

MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

Jun 29, 2024

Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation

Apr 12, 2024

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Mar 11, 2024

PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain

Feb 21, 2024

ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks

Nov 16, 2023

Distantly-Supervised Named Entity Recognition with Uncertainty-aware Teacher Learning and Student-student Collaborative Learning

Nov 14, 2023

Coarse-to-Fine Dual Encoders are Better Frame Identification Learners

Oct 20, 2023

Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond

Oct 16, 2023

MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning

Oct 02, 2023