Hanrong Ye

MM-Ego: Towards Building Egocentric Multimodal LLMs

Oct 09, 2024

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

Jul 01, 2024

X-VILA: Cross-Modality Alignment for Large Language Model

May 29, 2024

DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data

Mar 22, 2024

SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis

Nov 06, 2023

TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts

Jul 28, 2023

Contrastive Multi-Task Dense Prediction

Jul 16, 2023

InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding

Jun 08, 2023

Joint 2D-3D Multi-Task Learning on Cityscapes-3D: 3D Detection, Segmentation, and Depth Estimation

Apr 06, 2023

Inverted Pyramid Multi-task Transformer for Dense Scene Understanding

Mar 15, 2022