
Jinyi Hu

GUICourse: From General Vision Language Models to Versatile GUI Agents

Jun 17, 2024

Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis

Jun 08, 2024

LEGENT: Open Platform for Embodied Agents

Apr 28, 2024

OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems

Feb 21, 2024

Exploring Perceptual Limitation of Multimodal Large Language Models

Feb 12, 2024

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Dec 01, 2023

Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

Oct 01, 2023

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

Aug 23, 2023

Efficient Cross-Lingual Transfer for Chinese Stable Diffusion with Images as Pivots

May 19, 2023

Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text Generation via Concentrating Attention

Nov 14, 2022