Jinyi Hu

AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation

Aug 31, 2024
Via arXiv

GUICourse: From General Vision Language Models to Versatile GUI Agents

Jun 17, 2024
Via arXiv

Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis

Jun 08, 2024
Via arXiv

LEGENT: Open Platform for Embodied Agents

Apr 28, 2024
Via arXiv

OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems

Feb 21, 2024
Via arXiv

Exploring Perceptual Limitation of Multimodal Large Language Models

Feb 12, 2024
Via arXiv

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Dec 01, 2023
Via arXiv

Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

Oct 01, 2023
Via arXiv

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

Aug 23, 2023
Via arXiv

Efficient Cross-Lingual Transfer for Chinese Stable Diffusion with Images as Pivots

May 19, 2023
Via arXiv