Wang Zhu

Zero-Shot Iterative Formalization and Planning in Partially Observable Environments

May 19, 2025

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Oct 14, 2024

TLDR: Token-Level Detective Reward Model for Large Vision Language Models

Oct 07, 2024

ST-RetNet: A Long-term Spatial-Temporal Traffic Flow Prediction Method

Jul 13, 2024

Language Models can Infer Action Semantics for Classical Planners from Environment Feedback

Jun 04, 2024

Hybrid Transformer and Spatial-Temporal Self-Supervised Learning for Long-term Traffic Prediction

Jan 29, 2024

Does VLN Pretraining Work with Nonsensical or Irrelevant Instructions?

Dec 02, 2023

Efficient End-to-End Visual Document Understanding with Rationale Distillation

Nov 16, 2023

Chain-of-Questions Training with Latent Answers for Robust Multistep Question Answering

May 24, 2023

Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems

Oct 26, 2022