Jiwen Zhang

VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models

May 28, 2024

DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning

Apr 02, 2024

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

Mar 05, 2024

ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks

Oct 17, 2023

Breaking Down the Task: A Unit-Grained Hybrid Training Framework for Vision and Language Decision Making

Jul 16, 2023

Robotic Assembly Control Reconfiguration Based on Transfer Reinforcement Learning for Objects with Different Geometric Features

Nov 04, 2022

Local Connection Reinforcement Learning Method for Efficient Control of Robotic Peg-in-Hole Assembly

Oct 24, 2022

Curriculum Learning for Vision-and-Language Navigation

Nov 14, 2021