Sipeng Zheng

QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds

Jun 24, 2024
Via arXiv

EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?

May 28, 2024
Via arXiv

UniCode: Learning a Unified Codebook for Multimodal Large Language Models

Mar 14, 2024
Via arXiv

POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World

Mar 09, 2024
Via arXiv

SPAFormer: Sequential 3D Part Assembly with Transformers

Mar 09, 2024
Via arXiv

Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds

Oct 20, 2023
Via arXiv

LLaMA Rider: Spurring Large Language Models to Explore the Open World

Oct 13, 2023
Via arXiv

No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection

Jul 20, 2023
Via arXiv

Accommodating Audio Modality in CLIP for Multimodal Processing

Mar 12, 2023
Via arXiv

Exploring Anchor-based Detection for Ego4D Natural Language Query

Aug 10, 2022
Via arXiv