Siyuan Huang

MMAD-Purify: A Precision-Optimized Framework for Efficient and Scalable Multi-Modal Attacks

Oct 17, 2024
Via arXiv

Cluster-wise Graph Transformer with Dual-granularity Kernelized Attention

Oct 09, 2024
Via arXiv

Mirror-Consistency: Harnessing Inconsistency in Majority Voting

Oct 07, 2024
Via arXiv

Autonomous Character-Scene Interaction Synthesis from Text Instruction

Oct 04, 2024
Via arXiv

Effective Tuning Strategies for Generalist Robot Manipulation Policies

Oct 02, 2024
Via arXiv

UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models

Sep 30, 2024
Via arXiv

SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation

Sep 26, 2024
Via arXiv

Multi-modal Situated Reasoning in 3D Scenes

Sep 04, 2024
Via arXiv

SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields

Aug 13, 2024
Via arXiv

Task-oriented Sequential Grounding in 3D Scenes

Aug 07, 2024
Via arXiv