Joya Chen

VideoLLM-online: Online Video Large Language Model for Streaming Video

Jun 17, 2024

From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition

Jun 12, 2024

Bootstrapping SparseFormers from Vision Foundation Models

Dec 04, 2023

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Nov 30, 2023

UniVTG: Towards Unified Video-Language Temporal Grounding

Aug 18, 2023

AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn

Jun 28, 2023

Affordance Grounding from Demonstration Video to Target Image

Mar 26, 2023

AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant

Mar 08, 2022

DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training

Feb 28, 2022

Foreground-Background Imbalance Problem in Deep Object Detectors: A Review

Jun 16, 2020