Kaiwen Zhou

Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA

Jan 29, 2024
Via arXiv

Enhancing Evolving Domain Generalization through Dynamic Latent Representations

Jan 16, 2024
Via arXiv

Positional Information Matters for Invariant In-Context Learning: A Case Study of Simple Function Classes

Nov 30, 2023
Via arXiv

Does Invariant Graph Learning via Environment Augmentation Learn Invariance?

Oct 29, 2023
Via arXiv

ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models

Oct 09, 2023
Via arXiv

Towards Understanding Feature Learning in Out-of-Distribution Generalization

Apr 22, 2023
Via arXiv

ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation

Jan 30, 2023
Via arXiv

Navigation as the Attacker Wishes? Towards Building Byzantine-Robust Embodied Agents under Federated Learning

Dec 02, 2022
Via arXiv

JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents

Aug 30, 2022
Via arXiv

Efficient Private SCO for Heavy-Tailed Data via Clipping

Jun 27, 2022
Via arXiv