Jingjing Chen

Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image

Jul 07, 2024
Via arXiv

Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models

Apr 19, 2024
Via arXiv

From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios

Mar 12, 2024
Via arXiv

Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models

Mar 12, 2024
Via arXiv

Doubly Abductive Counterfactual Inference for Text-based Image Editing

Mar 05, 2024
Via arXiv

Open-Vocabulary Video Relation Extraction

Dec 25, 2023
Via arXiv

FoodLMM: A Versatile Food Assistant using Large Multi-modal Model

Dec 22, 2023
Via arXiv

Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning

Dec 13, 2023
Via arXiv

ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks

Oct 17, 2023
Via arXiv

Low-Cost Exoskeletons for Learning Whole-Arm Manipulation in the Wild

Sep 26, 2023
Via arXiv