
Dong Wang

Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning

Oct 11, 2024
Via arXiv

Inference Scaling for Long-Context Retrieval Augmented Generation

Oct 06, 2024
Via arXiv

Fast-UMI: A Scalable and Hardware-Independent Universal Manipulation Interface

Sep 29, 2024
Via arXiv

Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective

Sep 29, 2024
Via arXiv

Train Once, Deploy Anywhere: Matryoshka Representation Learning for Multimodal Recommendation

Sep 25, 2024
Via arXiv

AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots

Sep 18, 2024
Via arXiv

Full-text Error Correction for Chinese Speech Recognition with Large Language Model

Sep 12, 2024
Via arXiv

Intertwined Biases Across Social Media Spheres: Unpacking Correlations in Media Bias Dimensions

Aug 27, 2024
Via arXiv

Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding

Aug 23, 2024
Via arXiv

MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking

Aug 15, 2024
Via arXiv