Xin Li

College of Business, City University of Hong Kong, Hong Kong, China

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Jan 08, 2025

Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents

Jan 03, 2025

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Jan 03, 2025

Enhancing Table Recognition with Vision LLMs: A Benchmark and Neighbor-Guided Toolchain Reasoner

Dec 30, 2024

ERVD: An Efficient and Robust ViT-Based Distillation Framework for Remote Sensing Image Retrieval

Dec 24, 2024

SolidGS: Consolidating Gaussian Surfel Splatting for Sparse-View Surface Reconstruction

Dec 19, 2024

Adversarially robust generalization theory via Jacobian regularization for deep neural networks

Dec 17, 2024

RemoteTrimmer: Adaptive Structural Pruning for Remote Sensing Image Classification

Dec 17, 2024

Look Ahead Text Understanding and LLM Stitching

Dec 16, 2024

AURORA: Automated Unleash of 3D Room Outlines for VR Applications

Dec 15, 2024