Hongsheng Li

SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction

Oct 11, 2024

A foundation model for generalizable disease diagnosis in chest X-ray images

Oct 11, 2024

Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology

Oct 10, 2024

CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection

Oct 10, 2024

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

Oct 10, 2024

I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow

Oct 10, 2024

Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow

Oct 09, 2024

MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More

Oct 08, 2024

UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models

Sep 30, 2024

MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation

Sep 29, 2024