Hao Li

Jack

IPO: Iterative Preference Optimization for Text-to-Video Generation

Feb 05, 2025
Via arXiv

Skewed Memorization in Large Language Models: Quantification and Decomposition

Feb 03, 2025
Via arXiv

Omni-Mol: Exploring Universal Convergent Space for Omni-Molecular Tasks

Feb 03, 2025
Via arXiv

Test-Time Code-Switching for Cross-lingual Aspect Sentiment Triplet Extraction

Jan 24, 2025
Via arXiv

LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models

Jan 23, 2025
Via arXiv

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Jan 14, 2025
Via arXiv

E2EDiff: Direct Mapping from Noise to Data for Enhanced Diffusion Models

Dec 30, 2024
Via arXiv

Hierarchical Banzhaf Interaction for General Video-Language Representation Learning

Dec 30, 2024
Via arXiv

UniAvatar: Taming Lifelike Audio-Driven Talking Head Generation with Comprehensive Motion and Lighting Control

Dec 26, 2024
Via arXiv

LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding

Dec 24, 2024
Via arXiv