Yi Wang

NUS

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling

Jan 21, 2025

DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency

Jan 17, 2025

MECD+: Unlocking Event-Level Causal Graph Discovery for Video Reasoning

Jan 16, 2025

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models

Jan 14, 2025

SELMA3D challenge: Self-supervised learning for 3D light-sheet microscopy image segmentation

Jan 07, 2025

Stochastically Constrained Best Arm Identification with Thompson Sampling

Jan 07, 2025

Salient Region Matching for Fully Automated MR-TRUS Registration

Jan 07, 2025

Interpretable Load Forecasting via Representation Learning of Geo-distributed Meteorological Factors

Jan 04, 2025

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

Dec 31, 2024

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Dec 26, 2024