Image


Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor

Jul 09, 2025
Via arXiv

4KAgent: Agentic Any Image to 4K Super-Resolution

Jul 09, 2025
Via arXiv

Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models

Jul 09, 2025
Via arXiv

Evaluating Attribute Confusion in Fashion Text-to-Image Generation

Jul 09, 2025
Via arXiv

Reading a Ruler in the Wild

Jul 09, 2025
Via arXiv

Evaluating Large Multimodal Models for Nutrition Analysis: A Benchmark Enriched with Contextual Metadata

Jul 09, 2025
Via arXiv

Integrating Pathology Foundation Models and Spatial Transcriptomics for Cellular Decomposition from Histology Images

Jul 09, 2025
Via arXiv

Deep Brain Net: An Optimized Deep Learning Model for Brain tumor Detection in MRI Images Using EfficientNetB0 and ResNet50 with Transfer Learning

Jul 09, 2025
Via arXiv

GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning

Jul 09, 2025
Via arXiv

Cross-Modality Masked Learning for Survival Prediction in ICI Treated NSCLC Patients

Jul 09, 2025
Via arXiv