Baharan Mirzasoleiman

Bootstrapping LLM Robustness for VLM Safety via Reducing the Pretraining Modality Gap

May 30, 2025
Via arXiv

Do We Need All the Synthetic Data? Towards Targeted Synthetic Image Augmentation via Diffusion Models

May 27, 2025
Via arXiv

DD-Ranking: Rethinking the Evaluation of Dataset Distillation

May 19, 2025
Via arXiv

Synthetic Text Generation for Training Large Language Models via Gradient Matching

Feb 24, 2025
Via arXiv

Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection

Feb 20, 2025
Via arXiv

MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation

Jan 07, 2025
Via arXiv

Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks

Oct 03, 2024
Via arXiv

Memory-efficient Training of LLMs with Larger Mini-batches

Jul 28, 2024
Via arXiv

Make the Most of Your Data: Changing the Training Data Distribution to Improve In-distribution Generalization Performance

Apr 27, 2024
Via arXiv

Data-Efficient Contrastive Language-Image Pretraining: Prioritizing Data Quality over Quantity

Mar 20, 2024
Via arXiv