Picture for Karan Dua

Karan Dua

CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data

Add code
Jan 25, 2026
Viaarxiv icon

FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models

Add code
Oct 02, 2025
Figure 1 for FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models
Figure 2 for FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models
Figure 3 for FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models
Figure 4 for FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models
Viaarxiv icon