Abstract:Fine-grained attribute prediction is essential for fashion retail applications including catalog enrichment, visual search, and recommendation systems. Vision-Language Models (VLMs) offer zero-shot prediction without task-specific training, yet their systematic evaluation on multi-attribute fashion tasks remains underexplored. A key challenge is that fashion attributes are often conditional. For example, "outer fabric" is undefined when no outer garment is visible. This requires models to detect attribute applicability before attempting classification. We introduce a three-tier evaluation framework that decomposes this challenge: (1) overall task performance across all classes (including NA class: suggesting attribute is not applicable) for all attributes, (2) attribute applicability detection, and (3) fine-grained classification when attributes are determinable. Using DeepFashion-MultiModal, which explicitly defines NA (meaning attribute doesn't exist or is not visible) within attribute label spaces, we benchmark nine VLMs spanning flagship (GPT-5, Gemini 2.5 Pro), efficient (GPT-5 Mini, Gemini 2.5 Flash), and ultra-efficient tiers (GPT-5 Nano, Gemini 2.5 Flash-Lite) against classifiers trained on pretrained Fashion-CLIP embeddings on 5,000 images across 18 attributes. Our findings reveal that: (1) zero-shot VLMs achieve 64.0% macro-F1, a threefold improvement over logistic regression on pretrained Fashion-CLIP embeddings; (2) VLMs excel at fine-grained classification (Tier 3: 70.8% F1) but struggle with applicability detection (Tier 2: 34.1% NA-F1), identifying a key bottleneck; (3) efficient models achieve over 90% of flagship performance at lower cost, offering practical deployment paths. This diagnostic framework enables practitioners to pinpoint whether errors stem from visibility detection or classification, guiding targeted improvements for production systems.




Abstract:Sampling based probabilistic roadmap planners (PRM) have been successful in motion planning of robots with higher degrees of freedom, but may fail to capture the connectivity of the configuration space in scenarios with a critical narrow passage. In this paper, we show a novel technique based on Levy Flights to generate key samples in the narrow regions of configuration space, which, when combined with a PRM, improves the completeness of the planner. The technique substantially improves sample quality at the expense of a minimal additional computation, when compared with pure random walk based methods, however, still outperforms state of the art random bridge building method, in terms of number of collision calls, computational overhead and sample quality. The method is robust to the changes in the parameters related to the structure of the narrow passage, thus giving an additional generality. A number of 2D & 3D motion planning simulations are presented which shows the effectiveness of the method.