Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Aritra Bandyopadhyay

RE-TRIANGLE: Does TRIANGLE Enable Multimodal Alignment Beyond Cosine Similarity in Retrieval?

May 22, 2026

Arijit Ghosh, Aritra Bandyopadhyay, Chiranjeev Bindra, Jingfen Qiao

Abstract:Multimodal alignment is critical for bridging the semantic gap in information retrieval. However, traditional pairwise strategies introduce a geometric blind spot: while they align anchor modalities (e.g., text) with others, they lack constraints to enforce mutual consistency between peripheral modalities (e.g., video and audio). The TRIANGLE framework addresses this by minimizing the area of modality triplets on a hypersphere to enforce holistic alignment. In this reproducibility study, we verify the robustness of this geometric objective for retrieval tasks. We confirm that TRIANGLE outperforms pairwise baselines in zero-shot settings, achieving Recall@1 gains of up to +8.7 points, though benefits are domain-dependent. However, we fail to reproduce the reported learning-from-scratch results. Analysis using a synthetic toy dataset attributes this to instability when jointly optimizing geometric alignment with Data-Text Matching (DTM) loss. Furthermore, we find that cosine regularization primarily stabilizes text-to-video retrieval, and fine-tuning with domain supervision amplifies geometric benefits but reduces cross-dataset generalization. Our findings support the efficacy of geometric alignment while highlighting critical optimization sensitivities. Code available at https://github.com/ARIJIT00171/RE-TRIANGLE.

Via

Access Paper or Ask Questions

Differential Evolution Algorithm based Hyper-Parameters Selection of Convolutional Neural Network for Speech Command Recognition

Oct 13, 2023

Sandipan Dhar, Anuvab Sen, Aritra Bandyopadhyay, Nanda Dulal Jana, Arjun Ghosh, Zahra Sarayloo

Abstract:Speech Command Recognition (SCR), which deals with identification of short uttered speech commands, is crucial for various applications, including IoT devices and assistive technology. Despite the promise shown by Convolutional Neural Networks (CNNs) in SCR tasks, their efficacy relies heavily on hyper-parameter selection, which is typically laborious and time-consuming when done manually. This paper introduces a hyper-parameter selection method for CNNs based on the Differential Evolution (DE) algorithm, aiming to enhance performance in SCR tasks. Training and testing with the Google Speech Command (GSC) dataset, the proposed approach showed effectiveness in classifying speech commands. Moreover, a comparative analysis with Genetic Algorithm based selections and other deep CNN (DCNN) models highlighted the efficiency of the proposed DE algorithm in hyper-parameter selection for CNNs in SCR tasks.

* 8 Pages, 7 Figures, 4 Tables, Accepted by the 15th International Joint Conference on Computational Intelligence (IJCCI 2023), November 13-15, 2023, Rome, Italy

Via

Access Paper or Ask Questions

Countering Inconsistent Labelling by Google's Vision API for Rotated Images

Nov 17, 2019

Aman Apte, Aritra Bandyopadhyay, K Akhilesh Shenoy, Jason Peter Andrews, Aditya Rathod, Manish Agnihotri, Aditya Jajodia

Figure 1 for Countering Inconsistent Labelling by Google's Vision API for Rotated Images

Figure 2 for Countering Inconsistent Labelling by Google's Vision API for Rotated Images

Figure 3 for Countering Inconsistent Labelling by Google's Vision API for Rotated Images

Figure 4 for Countering Inconsistent Labelling by Google's Vision API for Rotated Images

Abstract:Google's Vision API analyses images and provides a variety of output predictions, one such type is context-based labelling. In this paper, it is shown that adversarial examples that cause incorrect label prediction and spoofing can be generated by rotating the images. Due to the black-boxed nature of the API, a modular context-based pre-processing pipeline is proposed consisting of a Res-Net50 model, that predicts the angle by which the image must be rotated to correct its orientation. The pipeline successfully performs the correction whilst maintaining the image's resolution and feeds it to the API which generates labels similar to the original correctly oriented image and using a Percentage Error metric, the performance of the corrected images as compared to its rotated counter-parts is found to be significantly higher. These observations imply that the API can benefit from such a pre-processing pipeline to increase robustness to rotational perturbances.

* 11 pages, 9 figures, Accepted at ICICV 2020 Jaipur India

Via

Access Paper or Ask Questions