Visible-Infrared Person Re-Identification (VI-ReID) is a challenging retrieval task due to the substantial modality gap between visible and infrared images. Existing methods attempt to bridge this gap by learning modality-invariant features within a shared embedding space, but they often overlook the complex and implicit correlations between modalities. This limitation becomes more severe under distribution shifts, where infrared samples are often far fewer than visible ones. To address these challenges, we propose a novel network termed Bi-directional Interaction Transformation (BIT). Instead of relying on rigid feature alignment, BIT adopts a matching-based strategy that explicitly models the interaction between visible and infrared image pairs. Specifically, BIT employs an encoder-decoder architecture in which the encoder extracts preliminary feature representations and the decoder performs bi-directional feature integration and query-aware scoring to enhance cross-modality correspondence. To the best of our knowledge, BIT is the first method to introduce such pairwise matching-driven interaction in VI-ReID. Extensive experiments on several benchmarks demonstrate that BIT achieves state-of-the-art performance, highlighting its effectiveness in the VI-ReID task.
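
To make the matching-based idea concrete, the following is a minimal sketch of scoring a visible-infrared pair by letting each side attend to the other, rather than comparing two independently embedded vectors. The abstract does not specify the decoder's operators, so everything here is an assumption for illustration only: the token features stand in for encoder outputs, and the names `attend`, `directional_score`, `pair_score`, and `rank_gallery` are hypothetical. Plain softmax attention and cosine similarity are used as stand-ins and are not claimed to be BIT's actual layers.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    # Small epsilon guards against division by zero for all-zero vectors.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-12)


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(src, ctx):
    """For each src token, return a softmax-weighted mix of ctx tokens."""
    scale = math.sqrt(len(src[0]))
    mixed = []
    for s in src:
        weights = softmax([dot(s, c) / scale for c in ctx])
        mixed.append([sum(w * c[i] for w, c in zip(weights, ctx))
                      for i in range(len(s))])
    return mixed


def directional_score(src, ctx):
    """How well src tokens can be reconstructed from ctx tokens (assumed scoring rule)."""
    recon = attend(src, ctx)
    return sum(cosine(s, r) for s, r in zip(src, recon)) / len(src)


def pair_score(vis_tokens, ir_tokens):
    """Bi-directional score: average of visible-to-infrared and infrared-to-visible."""
    return 0.5 * (directional_score(vis_tokens, ir_tokens)
                  + directional_score(ir_tokens, vis_tokens))


def rank_gallery(query_tokens, gallery):
    """Return gallery indices sorted from best to worst match for the query."""
    scores = [pair_score(query_tokens, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)
```

In this sketch the score of a gallery image depends on the specific query it is paired with, which is the sense in which the scoring is query-aware. The trade-off is that every query-gallery pair must be scored jointly, so gallery features cannot be reduced to a single precomputed vector per image.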