Abstract: Large Vision-Language Models (LVLMs) have advanced multimodal learning but incur high computational costs due to the large number of visual tokens, which motivates token pruning to improve inference efficiency. The key challenge lies in identifying which tokens are truly important. Most existing approaches rely on attention-based criteria to estimate token importance, but these criteria suffer from inherent limitations such as positional bias. In this work, we explore a new perspective on token importance based on token transitions in LVLMs. We observe that the transition of token representations provides a meaningful signal of semantic information. Based on this insight, we propose TransPrune, a training-free and efficient token pruning method. Specifically, TransPrune progressively prunes tokens by assessing their importance through a combination of Token Transition Variation (TTV), which measures changes in both the magnitude and direction of token representations, and Instruction-Guided Attention (IGA), which measures how strongly the instruction attends to image tokens. Extensive experiments demonstrate that TransPrune achieves multimodal performance comparable to that of the original LVLMs, such as LLaVA-v1.5 and LLaVA-Next, across eight benchmarks, while reducing inference TFLOPs by more than half. Moreover, TTV alone can serve as an effective criterion without relying on attention, achieving performance comparable to attention-based methods. The code will be made publicly available at https://github.com/liaolea/TransPrune upon acceptance of the paper.
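As a rough illustration of the TTV idea described above, the following minimal sketch scores each token by how much its representation changes in magnitude and direction across one module, then keeps the highest-scoring tokens. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the function names `transition_variation` and `keep_top_tokens`, the relative-norm and cosine measures, the unweighted sum of the two terms, the `keep_ratio` parameter, and the choice to treat larger variation as higher importance. IGA is omitted.

```python
import math


def transition_variation(h_in, h_out, eps=1e-12):
    """Return (magnitude_change, direction_change) for one token.

    h_in / h_out are the token's representation before and after a module.
    Both measures are assumed forms, not the paper's definitions.
    """
    norm_in = math.sqrt(sum(x * x for x in h_in))
    norm_out = math.sqrt(sum(x * x for x in h_out))
    dot = sum(a * b for a, b in zip(h_in, h_out))
    cosine = dot / (norm_in * norm_out + eps)
    magnitude_change = abs(norm_out - norm_in) / (norm_in + eps)  # relative norm change
    direction_change = 1.0 - cosine                               # 0 when direction is unchanged
    return magnitude_change, direction_change


def keep_top_tokens(inputs, outputs, keep_ratio=0.5):
    """Return the sorted indices of the tokens with the largest combined score."""
    scores = []
    for h_in, h_out in zip(inputs, outputs):
        mag, ang = transition_variation(h_in, h_out)
        scores.append(mag + ang)  # assumed: unweighted sum of the two terms
    k = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # preserve the original token order


# Toy example: four 2-d tokens before and after a hypothetical module.
inputs = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
outputs = [[1.0, 0.0], [0.0, 2.0], [2.0, 0.0], [1.0, 0.1]]
kept = keep_top_tokens(inputs, outputs, keep_ratio=0.5)
```

In this toy input, the token whose output is rotated and rescaled and the token whose output is only rescaled receive the largest scores, so they are the two that survive pruning at a keep ratio of 0.5.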
Abstract: Coherent detection can provide enhanced receiver sensitivity and spectral efficiency in free-space optical (FSO) communications. However, turbulence can cause modal power coupling effects on a Gaussian data beam and significantly degrade the mixing efficiency between the data beam and a Gaussian local oscillator (LO) in the coherent detector. Optical phase conjugation (OPC) in a photorefractive crystal can "automatically" mitigate turbulence by: (a) recording a back-propagated turbulence-distorted probe beam, and (b) creating a phase-conjugate beam that carries the inverse of the medium's phase distortion and serves as the transmitted data beam. However, previously reported crystal-based OPC approaches for FSO links have demonstrated either: (i) a relatively fast response time of 35 ms but at a relatively low data rate (e.g., <1 Mbit/s), or (ii) a relatively high data rate of 2 Gbit/s but at a slow response time (e.g., >60 s). Here, we report an OPC approach for the automatic mitigation of dynamic turbulence that enables both a high-data-rate (8 Gbit/s) data beam and a rapid (<5 ms) response time. For a similar data rate, this represents a 10,000-fold faster response time than previous reports, thereby enabling mitigation of dynamic effects. In our approach, the transmitted pre-distorted phase-conjugate data beam is generated by four-wave mixing of three input beams in a GaAs crystal: a turbulence-distorted probe beam, a Gaussian reference beam regenerated from the probe beam, and a Gaussian data beam carrying a high-speed data channel. We experimentally demonstrate our approach in an 8-Gbit/s quadrature-phase-shift-keying coherent FSO link through emulated dynamic turbulence. Our results show ~10-dB improvement in the mixing efficiency of the LO with the data beam under dynamic turbulence with a bandwidth of up to ~260 Hz (Greenwood frequency).
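The pre-distortion principle described above can be sketched numerically. The toy model below is not the experimental setup: it assumes a one-dimensional Gaussian beam, represents turbulence as a single thin random phase screen, and ignores the crystal dynamics and four-wave mixing entirely. The function `mixing_efficiency` (a normalized overlap between the received field and the LO) and all variable names are illustrative choices.

```python
import cmath
import math
import random


def mixing_efficiency(field, lo):
    """Normalized overlap |<field, lo>|^2 / (P_field * P_lo), between 0 and 1."""
    overlap = sum(e * l.conjugate() for e, l in zip(field, lo))
    p_field = sum(abs(e) ** 2 for e in field)
    p_lo = sum(abs(l) ** 2 for l in lo)
    return abs(overlap) ** 2 / (p_field * p_lo)


random.seed(0)
n = 201
xs = [-3.0 + 6.0 * i / (n - 1) for i in range(n)]
gaussian = [complex(math.exp(-x * x)) for x in xs]  # Gaussian data beam and LO profile

# Assumed toy turbulence: one thin screen with an independent random phase per sample.
screen = [cmath.exp(1j * random.uniform(-math.pi, math.pi)) for _ in xs]

# Without mitigation: the Gaussian data beam picks up the screen's phase.
distorted = [g * s for g, s in zip(gaussian, screen)]

# With phase conjugation: the transmitted beam carries the conjugate phase
# (as would be learned from the back-propagated probe), so the screen cancels it.
predistorted = [g * s.conjugate() for g, s in zip(gaussian, screen)]
corrected = [e * s for e, s in zip(predistorted, screen)]

eta_without = mixing_efficiency(distorted, gaussian)
eta_with = mixing_efficiency(corrected, gaussian)
```

Under these idealized assumptions the conjugate phase cancels the screen exactly, so `eta_with` returns to approximately 1 while `eta_without` stays well below it. A real link would recover only part of the loss, since the correction is limited by the response time relative to the turbulence bandwidth.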