Abstract:The rapid advancement of the next generation of communications and internet of things (IoT) technologies has made the provision of location-based services for diverse devices an increasingly pressing necessity. Localizing devices with/without intelligent computing abilities, including both active and passive devices is essential, especially in indoor scenarios. For traditional RF positioning systems, aligning transmission signals and dealing with signal interference in complex environments are inevitable challenges. Therefore, this paper proposed a new passive positioning system, the RF-band resonant beam positioning system (RF-RBPS), which achieves energy concentration and beam alignment by amplifying echoes between the base station (BS) and the passive target (PT), without the need for complex channel estimation and time-consuming beamforming and provides high-precision direction of arrival (DoA) estimation for battery-free targets using the resonant mechanism. The direction information of the PT is estimated using the multiple signal classification (MUSIC) algorithm at the end of BS. The feasibility of the proposed system is validated through theoretical analysis and simulations. Results indicate that the proposed RF-RBPS surpasses RF-band active positioning system (RF-APS) in precision, achieving millimeter-level precision at 2m within an elevation angle of 35$^\circ$, and an error of less than 3cm at 2.5m within an elevation angle of 35$^\circ$.
Abstract:Simultaneous wireless information and power transfer (SWIPT) leverages lightwave as the wireless transmission medium, emerging as a promising technology in the future Internet of Things (IoT) scenarios. The use of retro-reflectors in constructing spatially separated laser resonators (SSLR) enables a self-aligning wireless transmission system with the self-reproducing resonant beam, i.e. resonant beam system (RBS). However, it's effective Field of View (FoV) is physically limited by the size of retroreflectors and still requires significant improvement. This restricts the transmitter from providing seamless wireless connectivity and power supply to receivers within a large dynamic movement range. In this paper, we propose an FoV-enlarged resonant beam system operating at a meter distance by incorporating a telescope. The telescope plays a crucial role in minimizing the extra loss inflicted on the gain medium, which typically arises from the deviation of the resonant beam within the cavity. Further, we construct the proposed telescope-based RBS and experimentally demonstrate that the design could expand the FoV to 28$^\circ$ over 1 m transmission distance is about triple that of the ordinary RBS design.
Abstract:Multimodal large language models (MLLMs) contribute a powerful mechanism to understanding visual information building on large language models. However, MLLMs are notorious for suffering from hallucinations, especially when generating lengthy, detailed descriptions for images. Our analysis reveals that hallucinations stem from the inherent summarization mechanism of large language models, leading to excessive dependence on linguistic tokens while neglecting vision information. In this paper, we propose NoiseBoost, a broadly applicable and simple method for alleviating hallucinations for MLLMs through the integration of noise feature perturbations. Noise perturbation acts as a regularizer, facilitating a balanced distribution of attention weights among visual and linguistic tokens. Despite its simplicity, NoiseBoost consistently enhances the performance of MLLMs across common training strategies, including supervised fine-tuning and reinforcement learning. Further, NoiseBoost pioneerly enables semi-supervised learning for MLLMs, unleashing the power of unlabeled data. Comprehensive experiments demonstrate that NoiseBoost improves dense caption accuracy by 8.1% with human evaluation and achieves comparable results with 50% of the data by mining unlabeled data. Code and models are available at https://kaiwu5.github.io/noiseboost.
Abstract:This two-part paper studies a point-to-point resonant beam communication (RBCom) system, where two separately deployed retroreflectors are adopted to generate the resonant beam between the transmitter and the receiver, and analyzes the transmission rate of the considered system under both the quasi-static and mobile scenarios. Part I of this paper focuses on the quasi-static scenario where the locations of the transmitter and the receiver are relatively fixed. Specifically, we propose a new information-bearing scheme which adopts a synchronization-based amplitude modulation method to mitigate the echo interference caused by the reflected resonant beam. With this scheme, we show that the quasi-static RBCom channel is equivalent to a Markov channel and can be further simplified as an amplitude-constrained additive white Gaussian noise channel. Moreover, we develop an algorithm that jointly employs the bisection and exhaustive search to maximize its capacity upper and lower bounds. Finally, numerical results validate our analysis. Part II of this paper discusses the performance of the RBCom system under the mobile scenario.
Abstract:Resonant beam communications (RBCom), which adopt oscillating photons between two separate retroreflectors for information transmission, exhibit potential advantages over other types of wireless optical communications (WOC). However, echo interference generated by the modulated beam reflected from the receiver affects the transmission of the desired information. To tackle this challenge, a synchronization-based point-to-point RBCom system is proposed to eliminate the echo interference, and the design for the transmitter and receiver is discussed. Subsequently, the performance of the proposed RBCom is evaluated and compared with that of visible light communications (VLC) and free space optical communications (FOC). Finally, future research directions are outlined and several implementation challenges of RBCom systems are highlighted.
Abstract:This two-part paper focuses on the system design and performance analysis for a point-to-point resonant beam communication (RBCom) system under both the quasi-static and mobile scenarios. Part I of this paper proposes a synchronization-based information transmission scheme and derives the capacity upper and lower bounds for the quasi-static channel case. In Part II, we address the mobile scenario, where the receiver is in relative motion to the transmitter, and derive a mobile RBCom channel model that jointly considers the Doppler effect, channel variation, and echo interference. With the obtained channel model, we prove that the channel gain of the mobile RBCom decreases as the number of transmitted frames increases, and thus show that the considered mobile RBCom terminates after the transmitter sends a certain number of frames without frequency compensation. By deriving an upper bound on the number of successfully transmitted frames, we formulate the throughput maximization problem for the considered mobile RBCom system, and solve it via a sequential parametric convex approximation (SPCA) method. Finally, simulation results validate the analysis of our proposed method in some typical scenarios.
Abstract:Recent years have witnessed remarkable progress in the development of large vision-language models (LVLMs). Benefiting from the strong language backbones and efficient cross-modal alignment strategies, LVLMs exhibit surprising capabilities to perceive visual signals and perform visually grounded reasoning. However, the capabilities of LVLMs have not been comprehensively and quantitatively evaluate. Most existing multi-modal benchmarks require task-oriented input-output formats, posing great challenges to automatically assess the free-form text output of LVLMs. To effectively leverage the annotations available in existing benchmarks and reduce the manual effort required for constructing new benchmarks, we propose to re-formulate existing benchmarks into unified LVLM-compatible formats. Through systematic data collection and reformulation, we present the ReForm-Eval benchmark, offering substantial data for evaluating various capabilities of LVLMs. Based on ReForm-Eval, we conduct extensive experiments, thoroughly analyze the strengths and weaknesses of existing LVLMs, and identify the underlying factors. Our benchmark and evaluation framework will be open-sourced as a cornerstone for advancing the development of LVLMs.
Abstract:In e-commerce search, personalized retrieval is a crucial technique for improving user shopping experience. Recent works in this domain have achieved significant improvements by the representation learning paradigm, e.g., embedding-based retrieval (EBR) and collaborative filtering (CF). EBR methods do not sufficiently exploit the useful collaborative signal and are difficult to learn the representations of long-tail item well. Graph-based CF methods improve personalization by modeling collaborative signal within the user click graph. However, existing Graph-based methods ignore user's multiple behaviours, such as click/purchase and the relevance constraint between user behaviours and items.In this paper, we propose a Graph Contrastive Learning with Multi-Objective (GCL-MO) collaborative filtering model, which solves the problems of weak relevance and incomplete personalization in e-commerce search. Specifically, GCL-MO builds a homogeneous graph of items and then optimizes a multi-objective function of personalization and relevance. Moreover, we propose a modified contrastive loss for multi-objectives graph learning, which avoids the mutual suppression among positive samples and thus improves the generalization and robustness of long-tail item representations. These learned item embeddings are then used for personalized retrieval by constructing an efficient offline-to-online inverted table. GCL-MO outperforms the online collaborative filtering baseline in both offline/online experimental metrics and shows a significant improvement in the online A/B testing of Taobao search.
Abstract:With the development of the multi-media internet, visual characteristics have become an important factor affecting user interests. Thus, incorporating visual features is a promising direction for further performance improvements in click-through rate (CTR) prediction. However, we found that simply injecting the image embeddings trained with established pre-training methods only has marginal improvements. We attribute the failure to two reasons: First, The pre-training methods are designed for well-defined computer vision tasks concentrating on semantic features, and they cannot learn personalized interest in recommendations. Secondly, pre-trained image embeddings only containing semantic information have little information gain, considering we already have semantic features such as categories and item titles as inputs in the CTR prediction task. We argue that a pre-training method tailored for recommendation is necessary for further improvements. To this end, we propose a recommendation-aware image pre-training method that can learn visual features from user click histories. Specifically, we propose a user interest reconstruction module to mine visual features related to user interests from behavior histories. We further propose a contrastive training method to avoid collapsing of embedding vectors. We conduct extensive experiments to verify that our method can learn users' visual interests, and our method achieves $0.46\%$ improvement in offline AUC and $0.88\%$ improvement in Taobao online GMV with p-value$<0.01$.
Abstract:E-commerce search systems such as Taobao Search, the largest e-commerce searching system in China, aim at providing users with the most preferred items (e.g., products). Due to the massive data and limited time for response, a typical industrial ranking system consists of three or more modules, including matching, pre-ranking, and ranking. The pre-ranking is widely considered a mini-ranking module, as it needs to rank hundreds of times more items than the ranking under limited latency. Existing researches focus on building a lighter model that imitates the ranking model. As such, the metric of a pre-ranking model follows the ranking model using Area Under ROC (AUC) for offline evaluation. However, such a metric is inconsistent with online A/B tests in practice, so engineers have to perform costly online tests to reach a convincing conclusion. In our work, we rethink the role of the pre-ranking. We argue that the primary goal of the pre-ranking stage is to return an optimal unordered set rather than an ordered list of items because it is the ranking that determines the final exposures. Since AUC measures the quality of an ordered item list, it is not suitable for evaluating the quality of the output unordered set. This paper proposes a new evaluation metric called All-Scenario Hitrate (ASH) for pre-ranking. ASH is proven effective in the offline evaluation and consistent with online A/B tests based on numerous experiments in Taobao Search. We also introduce an all-scenario-based multi-objective learning framework (ASMOL), which improves the ASH significantly. Surprisingly, the new pre-ranking model can outperforms the ranking model when outputting thousands of items. The phenomenon validates that the pre-ranking stage should not imitate the ranking blindly. With the improvements in ASH consistently translating to online improvement, it makes a 1.2% GMV improvement on Taobao Search.