Resonant beam communications (RBCom), which adopt oscillating photons between two separate retroreflectors for information transmission, exhibit potential advantages over other types of wireless optical communications (WOC). However, echo interference generated by the modulated beam reflected from the receiver affects the transmission of the desired information. To tackle this challenge, a synchronization-based point-to-point RBCom system is proposed to eliminate the echo interference, and the design for the transmitter and receiver is discussed. Subsequently, the performance of the proposed RBCom is evaluated and compared with that of visible light communications (VLC) and free space optical communications (FOC). Finally, future research directions are outlined and several implementation challenges of RBCom systems are highlighted.
This two-part paper focuses on the system design and performance analysis for a point-to-point resonant beam communication (RBCom) system under both the quasi-static and mobile scenarios. Part I of this paper proposes a synchronization-based information transmission scheme and derives the capacity upper and lower bounds for the quasi-static channel case. In Part II, we address the mobile scenario, where the receiver is in relative motion to the transmitter, and derive a mobile RBCom channel model that jointly considers the Doppler effect, channel variation, and echo interference. With the obtained channel model, we prove that the channel gain of the mobile RBCom decreases as the number of transmitted frames increases, and thus show that the considered mobile RBCom terminates after the transmitter sends a certain number of frames without frequency compensation. By deriving an upper bound on the number of successfully transmitted frames, we formulate the throughput maximization problem for the considered mobile RBCom system, and solve it via a sequential parametric convex approximation (SPCA) method. Finally, simulation results validate the analysis of our proposed method in some typical scenarios.
This two-part paper studies a point-to-point resonant beam communication (RBCom) system, where two separately deployed retroreflectors are adopted to generate the resonant beam between the transmitter and the receiver, and analyzes the transmission rate of the considered system under both the quasi-static and mobile scenarios. Part I of this paper focuses on the quasi-static scenario where the locations of the transmitter and the receiver are relatively fixed. Specifically, we propose a new information-bearing scheme which adopts a synchronization-based amplitude modulation method to mitigate the echo interference caused by the reflected resonant beam. With this scheme, we show that the quasi-static RBCom channel is equivalent to a Markov channel and can be further simplified as an amplitude-constrained additive white Gaussian noise channel. Moreover, we develop an algorithm that jointly employs the bisection and exhaustive search to maximize its capacity upper and lower bounds. Finally, numerical results validate our analysis. Part II of this paper discusses the performance of the RBCom system under the mobile scenario.
Recent years have witnessed remarkable progress in the development of large vision-language models (LVLMs). Benefiting from the strong language backbones and efficient cross-modal alignment strategies, LVLMs exhibit surprising capabilities to perceive visual signals and perform visually grounded reasoning. However, the capabilities of LVLMs have not been comprehensively and quantitatively evaluate. Most existing multi-modal benchmarks require task-oriented input-output formats, posing great challenges to automatically assess the free-form text output of LVLMs. To effectively leverage the annotations available in existing benchmarks and reduce the manual effort required for constructing new benchmarks, we propose to re-formulate existing benchmarks into unified LVLM-compatible formats. Through systematic data collection and reformulation, we present the ReForm-Eval benchmark, offering substantial data for evaluating various capabilities of LVLMs. Based on ReForm-Eval, we conduct extensive experiments, thoroughly analyze the strengths and weaknesses of existing LVLMs, and identify the underlying factors. Our benchmark and evaluation framework will be open-sourced as a cornerstone for advancing the development of LVLMs.
In e-commerce search, personalized retrieval is a crucial technique for improving user shopping experience. Recent works in this domain have achieved significant improvements by the representation learning paradigm, e.g., embedding-based retrieval (EBR) and collaborative filtering (CF). EBR methods do not sufficiently exploit the useful collaborative signal and are difficult to learn the representations of long-tail item well. Graph-based CF methods improve personalization by modeling collaborative signal within the user click graph. However, existing Graph-based methods ignore user's multiple behaviours, such as click/purchase and the relevance constraint between user behaviours and items.In this paper, we propose a Graph Contrastive Learning with Multi-Objective (GCL-MO) collaborative filtering model, which solves the problems of weak relevance and incomplete personalization in e-commerce search. Specifically, GCL-MO builds a homogeneous graph of items and then optimizes a multi-objective function of personalization and relevance. Moreover, we propose a modified contrastive loss for multi-objectives graph learning, which avoids the mutual suppression among positive samples and thus improves the generalization and robustness of long-tail item representations. These learned item embeddings are then used for personalized retrieval by constructing an efficient offline-to-online inverted table. GCL-MO outperforms the online collaborative filtering baseline in both offline/online experimental metrics and shows a significant improvement in the online A/B testing of Taobao search.
With the development of the multi-media internet, visual characteristics have become an important factor affecting user interests. Thus, incorporating visual features is a promising direction for further performance improvements in click-through rate (CTR) prediction. However, we found that simply injecting the image embeddings trained with established pre-training methods only has marginal improvements. We attribute the failure to two reasons: First, The pre-training methods are designed for well-defined computer vision tasks concentrating on semantic features, and they cannot learn personalized interest in recommendations. Secondly, pre-trained image embeddings only containing semantic information have little information gain, considering we already have semantic features such as categories and item titles as inputs in the CTR prediction task. We argue that a pre-training method tailored for recommendation is necessary for further improvements. To this end, we propose a recommendation-aware image pre-training method that can learn visual features from user click histories. Specifically, we propose a user interest reconstruction module to mine visual features related to user interests from behavior histories. We further propose a contrastive training method to avoid collapsing of embedding vectors. We conduct extensive experiments to verify that our method can learn users' visual interests, and our method achieves $0.46\%$ improvement in offline AUC and $0.88\%$ improvement in Taobao online GMV with p-value$<0.01$.
E-commerce search systems such as Taobao Search, the largest e-commerce searching system in China, aim at providing users with the most preferred items (e.g., products). Due to the massive data and limited time for response, a typical industrial ranking system consists of three or more modules, including matching, pre-ranking, and ranking. The pre-ranking is widely considered a mini-ranking module, as it needs to rank hundreds of times more items than the ranking under limited latency. Existing researches focus on building a lighter model that imitates the ranking model. As such, the metric of a pre-ranking model follows the ranking model using Area Under ROC (AUC) for offline evaluation. However, such a metric is inconsistent with online A/B tests in practice, so engineers have to perform costly online tests to reach a convincing conclusion. In our work, we rethink the role of the pre-ranking. We argue that the primary goal of the pre-ranking stage is to return an optimal unordered set rather than an ordered list of items because it is the ranking that determines the final exposures. Since AUC measures the quality of an ordered item list, it is not suitable for evaluating the quality of the output unordered set. This paper proposes a new evaluation metric called All-Scenario Hitrate (ASH) for pre-ranking. ASH is proven effective in the offline evaluation and consistent with online A/B tests based on numerous experiments in Taobao Search. We also introduce an all-scenario-based multi-objective learning framework (ASMOL), which improves the ASH significantly. Surprisingly, the new pre-ranking model can outperforms the ranking model when outputting thousands of items. The phenomenon validates that the pre-ranking stage should not imitate the ranking blindly. With the improvements in ASH consistently translating to online improvement, it makes a 1.2% GMV improvement on Taobao Search.
E-commerce search engines comprise a retrieval phase and a ranking phase, where the first one returns a candidate product set given user queries. Recently, vision-language pre-training, combining textual information with visual clues, has been popular in the application of retrieval tasks. In this paper, we propose a novel V+L pre-training method to solve the retrieval problem in Taobao Search. We design a visual pre-training task based on contrastive learning, outperforming common regression-based visual pre-training tasks. In addition, we adopt two negative sampling schemes, tailored for the large-scale retrieval task. Besides, we introduce the details of the online deployment of our proposed method in real-world situations. Extensive offline/online experiments demonstrate the superior performance of our method on the retrieval task. Our proposed method is employed as one retrieval channel of Taobao Search and serves hundreds of millions of users in real time.