Multilingual Large Language Models are capable of using powerful Large Language Models to handle and respond to queries in multiple languages, which achieves remarkable success in multilingual natural language processing tasks. Despite these breakthroughs, there still remains a lack of a comprehensive survey to summarize existing approaches and recent developments in this field. To this end, in this paper, we present a thorough review and provide a unified perspective to summarize the recent progress as well as emerging trends in multilingual large language models (MLLMs) literature. The contributions of this paper can be summarized: (1) First survey: to our knowledge, we take the first step and present a thorough review in MLLMs research field according to multi-lingual alignment; (2) New taxonomy: we offer a new and unified perspective to summarize the current progress of MLLMs; (3) New frontiers: we highlight several emerging frontiers and discuss the corresponding challenges; (4) Abundant resources: we collect abundant open-source resources, including relevant papers, data corpora, and leaderboards. We hope our work can provide the community with quick access and spur breakthrough research in MLLMs.
This paper introduces a cooperative sensing framework designed for integrated sensing and communication cellular networks. The framework comprises one base station (BS) functioning as the sensing transmitter, while several nearby BSs act as sensing receivers. The primary objective is to facilitate cooperative target localization by enabling each receiver to share specific information with a fusion center (FC) over a limited capacity backhaul link. To achieve this goal, we propose an advanced cooperative sensing design that enhances the communication process between the receivers and the FC. Each receiver independently estimates the time delay and the reflecting coefficient associated with the reflected path from the target. Subsequently, each receiver transmits the estimated values and the received signal samples centered around the estimated time delay to the FC. To efficiently quantize the signal samples, a Karhunen-Lo\`eve Transform coding scheme is employed. Furthermore, an optimization problem is formulated to allocate backhaul resources for quantizing different samples, improving target localization. Numerical results validate the effectiveness of our proposed advanced design and demonstrate its superiority over a baseline design, where only the locally estimated values are transmitted from each receiver to the FC.
Hybrid beamforming is vital in modern wireless systems, especially for massive MIMO and millimeter-wave deployments, offering efficient directional transmission with reduced hardware complexity. However, effective beamforming in multi-user scenarios relies heavily on accurate channel state information, the acquisition of which often incurs excessive pilot overhead, degrading system performance. To address this and inspired by the spatial congruence between sub-6GHz (sub-6G) and mmWave channels, we propose a Sub-6G information Aided Multi-User Hybrid Beamforming (SA-MUHBF) framework, avoiding excessive use of pilots. SA-MUHBF employs a convolutional neural network to predict mmWave beamspace from sub-6G channel estimate, followed by a novel multi-layer graph neural network for analog beam selection and a linear minimum mean-square error algorithm for digital beamforming. Numerical results demonstrate that SA-MUHBF efficiently predicts the mmWave beamspace representation and achieves superior spectrum efficiency over state-of-the-art benchmarks. Moreover, SA-MUHBF demonstrates robust performance across varied sub-6G system configurations and exhibits strong generalization to unseen scenarios.
The emergence of novel the dummy data injection attack (DDIA) poses a severe threat to the secure and stable operation of power systems. These attacks are particularly perilous due to the minimal Euclidean spatial separation between the injected malicious data and legitimate data, rendering their precise detection challenging using conventional distance-based methods. Furthermore, existing research predominantly focuses on various machine learning techniques, often analyzing the temporal data sequences post-attack or relying solely on Euclidean spatial characteristics. Unfortunately, this approach tends to overlook the inherent topological correlations within the non-Euclidean spatial attributes of power grid data, consequently leading to diminished accuracy in attack localization. To address this issue, this study takes a comprehensive approach. Initially, it examines the underlying principles of these new DDIAs on power systems. Here, an intricate mathematical model of the DDIA is designed, accounting for incomplete topological knowledge and alternating current (AC) state estimation from an attacker's perspective. Subsequently, by integrating a priori knowledge of grid topology and considering the temporal correlations within measurement data and the topology-dependent attributes of the power grid, this study introduces temporal and spatial attention matrices. These matrices adaptively capture the spatio-temporal correlations within the attacks. Leveraging gated stacked causal convolution and graph wavelet sparse convolution, the study jointly extracts spatio-temporal DDIA features. Finally, the research proposes a DDIA localization method based on spatio-temporal graph neural networks. The accuracy and effectiveness of the DDIA model are rigorously demonstrated through comprehensive analytical cases.
Multi-perspective cameras with potentially non-overlapping fields of view have become an important exteroceptive sensing modality in a number of applications such as intelligent vehicles, drones, and mixed reality headsets. In this work, we challenge one of the basic assumptions made in these scenarios, which is that the multi-camera rig is rigid. More specifically, we are considering the problem of estimating the relative pose between a static non-rigid rig in different spatial orientations while taking into account the effect of gravity onto the system. The deformable physical connections between each camera and the body center are approximated by a simple cantilever model, and inserted into the generalized epipolar constraint. Our results lead us to the important insight that the latent parameters of the deformation model, meaning the gravity vector in both views, become observable. We present a concise analysis of the observability of all variables based on noise, outliers, and rig rigidity for two different algorithms. The first one is a vision-only alternative, while the second one makes use of additional gravity measurements. To conclude, we demonstrate the ability to sense gravity in a real-world example, and discuss practical implications.
Accurate and robust prediction of drug-target interactions (DTIs) plays a vital role in drug discovery. Despite extensive efforts have been invested in predicting novel DTIs, existing approaches still suffer from insufficient labeled data and cold start problems. More importantly, there is currently a lack of studies focusing on elucidating the mechanism of action (MoA) between drugs and targets. Distinguishing the activation and inhibition mechanisms is critical and challenging in drug development. Here, we introduce a unified framework called DTIAM, which aims to predict interactions, binding affinities, and activation/inhibition mechanisms between drugs and targets. DTIAM learns drug and target representations from large amounts of label-free data through self-supervised pre-training, which accurately extracts the substructure and contextual information of drugs and targets, and thus benefits the downstream prediction based on these representations. DTIAM achieves substantial performance improvement over other state-of-the-art methods in all tasks, particularly in the cold start scenario. Moreover, independent validation demonstrates the strong generalization ability of DTIAM. All these results suggested that DTIAM can provide a practically useful tool for predicting novel DTIs and further distinguishing the MoA of candidate drugs. DTIAM, for the first time, provides a unified framework for accurate and robust prediction of drug-target interactions, binding affinities, and activation/inhibition mechanisms.
Recent work in Natural Language Processing and Computer Vision has been using textual information -- e.g., entity names and descriptions -- available in knowledge graphs to ground neural models to high-quality structured data. However, when it comes to non-English languages, the quantity and quality of textual information are comparatively scarce. To address this issue, we introduce the novel task of automatic Knowledge Graph Enhancement (KGE) and perform a thorough investigation on bridging the gap in both the quantity and quality of textual information between English and non-English languages. More specifically, we: i) bring to light the problem of increasing multilingual coverage and precision of entity names and descriptions in Wikidata; ii) demonstrate that state-of-the-art methods, namely, Machine Translation (MT), Web Search (WS), and Large Language Models (LLMs), struggle with this task; iii) present M-NTA, a novel unsupervised approach that combines MT, WS, and LLMs to generate high-quality textual information; and, iv) study the impact of increasing multilingual coverage and precision of non-English textual information in Entity Linking, Knowledge Graph Completion, and Question Answering. As part of our effort towards better multilingual knowledge graphs, we also introduce WikiKGE-10, the first human-curated benchmark to evaluate KGE approaches in 10 languages across 7 language families.
Effective user representations are pivotal in personalized advertising. However, stringent constraints on training throughput, serving latency, and memory, often limit the complexity and input feature set of online ads ranking models. This challenge is magnified in extensive systems like Meta's, which encompass hundreds of models with diverse specifications, rendering the tailoring of user representation learning for each model impractical. To address these challenges, we present Scaling User Modeling (SUM), a framework widely deployed in Meta's ads ranking system, designed to facilitate efficient and scalable sharing of online user representation across hundreds of ads models. SUM leverages a few designated upstream user models to synthesize user embeddings from massive amounts of user features with advanced modeling techniques. These embeddings then serve as inputs to downstream online ads ranking models, promoting efficient representation sharing. To adapt to the dynamic nature of user features and ensure embedding freshness, we designed SUM Online Asynchronous Platform (SOAP), a latency free online serving system complemented with model freshness and embedding stabilization, which enables frequent user model updates and online inference of user embeddings upon each user request. We share our hands-on deployment experiences for the SUM framework and validate its superiority through comprehensive experiments. To date, SUM has been launched to hundreds of ads ranking models in Meta, processing hundreds of billions of user requests daily, yielding significant online metric gains and infrastructure cost savings.
End-to-end task-oriented dialogue (EToD) can directly generate responses in an end-to-end fashion without modular training, which attracts escalating popularity. The advancement of deep neural networks, especially the successful use of large pre-trained models, has further led to significant progress in EToD research in recent years. In this paper, we present a thorough review and provide a unified perspective to summarize existing approaches as well as recent trends to advance the development of EToD research. The contributions of this paper can be summarized: (1) \textbf{\textit{First survey}}: to our knowledge, we take the first step to present a thorough survey of this research field; (2) \textbf{\textit{New taxonomy}}: we first introduce a unified perspective for EToD, including (i) \textit{Modularly EToD} and (ii) \textit{Fully EToD}; (3) \textbf{\textit{New Frontiers}}: we discuss some potential frontier areas as well as the corresponding challenges, hoping to spur breakthrough research in EToD field; (4) \textbf{\textit{Abundant resources}}: we build a public website\footnote{We collect the related papers, baseline projects, and leaderboards for the community at \url{https://etods.net/}.}, where EToD researchers could directly access the recent progress. We hope this work can serve as a thorough reference for the EToD research community.
In this paper, we investigate a self-sensing intelligent reflecting surface (IRS) aided millimeter wave (mmWave) integrated sensing and communication (ISAC) system. Unlike the conventional purely passive IRS, the self-sensing IRS can effectively reduce the path loss of sensing-related links, thus rendering it advantageous in ISAC systems. Aiming to jointly sense the target/scatterer/user positions as well as estimate the sensing and communication (SAC) channels in the considered system, we propose a two-phase transmission scheme, where the coarse and refined sensing/channel estimation (CE) results are respectively obtained in the first phase (using scanning-based IRS reflection coefficients) and second phase (using optimized IRS reflection coefficients). For each phase, an angle-based sensing turbo variational Bayesian inference (AS-TVBI) algorithm, which combines the VBI, messaging passing and expectation-maximization (EM) methods, is developed to solve the considered joint location sensing and CE problem. The proposed algorithm effectively exploits the partial overlapping structured (POS) sparsity and 2-dimensional (2D) block sparsity inherent in the SAC channels to enhance the overall performance. Based on the estimation results from the first phase, we formulate a Cram\'{e}r-Rao bound (CRB) minimization problem for optimizing IRS reflection coefficients, and through proper reformulations, a low-complexity manifold-based optimization algorithm is proposed to solve this problem. Simulation results are provided to verify the superiority of the proposed transmission scheme and associated algorithms.