Next generation communication systems require accurate beam alignment to counteract the impairments that characterize propagation in high-frequency bands. The overhead of the pilot sequences required to select the best beam pair is prohibitive when codebooks contain a large number of beams, as is the case in practice. To remedy this issue, some schemes exploit information about the user location to predict the best beam pair. However, these schemes (i) involve no measurements whatsoever, which generally results in a highly suboptimal predicted beam, and (ii) are not robust to localization errors. To address these limitations, this paper builds upon the notion of radio map to develop two algorithms that attain a balance between the quality of the obtained beam pair and measurement overhead. The proposed algorithms predict the received power corresponding to each pair and measure just the Q pairs with highest prediction. While the first algorithm targets simplicity, the second one relies on a Bayesian approach to endow the prediction process with robustness to localization error. The performance of both algorithms is shown to widely outperform existing methods using ray-tracing data.
Recent advancements in deep learning have led to the development of various models for long-term multivariate time-series forecasting (LMTF), many of which have shown promising results. Generally, the focus has been on historical-value-based models, which rely on past observations to predict future series. Notably, a new trend has emerged with time-index-based models, offering a more nuanced understanding of the continuous dynamics underlying time series. Unlike these two types of models that aggregate the information of spatial domains or temporal domains, in this paper, we consider multivariate time series as spatiotemporal data regularly sampled from a continuous dynamical system, which can be represented by partial differential equations (PDEs), with the spatial domain being fixed. Building on this perspective, we present PDETime, a novel LMTF model inspired by the principles of Neural PDE solvers, following the encoding-integration-decoding operations. Our extensive experimentation across seven diverse real-world LMTF datasets reveals that PDETime not only adapts effectively to the intrinsic spatiotemporal nature of the data but also sets new benchmarks, achieving state-of-the-art results
Previous work has showcased the intriguing capability of large language models (LLMs) in retrieving facts and processing context knowledge. However, only limited research exists on the layer-wise capability of LLMs to encode knowledge, which challenges our understanding of their internal mechanisms. In this paper, we devote the first attempt to investigate the layer-wise capability of LLMs through probing tasks. We leverage the powerful generative capability of ChatGPT to construct probing datasets, providing diverse and coherent evidence corresponding to various facts. We employ $\mathcal V$-usable information as the validation metric to better reflect the capability in encoding context knowledge across different layers. Our experiments on conflicting and newly acquired knowledge show that LLMs: (1) prefer to encode more context knowledge in the upper layers; (2) primarily encode context knowledge within knowledge-related entity tokens at lower layers while progressively expanding more knowledge within other tokens at upper layers; and (3) gradually forget the earlier context knowledge retained within the intermediate layers when provided with irrelevant evidence. Code is publicly available at https://github.com/Jometeorie/probing_llama.
Large language models (LLMs) are transforming the way information is retrieved with vast amounts of knowledge being summarized and presented via natural language conversations. Yet, LLMs are prone to highlight the most frequently seen pieces of information from the training set and to neglect the rare ones. In the field of biomedical research, latest discoveries are key to academic and industrial actors and are obscured by the abundance of an ever-increasing literature corpus (the information overload problem). Surfacing new associations between biomedical entities, e.g., drugs, genes, diseases, with LLMs becomes a challenge of capturing the long-tail knowledge of the biomedical scientific production. To overcome this challenge, Retrieval Augmented Generation (RAG) has been proposed to alleviate some of the shortcomings of LLMs by augmenting the prompts with context retrieved from external datasets. RAG methods typically select the context via maximum similarity search over text embeddings. In this study, we show that RAG methods leave out a significant proportion of relevant information due to clusters of over-represented concepts in the biomedical literature. We introduce a novel information-retrieval method that leverages a knowledge graph to downsample these clusters and mitigate the information overload problem. Its retrieval performance is about twice better than embedding similarity alternatives on both precision and recall. Finally, we demonstrate that both embedding similarity and knowledge graph retrieval methods can be advantageously combined into a hybrid model that outperforms both, enabling potential improvements to biomedical question-answering models.
Dimensionality reduction can be applied to hyperspectral images so that the most useful data can be extracted and processed more quickly. This is critical in any situation in which data volume exceeds the capacity of the computational resources, particularly in the case of remote sensing platforms (e.g., drones, satellites), but also in the case of multi-year datasets. Moreover, the computational strategies of unsupervised dimensionality reduction often provide the basis for more complicated supervised techniques. Seven unsupervised dimensionality reduction algorithms are tested on hyperspectral data from the HYPSO-1 earth observation satellite. Each particular algorithm is chosen to be representative of a broader collection. The experiments probe the computational complexity, reconstruction accuracy, signal clarity, sensitivity to artifacts, and effects on target detection and classification of the different algorithms. No algorithm consistently outperformed the others across all tests, but some general trends regarding the characteristics of the algorithms did emerge. With half a million pixels, computational time requirements of the methods varied by 5 orders of magnitude, and the reconstruction error varied by about 3 orders of magnitude. A relationship between mutual information and artifact susceptibility was suggested by the tests. The relative performance of the algorithms differed significantly between the target detection and classification tests. Overall, these experiments both show the power of dimensionality reduction and give guidance regarding how to evaluate a technique prior to incorporating it into a processing pipeline.
Although reconfigurable intelligent surface (RIS) is a promising technology for shaping the propagation environment, it consists of a single-layer structure within inherent limitations regarding the number of beam steering patterns. Based on the recently revolutionary technology, denoted as stacked intelligent metasurface (SIM), we propose its implementation not only on the base station (BS) side in a massive multiple-input multiple-output (mMIMO) setup but also in the intermediate space between the base station and the users to adjust the environment further as needed. For the sake of convenience, we call the former BS SIM (BSIM), and the latter channel SIM (CSIM). Hence, we achieve wave-based combining at the BS and wave-based configuration at the intermediate space. Specifically, we propose a channel estimation method with reduced overhead, being crucial for SIMassisted communications. Next, we derive the uplink sum spectral efficiency (SE) in closed form in terms of statistical channel state information (CSI). Notably, we optimize the phase shifts of both BSIM and CSIM simultaneously by using the projected gradient ascent method (PGAM). Compared to previous works on SIMs, we study the uplink transmission, a mMIMO setup, channel estimation in a single phase, a second SIM at the intermediate space, and simultaneous optimization of the two SIMs. Simulation results show the impact of various parameters on the sum SE, and demonstrate the superiority of our optimization approach compared to the alternating optimization (AO) method.
Video conferencing has caught much more attention recently. High fidelity and low bandwidth are two major objectives of video compression for video conferencing applications. Most pioneering methods rely on classic video compression codec without high-level feature embedding and thus can not reach the extremely low bandwidth. Recent works instead employ model-based neural compression to acquire ultra-low bitrates using sparse representations of each frame such as facial landmark information, while these approaches can not maintain high fidelity due to 2D image-based warping. In this paper, we propose a novel low bandwidth neural compression approach for high-fidelity portrait video conferencing using implicit radiance fields to achieve both major objectives. We leverage dynamic neural radiance fields to reconstruct high-fidelity talking head with expression features, which are represented as frame substitution for transmission. The overall system employs deep model to encode expression features at the sender and reconstruct portrait at the receiver with volume rendering as decoder for ultra-low bandwidth. In particular, with the characteristic of neural radiance fields based model, our compression approach is resolution-agnostic, which means that the low bandwidth achieved by our approach is independent of video resolution, while maintaining fidelity for higher resolution reconstruction. Experimental results demonstrate that our novel framework can (1) construct ultra-low bandwidth video conferencing, (2) maintain high fidelity portrait and (3) have better performance on high-resolution video compression than previous works.
Previous work in structured prediction (e.g. NER, information extraction) using single model make use of explicit dataset information, which helps boost in-distribution performance but is orthogonal to robust generalization in real-world situations. To overcome this limitation, we propose the Structured Language Generation Model (SLGM), a framework that reduces sequence-to-sequence problems to classification problems via methodologies in loss calibration and decoding method. Our experimental results show that SLGM is able to maintain performance without explicit dataset information, follow and potentially replace dataset-specific fine-tuning.
In the rapidly evolving field of scientific research, efficiently extracting key information from the burgeoning volume of scientific papers remains a formidable challenge. This paper introduces an innovative framework designed to automate the extraction of vital data from scientific PDF documents, enabling researchers to discern future research trajectories more readily. AutoIE uniquely integrates four novel components: (1) A multi-semantic feature fusion-based approach for PDF document layout analysis; (2) Advanced functional block recognition in scientific texts; (3) A synergistic technique for extracting and correlating information on molecular sieve synthesis; (4) An online learning paradigm tailored for molecular sieve literature. Our SBERT model achieves high Marco F1 scores of 87.19 and 89.65 on CoNLL04 and ADE datasets. In addition, a practical application of AutoIE in the petrochemical molecular sieve synthesis domain demonstrates its efficacy, evidenced by an impressive 78\% accuracy rate. This research paves the way for enhanced data management and interpretation in molecular sieve synthesis. It is a valuable asset for seasoned experts and newcomers in this specialized field.
This research explores the potential of Large Language Models (LLMs) to utilize psychometric values, specifically personality information, within the context of video game character development. Affective Computing (AC) systems quantify a Non-Player character's (NPC) psyche, and an LLM can take advantage of the system's information by using the values for prompt generation. The research shows an LLM can consistently represent a given personality profile, thereby enhancing the human-like characteristics of game characters. Repurposing a human examination, the International Personality Item Pool (IPIP) questionnaire, to evaluate an LLM shows that the model can accurately generate content concerning the personality provided. Results show that the improvement of LLM, such as the latest GPT-4 model, can consistently utilize and interpret a personality to represent behavior.