Abstract:Multi-modal recommender systems (MRSs) are pivotal in diverse online web platforms and have garnered considerable attention in recent years. However, previous studies overlook the challenges of (1) noisy multi-modal content, (2) noisy user feedback, and (3) aligning multi-modal content with user feedback. In order to tackle these challenges, we propose Denoising and Aligning Multi-modal Recommender System (DA-MRS). To mitigate multi-modal noise, DA-MRS first constructs item-item graphs determined by consistent content similarity across modalities. To denoise user feedback, DA-MRS associates the probability of observed feedback with multi-modal content and devises a denoised BPR loss. Furthermore, DA-MRS implements Alignment guided by User preference to enhance task-specific item representation and Alignment guided by graded Item relations to provide finer-grained alignment. Extensive experiments verify that DA-MRS is a plug-and-play framework and achieves significant and consistent improvements across various datasets, backbone models, and noisy scenarios.
Abstract:Although Retrieval-Augmented Large Language Models (RALMs) demonstrate their superiority in terms of factuality, they do not consistently outperform the original retrieval-free Language Models (LMs). Our experiments reveal that this example-level performance inconsistency exists not only between retrieval-augmented and retrieval-free LM but also among different retrievers. To understand this phenomenon, we investigate the degeneration behavior of RALMs and theoretically decompose it into four categories. Further analysis based on our decomposition reveals that the innate difference in knowledge sources and the unpredictable degeneration of the reader model contribute most to the inconsistency. Drawing from our analysis, we introduce Ensemble of Retrievers (EoR), a trainable framework that can adaptively retrieve from different knowledge sources and effectively decrease unpredictable reader errors. Our experiments on Open Domain Question Answering show that EoR substantially improves performance over the RALM with a single retriever by considerably reducing inconsistent behaviors.
Abstract:Hypertrophic cardiomyopathy (HCM) and cardiac amyloidosis (CA) are both heart conditions that can progress to heart failure if untreated. They exhibit similar echocardiographic characteristics, often leading to diagnostic challenges. This paper introduces a novel multi-view deep learning approach that utilizes 2D echocardiography for differentiating between HCM and CA. The method begins by classifying 2D echocardiography data into five distinct echocardiographic views: apical 4-chamber, parasternal long axis of left ventricle, parasternal short axis at levels of the mitral valve, papillary muscle, and apex. It then extracts features of each view separately and combines five features for disease classification. A total of 212 patients diagnosed with HCM, and 30 patients diagnosed with CA, along with 200 individuals with normal cardiac function(Normal), were enrolled in this study from 2018 to 2022. This approach achieved a precision, recall of 0.905, and micro-F1 score of 0.904, demonstrating its effectiveness in accurately identifying HCM and CA using a multi-view analysis.
Abstract:Reinforcement Learning from Human Feedback (RLHF) is the current dominating framework to fine-tune large language models to better align with human preferences. However, the underlying premise of algorithms developed under this framework can be problematic when user preferences encoded in human feedback are diverse. In this work, we aim to address this problem by developing methods for building personalized language models. We first formally introduce the task of learning from personalized human feedback and explain why vanilla RLHF can be problematic in this context. We then propose a general Personalized-RLHF (P-RLHF) framework, which requires one to jointly learn a user model and a language (or reward) model. The user model takes in user information and outputs user representations. Its structure encodes our assumptions about user preferences underlying the feedback data. We develop new learning objectives for personalized reward modeling and personalized Direct Preference Optimization. To demonstrate the efficacy of our method, we test it on real-world text summarization data with annotated preferences and annotator information. We fine-tune GPT-J 6B to obtain personalized language (and reward) models, which outperform non-personalized models in terms of aligning with individual preferences.
Abstract:Passive human sensing using wireless signals has attracted increasing attention due to its superiorities of non-contact and robustness in various lighting conditions. However, when multiple human individuals are present, their reflected signals could be intertwined in the time, frequency and spatial domains, making it challenging to separate them. To address this issue, this paper proposes a novel system for multiperson detection and monitoring of vital signs (i.e., respiration and heartbeat) with the assistance of space-time-coding (STC) reconfigurable intelligent metasurfaces (RISs). Specifically, the proposed system scans the area of interest (AoI) for human detection by using the harmonic beams generated by the STC RIS. Simultaneously, frequencyorthogonal beams are assigned to each detected person for accurate estimation of their respiration rate (RR) and heartbeat rate (HR). Furthermore, to efficiently extract the respiration signal and the much weaker heartbeat signal, we propose an improved variational mode decomposition (VMD) algorithm to accurately decompose the complex reflected signals into a smaller number of intrinsic mode functions (IMFs). We build a prototype to validate the proposed multiperson detection and vital-sign monitoring system. Experimental results demonstrate that the proposed system can simultaneously monitor the vital signs of up to four persons. The errors of RR and HR estimation using the improved VMD algorithm are below 1 RPM (respiration per minute) and 5 BPM (beats per minute), respectively. Further analysis reveals that the flexible beam controlling mechanism empowered by the STC RIS can reduce the noise reflected from other irrelative objects on the physical layer, and improve the signal-to-noise ratio of echoes from the human chest.
Abstract:Currently, Amazon relies on third parties for transportation marketplace rate forecasts, despite the poor quality and lack of interpretability of these forecasts. While transportation marketplace rates are typically very challenging to forecast accurately, we have developed a novel signature-based statistical technique to address these challenges and built a predictive and adaptive model to forecast marketplace rates. This novel technique is based on two key properties of the signature transform. The first is its universal nonlinearity which linearizes the feature space and hence translates the forecasting problem into a linear regression analysis; the second is the signature kernel which allows for comparing computationally efficiently similarities between time series data. Combined, these properties allow for efficient feature generation and more precise identification of seasonality and regime switching in the forecasting process. Preliminary result by the model shows that this new technique leads to far superior forecast accuracy versus commercially available industry models with better interpretability, even during the period of Covid-19 and with the sudden onset of the Ukraine war.
Abstract:This paper proposes and analyzes two new policy learning methods: regularized policy gradient (RPG) and iterative policy optimization (IPO), for a class of discounted linear-quadratic control (LQC) problems over an infinite time horizon with entropy regularization. Assuming access to the exact policy evaluation, both proposed approaches are proven to converge linearly in finding optimal policies of the regularized LQC. Moreover, the IPO method can achieve a super-linear convergence rate once it enters a local region around the optimal policy. Finally, when the optimal policy for an RL problem with a known environment is appropriately transferred as the initial policy to an RL problem with an unknown environment, the IPO method is shown to enable a super-linear convergence rate if the two environments are sufficiently close. Performances of these proposed algorithms are supported by numerical examples.
Abstract:Reconfigurable intelligent surfaces (RISs) have flexible and exceptional performance in manipulating electromagnetic waves and customizing wireless channels. These capabilities enable them to provide a plethora of valuable activity-related information for promoting wireless human sensing. In this article, we present a comprehensive review of passive human sensing using radio frequency signals with the assistance of RISs. Specifically, we first introduce fundamental principles and physical platform of RISs. Subsequently, based on the specific applications, we categorize the state-of-the-art human sensing techniques into three types, including human imaging,localization, and activity recognition. Meanwhile, we would also investigate the benefits that RISs bring to these applications. Furthermore, we explore the application of RISs in human micro-motion sensing, and propose a vital signs monitoring system enhanced by RISs. Experimental results are presented to demonstrate the promising potential of RISs in sensing vital signs for manipulating individuals. Finally, we discuss the technical challenges and opportunities in this field.
Abstract:Most existing graph visualization methods based on dimension reduction are limited to relatively small graphs due to performance issues. In this work, we propose a novel dimension reduction method for graph visualization, called t-Distributed Stochastic Graph Neighbor Embedding (t-SGNE). t-SGNE is specifically designed to visualize cluster structures in the graph. As a variant of the standard t-SNE method, t-SGNE avoids the time-consuming computations of pairwise similarity. Instead, it uses the neighbor structures of the graph to reduce the time complexity from quadratic to linear, thus supporting larger graphs. In addition, to suit t-SGNE, we combined Laplacian Eigenmaps with the shortest path algorithm in graphs to form the graph embedding algorithm ShortestPath Laplacian Eigenmaps Embedding (SPLEE). Performing SPLEE to obtain a high-dimensional embedding of the large-scale graph and then using t-SGNE to reduce its dimension for visualization, we are able to visualize graphs with up to 300K nodes and 1M edges within 5 minutes and achieve approximately 10% improvement in visualization quality. Codes and data are available at https://github.com/Charlie-XIAO/embedding-visualization-test.
Abstract:Several recent works have directly extended the image masked autoencoder (MAE) with random masking into video domain, achieving promising results. However, unlike images, both spatial and temporal information are important for video understanding. This suggests that the random masking strategy that is inherited from the image MAE is less effective for video MAE. This motivates the design of a novel masking algorithm that can more efficiently make use of video saliency. Specifically, we propose a motion-guided masking algorithm (MGM) which leverages motion vectors to guide the position of each mask over time. Crucially, these motion-based correspondences can be directly obtained from information stored in the compressed format of the video, which makes our method efficient and scalable. On two challenging large-scale video benchmarks (Kinetics-400 and Something-Something V2), we equip video MAE with our MGM and achieve up to +$1.3\%$ improvement compared to previous state-of-the-art methods. Additionally, our MGM achieves equivalent performance to previous video MAE using up to $66\%$ fewer training epochs. Lastly, we show that MGM generalizes better to downstream transfer learning and domain adaptation tasks on the UCF101, HMDB51, and Diving48 datasets, achieving up to +$4.9\%$ improvement compared to baseline methods.