Abstract:Rank aggregation with pairwise comparisons is widely encountered in sociology, politics, economics, psychology, sports, etc . Given the enormous social impact and the consequent incentives, the potential adversary has a strong motivation to manipulate the ranking list. However, the ideal attack opportunity and the excessive adversarial capability cause the existing methods to be impractical. To fully explore the potential risks, we leverage an online attack on the vulnerable data collection process. Since it is independent of rank aggregation and lacks effective protection mechanisms, we disrupt the data collection process by fabricating pairwise comparisons without knowledge of the future data or the true distribution. From the game-theoretic perspective, the confrontation scenario between the online manipulator and the ranker who takes control of the original data source is formulated as a distributionally robust game that deals with the uncertainty of knowledge. Then we demonstrate that the equilibrium in the above game is potentially favorable to the adversary by analyzing the vulnerability of the sampling algorithms such as Bernoulli and reservoir methods. According to the above theoretical analysis, different sequential manipulation policies are proposed under a Bayesian decision framework and a large class of parametric pairwise comparison models. For attackers with complete knowledge, we establish the asymptotic optimality of the proposed policies. To increase the success rate of the sequential manipulation with incomplete knowledge, a distributionally robust estimator, which replaces the maximum likelihood estimation in a saddle point problem, provides a conservative data generation solution. Finally, the corroborating empirical evidence shows that the proposed method manipulates the results of rank aggregation methods in a sequential manner.
Abstract:Building an embodied agent system with a large language model (LLM) as its core is a promising direction. Due to the significant costs and uncontrollable factors associated with deploying and training such agents in the real world, we have decided to begin our exploration within the Minecraft environment. Our STEVE Series agents can complete basic tasks in a virtual environment and more challenging tasks such as navigation and even creative tasks, with an efficiency far exceeding previous state-of-the-art methods by a factor of $2.5\times$ to $7.3\times$. We begin our exploration with a vanilla large language model, augmenting it with a vision encoder and an action codebase trained on our collected high-quality dataset STEVE-21K. Subsequently, we enhanced it with a Critic and memory to transform it into a complex system. Finally, we constructed a hierarchical multi-agent system. Our recent work explored how to prune the agent system through knowledge distillation. In the future, we will explore more potential applications of STEVE agents in the real world.
Abstract:Video Anomaly Detection (VAD) systems can autonomously monitor and identify disturbances, reducing the need for manual labor and associated costs. However, current VAD systems are often limited by their superficial semantic understanding of scenes and minimal user interaction. Additionally, the prevalent data scarcity in existing datasets restricts their applicability in open-world scenarios. In this paper, we introduce Hawk, a novel framework that leverages interactive large Visual Language Models (VLM) to interpret video anomalies precisely. Recognizing the difference in motion information between abnormal and normal videos, Hawk explicitly integrates motion modality to enhance anomaly identification. To reinforce motion attention, we construct an auxiliary consistency loss within the motion and video space, guiding the video branch to focus on the motion modality. Moreover, to improve the interpretation of motion-to-language, we establish a clear supervisory relationship between motion and its linguistic representation. Furthermore, we have annotated over 8,000 anomaly videos with language descriptions, enabling effective training across diverse open-world scenarios, and also created 8,000 question-answering pairs for users' open-world questions. The final results demonstrate that Hawk achieves SOTA performance, surpassing existing baselines in both video description generation and question-answering. Our codes/dataset/demo will be released at https://github.com/jqtangust/hawk.
Abstract:In response to the COVID-19 pandemic, the integration of interpretable machine learning techniques has garnered significant attention, offering transparent and understandable insights crucial for informed clinical decision making. This literature review delves into the applications of interpretable machine learning in predicting the prognosis of respiratory diseases, particularly focusing on COVID-19 and its implications for future research and clinical practice. We reviewed various machine learning models that are not only capable of incorporating existing clinical domain knowledge but also have the learning capability to explore new information from the data. These models and experiences not only aid in managing the current crisis but also hold promise for addressing future disease outbreaks. By harnessing interpretable machine learning, healthcare systems can enhance their preparedness and response capabilities, thereby improving patient outcomes and mitigating the impact of respiratory diseases in the years to come.
Abstract:The latest TypeII codebook selects partial strongest angular-delay ports for the feedback of downlink channel state information (CSI), whereas its performance is limited due to the deficiency of utilizing the correlations among the port coefficients. To tackle this issue, we propose a tailored autoencoder named TypeII-CsiNet to effectively integrate the TypeII codebook with deep learning, wherein three novel designs are developed for sufficiently boosting the sum rate performance. Firstly, a dedicated pre-processing module is designed to sort the selected ports for reserving the correlations of their corresponding coefficients. Secondly, a position-filling layer is developed in the decoder to fill the feedback coefficients into their ports in the recovered CSI matrix, so that the corresponding angular-delay-domain structure is adequately leveraged to enhance the reconstruction accuracy. Thirdly, a two-stage loss function is proposed to improve the sum rate performance while avoiding the trapping in local optimums during model training. Simulation results verify that our proposed TypeII-CsiNet outperforms the TypeII codebook and existing deep learning benchmarks.
Abstract:With the power of large language models (LLMs), open-ended embodied agents can flexibly understand human instructions, generate interpretable guidance strategies, and output executable actions. Nowadays, Multi-modal Language Models~(MLMs) integrate multi-modal signals into LLMs, further bringing richer perception to entity agents and allowing embodied agents to perceive world-understanding tasks more delicately. However, existing works: 1) operate independently by agents, each containing multiple LLMs, from perception to action, resulting in gaps between complex tasks and execution; 2) train MLMs on static data, struggling with dynamics in open-ended scenarios; 3) input prior knowledge directly as prompts, suppressing application flexibility. We propose STEVE-2, a hierarchical knowledge distillation framework for open-ended embodied tasks, characterized by 1) a hierarchical system for multi-granular task division, 2) a mirrored distillation method for parallel simulation data, and 3) an extra expert model for bringing additional knowledge into parallel simulation. After distillation, embodied agents can complete complex, open-ended tasks without additional expert guidance, utilizing the performance and knowledge of a versatile MLM. Extensive evaluations on navigation and creation tasks highlight the superior performance of STEVE-2 in open-ended tasks, with $1.4 \times$ - $7.3 \times$ in performance.
Abstract:Near-field (NF) communications draw much attention in the context of extremely large-scale antenna arrays (ELAA). Owing to a large number of antennas and high carrier frequency, the NF coverage distance is quite substantial, where the electromagnetic radiation propagates by spherical waves, in contrast to the conventional planar waves of the far-field. Motivated by these facts, the block-dominant compressed sensing (BD-CS) assisted NF communications are proposed. Specifically, we elucidate why block sparsity exists in the distance-limited NF region. Then, block-dominant side-information (BD-SI) is introduced in support of the actual NF communication implementation. We validate that BD-CS is capable of providing exceptional channel estimation accuracy and high spectral efficiency, where the associated challenges, opportunities and its actual implementation in NF communications need to be carefully addressed.
Abstract:We propose a new method to improve the convergence speed of the Robbins-Monro algorithm by introducing prior information about the target point into the Robbins-Monro iteration. We achieve the incorporation of prior information without the need of a -- potentially wrong -- regression model, which would also entail additional constraints. We show that this prior-information Robbins-Monro sequence is convergent for a wide range of prior distributions, even wrong ones, such as Gaussian, weighted sum of Gaussians, e.g., in a kernel density estimate, as well as bounded arbitrary distribution functions greater than zero. We furthermore analyse the sequence numerically to understand its performance and the influence of parameters. The results demonstrate that the prior-information Robbins-Monro sequence converges faster than the standard one, especially during the first steps, which are particularly important for applications where the number of function measurements is limited, and when the noise of observing the underlying function is large. We finally propose a rule to select the parameters of the sequence.
Abstract:Semantic segmentations of pathological entities have crucial clinical value in computational pathology workflows. Foundation models, such as the Segment Anything Model (SAM), have been recently proposed for universal use in segmentation tasks. SAM shows remarkable promise in instance segmentation on natural images. However, the applicability of SAM to computational pathology tasks is limited due to the following factors: (1) lack of comprehensive pathology datasets used in SAM training and (2) the design of SAM is not inherently optimized for semantic segmentation tasks. In this work, we adapt SAM for semantic segmentation by introducing trainable class prompts, followed by further enhancements through the incorporation of a pathology encoder, specifically a pathology foundation model. Our framework, SAM-Path enhances SAM's ability to conduct semantic segmentation in digital pathology without human input prompts. Through experiments on two public pathology datasets, the BCSS and the CRAG datasets, we demonstrate that the fine-tuning with trainable class prompts outperforms vanilla SAM with manual prompts and post-processing by 27.52% in Dice score and 71.63% in IOU. On these two datasets, the proposed additional pathology foundation model further achieves a relative improvement of 5.07% to 5.12% in Dice score and 4.50% to 8.48% in IOU.
Abstract:Fault diagnosis is essential in industrial processes for monitoring the conditions of important machines. With the ever-increasing complexity of working conditions and demand for safety during production and operation, different diagnosis methods are required, and more importantly, an integrated fault diagnosis system that can cope with multiple tasks is highly desired. However, the diagnosis subtasks are often studied separately, and the currently available methods still need improvement for such a generalized system. To address this issue, we propose the Generalized Out-of-distribution Fault Diagnosis (GOOFD) framework to integrate diagnosis subtasks, such as fault detection, fault classification, and novel fault diagnosis. Additionally, a unified fault diagnosis method based on internal contrastive learning is put forward to underpin the proposed generalized framework. The method extracts features utilizing the internal contrastive learning technique and then recognizes the outliers based on the Mahalanobis distance. Experiments are conducted on a simulated benchmark dataset as well as two practical process datasets to evaluate the proposed framework. As demonstrated in the experiments, the proposed method achieves better performance compared with several existing techniques and thus verifies the effectiveness of the proposed framework.