Abstract:Large language models (LLMs) have shown exceptional performance and vast potential across diverse tasks. However, the deployment of LLMs with high performance in low-resource environments has garnered significant attention in the industry. When GPU hardware resources are limited, we can explore alternative options on CPUs. To mitigate the financial burden and alleviate constraints imposed by hardware resources, optimizing inference performance is necessary. In this paper, we introduce an easily deployable inference performance optimization solution aimed at accelerating LLMs on CPUs. In this solution, we implement an effective way to reduce the KV cache size while ensuring precision. We propose a distributed inference optimization approach and implement it based on oneAPI Collective Communications Library. Furthermore, we propose optimization approaches for LLMs on CPU, and conduct tailored optimizations for the most commonly used models. The code is open-sourced at https://github.com/intel/xFasterTransformer.
Abstract:Artificial Intelligence of Things (AIoT) is an emerging frontier based on the deep fusion of Internet of Things (IoT) and Artificial Intelligence (AI) technologies. Although advanced deep learning techniques enhance the efficient data processing and intelligent analysis of complex IoT data, they still suffer from notable challenges when deployed to practical AIoT applications, such as constrained resources, and diverse task requirements. Knowledge transfer is an effective method to enhance learning performance by avoiding the exorbitant costs associated with data recollection and model retraining. Notably, although there are already some valuable and impressive surveys on transfer learning, these surveys introduce approaches in a relatively isolated way and lack the recent advances of various knowledge transfer techniques for AIoT field. This survey endeavors to introduce a new concept of knowledge transfer, referred to as Crowd Knowledge Transfer (CrowdTransfer), which aims to transfer prior knowledge learned from a crowd of agents to reduce the training cost and as well as improve the performance of the model in real-world complicated scenarios. Particularly, we present four transfer modes from the perspective of crowd intelligence, including derivation, sharing, evolution and fusion modes. Building upon conventional transfer learning methods, we further delve into advanced crowd knowledge transfer models from three perspectives for various AIoT applications. Furthermore, we explore some applications of AIoT areas, such as human activity recognition, urban computing, multi-robot system, and smart factory. Finally, we discuss the open issues and outline future research directions of knowledge transfer in AIoT community.
Abstract:Video Anomaly Detection (VAD) systems can autonomously monitor and identify disturbances, reducing the need for manual labor and associated costs. However, current VAD systems are often limited by their superficial semantic understanding of scenes and minimal user interaction. Additionally, the prevalent data scarcity in existing datasets restricts their applicability in open-world scenarios. In this paper, we introduce Hawk, a novel framework that leverages interactive large Visual Language Models (VLM) to interpret video anomalies precisely. Recognizing the difference in motion information between abnormal and normal videos, Hawk explicitly integrates motion modality to enhance anomaly identification. To reinforce motion attention, we construct an auxiliary consistency loss within the motion and video space, guiding the video branch to focus on the motion modality. Moreover, to improve the interpretation of motion-to-language, we establish a clear supervisory relationship between motion and its linguistic representation. Furthermore, we have annotated over 8,000 anomaly videos with language descriptions, enabling effective training across diverse open-world scenarios, and also created 8,000 question-answering pairs for users' open-world questions. The final results demonstrate that Hawk achieves SOTA performance, surpassing existing baselines in both video description generation and question-answering. Our codes/dataset/demo will be released at https://github.com/jqtangust/hawk.
Abstract:There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been explored to optimize computation distribution, achieve load balance, and minimize communication cost across processors. Yet their practical effectiveness in the dynamic and diverse real-world mobile environment is less explored. This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors. Through carefully designed experiments covering various DL models, mobile software/hardware environments, workload patterns, and resource availability, we identify limitations of existing techniques and highlight opportunities for cross-level optimization.
Abstract:Cross-Domain Recommendation (CDR) seeks to enable effective knowledge transfer across domains. Existing works rely on either representation alignment or transformation bridges, but they struggle on identifying domain-shared from domain-specific latent factors. Specifically, while CDR describes user representations as a joint distribution over two domains, these methods fail to account for its joint identifiability as they primarily fixate on the marginal distribution within a particular domain. Such a failure may overlook the conditionality between two domains and how it contributes to latent factor disentanglement, leading to negative transfer when domains are weakly correlated. In this study, we explore what should and should not be transferred in cross-domain user representations from a causality perspective. We propose a Hierarchical subspace disentanglement approach to explore the Joint IDentifiability of cross-domain joint distribution, termed HJID, to preserve domain-specific behaviors from domain-shared factors. HJID organizes user representations into layers: generic shallow subspaces and domain-oriented deep subspaces. We first encode the generic pattern in the shallow subspace by minimizing the Maximum Mean Discrepancy of initial layer activation. Then, to dissect how domain-oriented latent factors are encoded in deeper layers activation, we construct a cross-domain causality-based data generation graph, which identifies cross-domain consistent and domain-specific components, adhering to the Minimal Change principle. This allows HJID to maintain stability whilst discovering unique factors for different domains, all within a generative framework of invertible transformations that guarantee the joint identifiability. With experiments on real-world datasets, we show that HJID outperforms SOTA methods on a range of strongly and weakly correlated CDR tasks.
Abstract:Persuasion, as one of the crucial abilities in human communication, has garnered extensive attention from researchers within the field of intelligent dialogue systems. We humans tend to persuade others to change their viewpoints, attitudes or behaviors through conversations in various scenarios (e.g., persuasion for social good, arguing in online platforms). Developing dialogue agents that can persuade others to accept certain standpoints is essential to achieving truly intelligent and anthropomorphic dialogue system. Benefiting from the substantial progress of Large Language Models (LLMs), dialogue agents have acquired an exceptional capability in context understanding and response generation. However, as a typical and complicated cognitive psychological system, persuasive dialogue agents also require knowledge from the domain of cognitive psychology to attain a level of human-like persuasion. Consequently, the cognitive strategy-enhanced persuasive dialogue agent (defined as CogAgent), which incorporates cognitive strategies to achieve persuasive targets through conversation, has become a predominant research paradigm. To depict the research trends of CogAgent, in this paper, we first present several fundamental cognitive psychology theories and give the formalized definition of three typical cognitive strategies, including the persuasion strategy, the topic path planning strategy, and the argument structure prediction strategy. Then we propose a new system architecture by incorporating the formalized definition to lay the foundation of CogAgent. Representative works are detailed and investigated according to the combined cognitive strategy, followed by the summary of authoritative benchmarks and evaluation metrics. Finally, we summarize our insights on open issues and future directions of CogAgent for upcoming researchers.
Abstract:This paper considers optimal traffic signal control in smart cities, which has been taken as a complex networked system control problem. Given the interacting dynamics among traffic lights and road networks, attaining controller adaptivity and scalability stands out as a primary challenge. Capturing the spatial-temporal correlation among traffic lights under the framework of Multi-Agent Reinforcement Learning (MARL) is a promising solution. Nevertheless, existing MARL algorithms ignore effective information aggregation which is fundamental for improving the learning capacity of decentralized agents. In this paper, we design a new decentralized control architecture with improved environmental observability to capture the spatial-temporal correlation. Specifically, we first develop a topology-aware information aggregation strategy to extract correlation-related information from unstructured data gathered in the road network. Particularly, we transfer the road network topology into a graph shift operator by forming a diffusion process on the topology, which subsequently facilitates the construction of graph signals. A diffusion convolution module is developed, forming a new MARL algorithm, which endows agents with the capabilities of graph learning. Extensive experiments based on both synthetic and real-world datasets verify that our proposal outperforms existing decentralized algorithms.
Abstract:Vehicular crowd intelligence (VCI) is an emerging research field. Facilitated by state-of-the-art vehicular ad-hoc networks and artificial intelligence, various VCI applications come to place, e.g., collaborative sensing, positioning, and mapping. The collaborative property of VCI applications generally requires data to be shared among participants, thus forming network-wide intelligence. How to fulfill this process without compromising data privacy remains a challenging issue. Although federated learning (FL) is a promising tool to solve the problem, adapting conventional FL frameworks to VCI is nontrivial. First, the centralized model aggregation is unreliable in VCI because of the existence of stragglers with unfavorable channel conditions. Second, existing FL schemes are vulnerable to Non-IID data, which is intensified by the data heterogeneity in VCI. This paper proposes a novel federated learning framework called RaftFed to facilitate privacy-preserving VCI. The experimental results show that RaftFed performs better than baselines regarding communication overhead, model accuracy, and model convergence.
Abstract:The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the widespread use of intelligent infrastructures and the impressive success of deep learning (DL). With the deployment of DL on various intelligent infrastructures featuring rich sensors and weak DL computing capabilities, a diverse range of AIoT applications has become possible. However, DL models are notoriously resource-intensive. Existing research strives to realize near-/realtime inference of AIoT live data and low-cost training using AIoT datasets on resource-scare infrastructures. Accordingly, the accuracy and responsiveness of DL models are bounded by resource availability. To this end, the algorithm-system co-design that jointly optimizes the resource-friendly DL models and model-adaptive system scheduling improves the runtime resource availability and thus pushes the performance boundary set by the standalone level. Unlike previous surveys on resource-friendly DL models or hand-crafted DL compilers/frameworks with partially fine-tuned components, this survey aims to provide a broader optimization space for more free resource-performance tradeoffs. The cross-level optimization landscape involves various granularity, including the DL model, computation graph, operator, memory schedule, and hardware instructor in both on-device and distributed paradigms. Furthermore, due to the dynamic nature of AIoT context, which includes heterogeneous hardware, agnostic sensing data, varying user-specified performance demands, and resource constraints, this survey explores the context-aware inter-/intra-device controllers for automatic cross-level adaptation. Additionally, we identify some potential directions for resource-efficient AIoT systems. By consolidating problems and techniques scattered over diverse levels, we aim to help readers understand their connections and stimulate further discussions.
Abstract:Learning accurate cross-domain preference mappings in the absence of overlapped users/items has presented a persistent challenge in Non-overlapping Cross-domain Recommendation (NOCDR). Despite the efforts made in previous studies to address NOCDR, several limitations still exist. Specifically, 1) while some approaches substitute overlapping users/items with overlapping behaviors, they cannot handle NOCDR scenarios where such auxiliary information is unavailable; 2) often, cross-domain preference mapping is modeled by learning deterministic explicit representation matchings between sampled users in two domains. However, this can be biased due to individual preferences and thus fails to incorporate preference continuity and universality of the general population. In light of this, we assume that despite the scattered nature of user behaviors, there exists a consistent latent preference distribution shared among common people. Modeling such distributions further allows us to capture the continuity in user behaviors within each domain and discover preference invariance across domains. To this end, we propose a Distributional domain-invariant Preference Matching method for non-overlapping Cross-Domain Recommendation (DPMCDR). For each domain, we hierarchically approximate a posterior of domain-level preference distribution with empirical evidence derived from user-item interactions. Next, we aim to build distributional implicit matchings between the domain-level preferences of two domains. This process involves mapping them to a shared latent space and seeking a consensus on domain-invariant preference by minimizing the distance between their distributional representations therein. In this way, we can identify the alignment of two non-overlapping domains if they exhibit similar patterns of domain-invariant preference.