Abstract:This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both prefilling and decoding phases for long-context processing. Regarding training data, we propose UltraClean, an efficient and accurate pre-training data filtering and generation strategy, and UltraChat v2, a comprehensive supervised fine-tuning dataset. These datasets enable satisfactory model performance to be achieved using just 8 trillion training tokens. Regarding training algorithms, we propose ModelTunnel v2 for efficient pre-training strategy search, and improve existing post-training methods by introducing chunk-wise rollout for load-balanced reinforcement learning and data-efficient tenary LLM, BitCPM. Regarding inference systems, we propose CPM.cu that integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding. To meet diverse on-device requirements, MiniCPM4 is available in two versions, with 0.5B and 8B parameters, respectively. Sufficient evaluation results show that MiniCPM4 outperforms open-source models of similar size across multiple benchmarks, highlighting both its efficiency and effectiveness. Notably, MiniCPM4-8B demonstrates significant speed improvements over Qwen3-8B when processing long sequences. Through further adaptation, MiniCPM4 successfully powers diverse applications, including trustworthy survey generation and tool use with model context protocol, clearly showcasing its broad usability.
Abstract:Knowledge graph-based retrieval-augmented generation seeks to mitigate hallucinations in Large Language Models (LLMs) caused by insufficient or outdated knowledge. However, existing methods often fail to fully exploit the prior knowledge embedded in knowledge graphs (KGs), particularly their structural information and explicit or implicit constraints. The former can enhance the faithfulness of LLMs' reasoning, while the latter can improve the reliability of response generation. Motivated by these, we propose a trustworthy reasoning framework, termed Deliberation over Priors (DP), which sufficiently utilizes the priors contained in KGs. Specifically, DP adopts a progressive knowledge distillation strategy that integrates structural priors into LLMs through a combination of supervised fine-tuning and Kahneman-Tversky optimization, thereby improving the faithfulness of relation path generation. Furthermore, our framework employs a reasoning-introspection strategy, which guides LLMs to perform refined reasoning verification based on extracted constraint priors, ensuring the reliability of response generation. Extensive experiments on three benchmark datasets demonstrate that DP achieves new state-of-the-art performance, especially a Hit@1 improvement of 13% on the ComplexWebQuestions dataset, and generates highly trustworthy responses. We also conduct various analyses to verify its flexibility and practicality. The code is available at https://github.com/reml-group/Deliberation-on-Priors.
Abstract:With the rapid proliferation of large language models and vision-language models, AI agents have evolved from isolated, task-specific systems into autonomous, interactive entities capable of perceiving, reasoning, and acting without human intervention. As these agents proliferate across virtual and physical environments, from virtual assistants to embodied robots, the need for a unified, agent-centric infrastructure becomes paramount. In this survey, we introduce the Internet of Agents (IoA) as a foundational framework that enables seamless interconnection, dynamic discovery, and collaborative orchestration among heterogeneous agents at scale. We begin by presenting a general IoA architecture, highlighting its hierarchical organization, distinguishing features relative to the traditional Internet, and emerging applications. Next, we analyze the key operational enablers of IoA, including capability notification and discovery, adaptive communication protocols, dynamic task matching, consensus and conflict-resolution mechanisms, and incentive models. Finally, we identify open research directions toward building resilient and trustworthy IoA ecosystems.
Abstract:With the rise of large language and vision-language models, AI agents have evolved into autonomous, interactive systems capable of perception, reasoning, and decision-making. As they proliferate across virtual and physical domains, the Internet of Agents (IoA) has emerged as a key infrastructure for enabling scalable and secure coordination among heterogeneous agents. This survey offers a comprehensive examination of the security and privacy landscape in IoA systems. We begin by outlining the IoA architecture and its distinct vulnerabilities compared to traditional networks, focusing on four critical aspects: identity authentication threats, cross-agent trust issues, embodied security, and privacy risks. We then review existing and emerging defense mechanisms and highlight persistent challenges. Finally, we identify open research directions to advance the development of resilient and privacy-preserving IoA ecosystems.
Abstract:Text-to-image models based on diffusion processes, such as DALL-E, Stable Diffusion, and Midjourney, are capable of transforming texts into detailed images and have widespread applications in art and design. As such, amateur users can easily imitate professional-level paintings by collecting an artist's work and fine-tuning the model, leading to concerns about artworks' copyright infringement. To tackle these issues, previous studies either add visually imperceptible perturbation to the artwork to change its underlying styles (perturbation-based methods) or embed post-training detectable watermarks in the artwork (watermark-based methods). However, when the artwork or the model has been published online, i.e., modification to the original artwork or model retraining is not feasible, these strategies might not be viable. To this end, we propose a novel method for data-use auditing in the text-to-image generation model. The general idea of ArtistAuditor is to identify if a suspicious model has been finetuned using the artworks of specific artists by analyzing the features related to the style. Concretely, ArtistAuditor employs a style extractor to obtain the multi-granularity style representations and treats artworks as samplings of an artist's style. Then, ArtistAuditor queries a trained discriminator to gain the auditing decisions. The experimental results on six combinations of models and datasets show that ArtistAuditor can achieve high AUC values (> 0.937). By studying ArtistAuditor's transferability and core modules, we provide valuable insights into the practical implementation. Finally, we demonstrate the effectiveness of ArtistAuditor in real-world cases by an online platform Scenario. ArtistAuditor is open-sourced at https://github.com/Jozenn/ArtistAuditor.
Abstract:Audio-Visual Question Answering (AVQA) is a challenging multimodal reasoning task requiring intelligent systems to answer natural language queries based on paired audio-video inputs accurately. However, existing AVQA approaches often suffer from overfitting to dataset biases, leading to poor robustness. Moreover, current datasets may not effectively diagnose these methods. To address these challenges, we first introduce a novel dataset, FortisAVQA, constructed in two stages: (1) rephrasing questions in the test split of the public MUSIC-AVQA dataset and (2) introducing distribution shifts across questions. The first stage expands the test space with greater diversity, while the second enables a refined robustness evaluation across rare, frequent, and overall question distributions. Second, we introduce a robust Multimodal Audio-Visual Epistemic Network (MAVEN) that leverages a multifaceted cycle collaborative debiasing strategy to mitigate bias learning. Experimental results demonstrate that our architecture achieves state-of-the-art performance on FortisAVQA, with a notable improvement of 7.81\%. Extensive ablation studies on both datasets validate the effectiveness of our debiasing components. Additionally, our evaluation reveals the limited robustness of existing multimodal QA methods. We also verify the plug-and-play capability of our strategy by integrating it with various baseline models across both datasets. Our dataset and code are available at https://github.com/reml-group/fortisavqa.
Abstract:In wireless networks with integrated sensing and communications (ISAC), edge intelligence (EI) is expected to be developed at edge devices (ED) for sensing user activities based on channel state information (CSI). However, due to the CSI being highly specific to users' characteristics, the CSI-activity relationship is notoriously domain dependent, essentially demanding EI to learn sufficient datasets from various domains in order to gain cross-domain sensing capability. This poses a crucial challenge owing to the EDs' limited resources, for which storing datasets across all domains will be a significant burden. In this paper, we propose the EdgeCL framework, enabling the EI to continually learn-then-discard each incoming dataset, while remaining resilient to catastrophic forgetting. We design a transformer-based discriminator for handling sequences of noisy and nonequispaced CSI samples. Besides, we propose a distilled core-set based knowledge retention method with robustness-enhanced optimization to train the discriminator, preserving its performance for previous domains while preventing future forgetting. Experimental evaluations show that EdgeCL achieves 89% of performance compared to cumulative training while consuming only 3% of its memory, mitigating forgetting by 79%.
Abstract:The movable antenna (MA)-enabled integrated sensing and communication (ISAC) system attracts widespread attention as an innovative framework. The ISAC system integrates sensing and communication functions, achieving resource sharing across various domains, significantly enhancing communication and sensing performance, and promoting the intelligent interconnection of everything. Meanwhile, MA utilizes the spatial variations of wireless channels by dynamically adjusting the positions of MA elements at the transmitter and receiver to improve the channel and further enhance the performance of the ISAC systems. In this paper, we first outline the fundamental principles of MA and introduce the application scenarios of MA-enabled ISAC systems. Then, we summarize the advantages of MA-enabled ISAC systems in enhancing spectral efficiency, achieving flexible and precise beamforming, and making the signal coverage range adjustable. Besides, a specific case is studied to show the performance gains in terms of transmit power that MA brings to ISAC systems. Finally, we discuss the challenges of MA-enabled ISAC and future research directions, aiming to provide insights for future research on MA-enabled ISAC systems.
Abstract:In this paper, we investigate a secure communication architecture based on unmanned aerial vehicle (UAV), which enhances the security performance of the communication system through UAV trajectory optimization. We formulate a control problem of minimizing the UAV flight path and power consumption while maximizing secure communication rate over infinite horizon by jointly optimizing UAV trajectory, transmit beamforming vector, and artificial noise (AN) vector. Given the non-uniqueness of optimization objective and significant coupling of the optimization variables, the problem is a non-convex optimization problem which is difficult to solve directly. To address this complex issue, an alternating-iteration technique is employed to decouple the optimization variables. Specifically, the problem is divided into three subproblems, i.e., UAV trajectory, transmit beamforming vector, and AN vector, which are solved alternately. Additionally, considering the susceptibility of UAV trajectory to disturbances, the model predictive control (MPC) approach is applied to obtain UAV trajectory and enhance the system robustness. Numerical results demonstrate the superiority of the proposed optimization algorithm in maintaining accurate UAV trajectory and high secure communication rate compared with other benchmark schemes.
Abstract:In this paper, we propose a full-duplex integrated sensing and communication (ISAC) system enabled by a movable antenna (MA). By leveraging the characteristic of MA that can increase the spatial diversity gain, the performance of the system can be enhanced. We formulate a problem of minimizing the total transmit power consumption via jointly optimizing the discrete position of MA elements, beamforming vectors, sensing signal covariance matrix and user transmit power. Given the significant coupling of optimization variables, the formulated problem presents a non-convex optimization challenge that poses difficulties for direct resolution. To address this challenging issue, the discrete binary particle swarm optimization (BPSO) algorithm framework is employed to solve the formulated problem. Specifically, the discrete positions of MA elements are first obtained by iteratively solving the fitness function. The difference-of-convex (DC) programming and successive convex approximation (SCA) are used to handle non-convex and rank-1 terms in the fitness function. Once the BPSO iteration is complete, the discrete positions of MA elements can be determined, and we can obtain the solutions for beamforming vectors, sensing signal covariance matrix and user transmit power. Numerical results demonstrate the superiority of the proposed system in reducing the total transmit power consumption compared with fixed antenna arrays.