School of Software, Tianjin University




Abstract:The recent trend of large language models (LLMs) is to increase the scale of both model size (\aka the number of parameters) and dataset to achieve better generative ability, which is definitely proved by a lot of work such as the famous GPT and Llama. However, large models often involve massive computational costs, and practical applications cannot afford such high prices. However, the method of constructing a strong model architecture for LLMs is rarely discussed. We first analyze the state-of-the-art language model architectures and observe the feature collapse problem. Based on the theoretical analysis, we propose that the nonlinearity is also very important for language models, which is usually studied in convolutional neural networks for vision tasks. The series informed activation function is then introduced with tiny calculations that can be ignored, and an augmented shortcut is further used to enhance the model nonlinearity. We then demonstrate that the proposed approach is significantly effective for enhancing the model nonlinearity through carefully designed ablations; thus, we present a new efficient model architecture for establishing modern, namely, PanGu-$\pi$. Experiments are then conducted using the same dataset and training strategy to compare PanGu-$\pi$ with state-of-the-art LLMs. The results show that PanGu-$\pi$-7B can achieve a comparable performance to that of benchmarks with about 10\% inference speed-up, and PanGu-$\pi$-1B can achieve state-of-the-art performance in terms of accuracy and efficiency. In addition, we have deployed PanGu-$\pi$-7B in the high-value domains of finance and law, developing an LLM named YunShan for practical application. The results show that YunShan can surpass other models with similar scales on benchmarks.




Abstract:Perching on the moving platforms is a promising solution to enhance the endurance and operational range of quadrotors, which could benefit the efficiency of a variety of air-ground cooperative tasks. To ensure robust perching, tracking with a steady relative state and reliable perception is a prerequisite. This paper presents an adaptive dynamic tracking and perching scheme for autonomous quadrotors to achieve tight integration with moving platforms. For reliable perception of dynamic targets, we introduce elastic visibility-aware planning to actively avoid occlusion and target loss. Additionally, we propose a flexible terminal adjustment method that adapts the changes in flight duration and the coupled terminal states, ensuring full-state synchronization with the time-varying perching surface at various angles. A relaxation strategy is developed by optimizing the tangential relative speed to address the dynamics and safety violations brought by hard boundary conditions. Moreover, we take SE(3) motion planning into account to ensure no collision between the quadrotor and the platform until the contact moment. Furthermore, we propose an efficient spatiotemporal trajectory optimization framework considering full state dynamics for tracking and perching. The proposed method is extensively tested through benchmark comparisons and ablation studies. To facilitate the application of academic research to industry and to validate the efficiency of our scheme under strictly limited computational resources, we deploy our system on a commercial drone (DJI-MAVIC3) with a full-size sport-utility vehicle (SUV). We conduct extensive real-world experiments, where the drone successfully tracks and perches at 30~km/h (8.3~m/s) on the top of the SUV, and at 3.5~m/s with 60{\deg} inclined into the trunk of the SUV.
Abstract:Convolutional Neural Networks (CNNs) are hard to deploy on edge devices due to its high computation and storage complexities. As a common practice for model compression, network pruning consists of two major categories: unstructured and structured pruning, where unstructured pruning constantly performs better. However, unstructured pruning presents a structured pattern at high pruning rates, which limits its performance. To this end, we propose a Rank-based PruninG (RPG) method to maintain the ranks of sparse weights in an adversarial manner. In each step, we minimize the low-rank approximation error for the weight matrices using singular value decomposition, and maximize their distance by pushing the weight matrices away from its low rank approximation. This rank-based optimization objective guides sparse weights towards a high-rank topology. The proposed method is conducted in a gradual pruning fashion to stabilize the change of rank during training. Experimental results on various datasets and different tasks demonstrate the effectiveness of our algorithm in high sparsity. The proposed RPG outperforms the state-of-the-art performance by 1.13% top-1 accuracy on ImageNet in ResNet-50 with 98% sparsity. The codes are available at https://github.com/huawei-noah/Efficient-Computing/tree/master/Pruning/RPG and https://gitee.com/mindspore/models/tree/master/research/cv/RPG.
Abstract:Next-generation wireless networks are expected to utilize the limited radio frequency (RF) resources more efficiently with the aid of intelligent transceivers. To this end, we propose a promising transceiver architecture relying on stacked intelligent metasurfaces (SIM). An SIM is constructed by stacking an array of programmable metasurface layers, where each layer consists of a massive number of low-cost passive meta-atoms that individually manipulate the electromagnetic (EM) waves. By appropriately configuring the passive meta-atoms, an SIM is capable of accomplishing advanced computation and signal processing tasks, such as multiple-input multiple-output (MIMO) precoding/combining, multi-user interference mitigation, and radar sensing, as the EM wave propagates through the multiple layers of the metasurface, which effectively reduces both the RF-related energy consumption and processing delay. Inspired by this, we provide an overview of the SIM-aided MIMO transceiver design, which encompasses its hardware architecture and its potential benefits over state-of-the-art solutions. Furthermore, we discuss promising application scenarios and identify the open research challenges associated with the design of advanced SIM architectures for next-generation wireless networks. Finally, numerical results are provided for quantifying the benefits of wave-based signal processing in wireless systems.




Abstract:Recent advancements in open-world 3D object generation have been remarkable, with image-to-3D methods offering superior fine-grained control over their text-to-3D counterparts. However, most existing models fall short in simultaneously providing rapid generation speeds and high fidelity to input images - two features essential for practical applications. In this paper, we present One-2-3-45++, an innovative method that transforms a single image into a detailed 3D textured mesh in approximately one minute. Our approach aims to fully harness the extensive knowledge embedded in 2D diffusion models and priors from valuable yet limited 3D data. This is achieved by initially finetuning a 2D diffusion model for consistent multi-view image generation, followed by elevating these images to 3D with the aid of multi-view conditioned 3D native diffusion models. Extensive experimental evaluations demonstrate that our method can produce high-quality, diverse 3D assets that closely mirror the original input image. Our project webpage: https://sudo-ai-3d.github.io/One2345plus_page.




Abstract:Multi-robot teams have attracted attention from industry and academia for their ability to perform collaborative tasks in unstructured environments, such as wilderness rescue and collaborative transportation.In this paper, we propose a trajectory planning method for a non-holonomic robotic team with collaboration in unstructured environments.For the adaptive state collaboration of a robot team to catch and transport targets to be rescued using a net, we model the process of catching the falling target with a net in a continuous and differentiable form.This enables the robot team to fully exploit the kinematic potential, thereby adaptively catching the target in an appropriate state.Furthermore, the size safety and topological safety of the net, resulting from the collaborative support of the robots, are guaranteed through geometric constraints.We integrate our algorithm on a car-like robot team and test it in simulations and real-world experiments to validate our performance.Our method is compared to state-of-the-art multi-vehicle trajectory planning methods, demonstrating significant performance in efficiency and trajectory quality.
Abstract:Multi-Stage Classifier (MSC) - several classifiers working sequentially in an arranged order and classification decision is partially made at each step - is widely used in industrial applications for various resource limitation reasons. The classifiers of a multi-stage process are usually Neural Network (NN) models trained independently or in their inference order without considering the signals from the latter stages. Aimed at two-stage binary classification process, the most common type of MSC, we propose a novel training framework, named Feedback Training. The classifiers are trained in an order reverse to their actual working order, and the classifier at the later stage is used to guide the training of initial-stage classifier via a sample weighting method. We experimentally show the efficacy of our proposed approach, and its great superiority under the scenario of few-shot training.




Abstract:Mutual localization stands as a foundational component within various domains of multi-robot systems. Nevertheless, in relative pose estimation, time synchronization is usually underappreciated and rarely addressed, although it significantly influences estimation accuracy. In this paper, we introduce time synchronization into mutual localization to recover the time offset and relative poses between robots simultaneously. Under a constant velocity assumption in a short time, we fuse time offset estimation with our previous bearing-based mutual localization by a novel error representation. Based on the error model, we formulate a joint optimization problem and utilize semi-definite relaxation (SDR) to furnish a lossless relaxation. By solving the relaxed problem, time synchronization and relative pose estimation can be achieved when time drift between robots is limited. To enhance the application range of time offset estimation, we further propose an iterative method to recover the time offset from coarse to fine. Comparisons between the proposed method and the existing ones through extensive simulation tests present prominent benefits of time synchronization on mutual localization. Moreover, real-world experiments are conducted to show the practicality and robustness.
Abstract:Many real-time applications of the Internet of Things (IoT) need to deal with correlated information generated by multiple sensors. The design of efficient status update strategies that minimize the Age of Correlated Information (AoCI) is a key factor. In this paper, we consider an IoT network consisting of sensors equipped with the energy harvesting (EH) capability. We optimize the average AoCI at the data fusion center (DFC) by appropriately managing the energy harvested by sensors, whose true battery states are unobservable during the decision-making process. Particularly, we first formulate the dynamic status update procedure as a partially observable Markov decision process (POMDP), where the environmental dynamics are unknown to the DFC. In order to address the challenges arising from the causality of energy usage, unknown environmental dynamics, unobservability of sensors'true battery states, and large-scale discrete action space, we devise a deep reinforcement learning (DRL)-based dynamic status update algorithm. The algorithm leverages the advantages of the soft actor-critic and long short-term memory techniques. Meanwhile, it incorporates our proposed action decomposition and mapping mechanism. Extensive simulations are conducted to validate the effectiveness of our proposed algorithm by comparing it with available DRL algorithms for POMDPs.




Abstract:Learning with noisy labels (LNL) has been extensively studied, with existing approaches typically following a framework that alternates between clean sample selection and semi-supervised learning (SSL). However, this approach has a limitation: the clean set selected by the Deep Neural Network (DNN) classifier, trained through self-training, inevitably contains noisy samples. This mixture of clean and noisy samples leads to misguidance in DNN training during SSL, resulting in impaired generalization performance due to confirmation bias caused by error accumulation in sample selection. To address this issue, we propose a method called Collaborative Sample Selection (CSS), which leverages the large-scale pre-trained model CLIP. CSS aims to remove the mixed noisy samples from the identified clean set. We achieve this by training a 2-Dimensional Gaussian Mixture Model (2D-GMM) that combines the probabilities from CLIP with the predictions from the DNN classifier. To further enhance the adaptation of CLIP to LNL, we introduce a co-training mechanism with a contrastive loss in semi-supervised learning. This allows us to jointly train the prompt of CLIP and the DNN classifier, resulting in improved feature representation, boosted classification performance of DNNs, and reciprocal benefits to our Collaborative Sample Selection. By incorporating auxiliary information from CLIP and utilizing prompt fine-tuning, we effectively eliminate noisy samples from the clean set and mitigate confirmation bias during training. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our proposed method in comparison with the state-of-the-art approaches.