The generalized linear system (GLS) has been widely used in wireless communications to evaluate the effect of nonlinear preprocessing on receiver performance. Generalized approximation message passing (AMP) is a state-of-the-art algorithm for the signal recovery of GLS, but it was limited to measurement matrices with independent and identically distributed (IID) elements. To relax this restriction, generalized orthogonal/vector AMP (GOAMP/GVAMP) for unitarily-invariant measurement matrices was established, which has been proven to be replica Bayes optimal in uncoded GLS. However, the information-theoretic limit of GOAMP/GVAMP is still an open challenge for arbitrary input distributions due to its complex state evolution (SE). To address this issue, in this paper, we provide the achievable rate analysis of GOAMP/GVAMP in GLS, establishing its information-theoretic limit (i.e., maximum achievable rate). Specifically, we transform the fully-unfolded state evolution (SE) of GOAMP/GVAMP into an equivalent single-input single-output variational SE (VSE). Using the VSE and the mutual information and minimum mean-square error (I-MMSE) lemma, the achievable rate of GOAMP/GVAMP is derived. Moreover, the optimal coding principle for maximizing the achievable rate is proposed, based on which a kind of low-density parity-check (LDPC) code is designed. Numerical results verify the achievable rate advantages of GOAMP/GVAMP over the conventional maximum ratio combining (MRC) receiver based on the linearized model and the BER performance gains of the optimized LDPC codes (0.8~2.8 dB) compared to the existing methods.
In this paper, we propose a long-sequence modeling framework, named StreamPETR, for multi-view 3D object detection. Built upon the sparse query design in the PETR series, we systematically develop an object-centric temporal mechanism. The model is performed in an online manner and the long-term historical information is propagated through object queries frame by frame. Besides, we introduce a motion-aware layer normalization to model the movement of the objects. StreamPETR achieves significant performance improvements only with negligible computation cost, compared to the single-frame baseline. On the standard nuScenes benchmark, it reaches a new state-of-the-art performance (63.6% NDS). The lightweight version realizes 45.0% mAP and 31.7 FPS, outperforming the state-of-the-art method (SOLOFusion) by 2.3% mAP and 1.8x faster FPS. Code will be available at https://github.com/exiawsh/StreamPETR.git.
The bandit paradigm provides a unified modeling framework for problems that require decision-making under uncertainty. Because many business metrics can be viewed as rewards (a.k.a. utilities) that result from actions, bandit algorithms have seen a large and growing interest from industrial applications, such as search, recommendation and advertising. Indeed, with the bandit lens comes the promise of direct optimisation for the metrics we care about. Nevertheless, the road to successfully applying bandits in production is not an easy one. Even when the action space and rewards are well-defined, practitioners still need to make decisions regarding multi-arm or contextual approaches, on- or off-policy setups, delayed or immediate feedback, myopic or long-term optimisation, etc. To make matters worse, industrial platforms typically give rise to large action spaces in which existing approaches tend to break down. The research literature on these topics is broad and vast, but this can overwhelm practitioners, whose primary aim is to solve practical problems, and therefore need to decide on a specific instantiation or approach for each project. This tutorial will take a step towards filling that gap between the theory and practice of bandits. Our goal is to present a unified overview of the field and its existing terminology, concepts and algorithms -- with a focus on problems relevant to industry. We hope our industrial perspective will help future practitioners who wish to leverage the bandit paradigm for their application.
The dominant multi-camera 3D detection paradigm is based on explicit 3D feature construction, which requires complicated indexing of local image-view features via 3D-to-2D projection. Other methods implicitly introduce geometric positional encoding and perform global attention (e.g., PETR) to build the relationship between image tokens and 3D objects. The 3D-to-2D perspective inconsistency and global attention lead to a weak correlation between foreground tokens and queries, resulting in slow convergence. We propose Focal-PETR with instance-guided supervision and spatial alignment module to adaptively focus object queries on discriminative foreground regions. Focal-PETR additionally introduces a down-sampling strategy to reduce the consumption of global attention. Due to the highly parallelized implementation and down-sampling strategy, our model, without depth supervision, achieves leading performance on the large-scale nuScenes benchmark and a superior speed of 30 FPS on a single RTX3090 GPU. Extensive experiments show that our method outperforms PETR while consuming 3x fewer training hours. The code will be made publicly available.
Intelligent mesh generation (IMG) refers to a technique to generate mesh by machine learning, which is a relatively new and promising research field. Within its short life span, IMG has greatly expanded the generalizability and practicality of mesh generation techniques and brought many breakthroughs and potential possibilities for mesh generation. However, there is a lack of surveys focusing on IMG methods covering recent works. In this paper, we are committed to a systematic and comprehensive survey describing the contemporary IMG landscape. Focusing on 110 preliminary IMG methods, we conducted an in-depth analysis and evaluation from multiple perspectives, including the core technique and application scope of the algorithm, agent learning goals, data types, targeting challenges, advantages and limitations. With the aim of literature collection and classification based on content extraction, we propose three different taxonomies from three views of key technique, output mesh unit element, and applicable input data types. Finally, we highlight some promising future research directions and challenges in IMG. To maximize the convenience of readers, a project page of IMG is provided at \url{https://github.com/xzb030/IMG_Survey}.
Tailor-made for massive connectivity and sporadic access, grant-free random access has become a promising candidate access protocol for massive machine-type communications (mMTC). Compared with conventional grant-based protocols, grant-free random access skips the exchange of scheduling information to reduce the signaling overhead, and facilitates sharing of access resources to enhance access efficiency. However, some challenges remain to be addressed in the receiver design, such as unknown identity of active users and multi-user interference (MUI) on shared access resources. In this work, we deal with the problem of joint user activity and data detection for grant-free random access. Specifically, the approximate message passing (AMP) algorithm is first employed to mitigate MUI and decouple the signals of different users. Then, we extend the data symbol alphabet to incorporate the null symbols from inactive users. In this way, the joint user activity and data detection problem is formulated as a clustering problem under the Gaussian mixture model. Furthermore, in conjunction with the AMP algorithm, a variational Bayesian inference based clustering (VBIC) algorithm is developed to solve this clustering problem. Simulation results show that, compared with state-of-art solutions, the proposed AMP-combined VBIC (AMP-VBIC) algorithm achieves a significant performance gain in detection accuracy.
Video understanding is an important problem in computer vision. Currently, the well-studied task in this research is human action recognition, where the clips are manually trimmed from the long videos, and a single class of human action is assumed for each clip. However, we may face more complicated scenarios in the industrial applications. For example, in the real-world urban pipe system, anomaly defects are fine-grained, multi-labeled, domain-relevant. To recognize them correctly, we need to understand the detailed video content. For this reason, we propose to advance research areas of video understanding, with a shift from traditional action recognition to industrial anomaly analysis. In particular, we introduce two high-quality video benchmarks, namely QV-Pipe and CCTV-Pipe, for anomaly inspection in the real-world urban pipe systems. Based on these new datasets, we will host two competitions including (1) Video Defect Classification on QV-Pipe and (2) Temporal Defect Localization on CCTV-Pipe. In this report, we describe the details of these benchmarks, the problem definitions of competition tracks, the evaluation metric, and the result summary. We expect that, this competition would bring new opportunities and challenges for video understanding in smart city and beyond. The details of our VideoPipe challenge can be found in https://videopipe.github.io.
Open-world instance segmentation (OWIS) aims to segment class-agnostic instances from images, which has a wide range of real-world applications such as autonomous driving. Most existing approaches follow a two-stage pipeline: performing class-agnostic detection first and then class-specific mask segmentation. In contrast, this paper proposes a single-stage framework to produce a mask for each instance directly. Also, instance mask annotations could be noisy in the existing datasets; to overcome this issue, we introduce a new regularization loss. Specifically, we first train an extra branch to perform an auxiliary task of predicting foreground regions (i.e. regions belonging to any object instance), and then encourage the prediction from the auxiliary branch to be consistent with the predictions of the instance masks. The key insight is that such a cross-task consistency loss could act as an error-correcting mechanism to combat the errors in annotations. Further, we discover that the proposed cross-task consistency loss can be applied to images without any annotation, lending itself to a semi-supervised learning method. Through extensive experiments, we demonstrate that the proposed method can achieve impressive results in both fully-supervised and semi-supervised settings. Compared to SOTA methods, the proposed method significantly improves the $AP_{100}$ score by 4.75\% in UVO$\rightarrow$UVO setting and 4.05\% in COCO$\rightarrow$UVO setting. In the case of semi-supervised learning, our model learned with only 30\% labeled data, even outperforms its fully-supervised counterpart with 50\% labeled data. The code will be released soon.
The quality of ontologies in terms of their correctness and completeness is crucial for developing high-quality ontology-based applications. Traditional debugging techniques repair ontologies by removing unwanted axioms, but may thereby remove consequences that are correct in the domain of the ontology. In this paper we propose an interactive approach to mitigate this for $\mathcal{EL}$ ontologies by axiom weakening and completing. We present algorithms for weakening and completing and present the first approach for repairing that takes into account removing, weakening and completing. We show different combination strategies, discuss the influence on the final ontologies and show experimental results. We show that previous work has only considered special cases and that there is a trade-off between the amount of validation work for a domain expert and the quality of the ontology in terms of correctness and completeness.
The downlink channel covariance matrix (CCM) acquisition is the key step for the practical performance of massive multiple-input and multiple-output (MIMO) systems, including beamforming, channel tracking, and user scheduling. However, this task is challenging in the popular frequency division duplex massive MIMO systems with Type I codebook due to the limited channel information feedback. In this paper, we propose a novel formulation that leverages the structure of the codebook and feedback values for an accurate estimation of the downlink CCM. Then, we design a cutting plane algorithm to consecutively shrink the feasible set containing the downlink CCM, enabled by the careful design of pilot weighting matrices. Theoretical analysis shows that as the number of communication rounds increases, the proposed cutting plane algorithm can recover the ground-truth CCM. Numerical results are presented to demonstrate the superior performance of the proposed algorithm over the existing benchmark in CCM reconstruction.