It has been shown that learning the structure of Bayesian networks (BNs) from observational data is an NP-hard problem. Although attempts have been made to tackle this problem, existing solutions assume direct access to the observational data, which may not be practical in certain applications. In this paper, we explore the feasibility of recovering the structure of a Gaussian Bayesian network (GBN) from compressed (low-dimensional and indirect) measurements. We propose a novel density-evolution-based framework for optimizing compressed linear measurement systems that, by design, allow for more accurate retrieval of the covariance matrix and thereby the graph structure. In particular, under the assumption that both the covariance matrix and the graph are sparse, we show that the structure of a GBN can indeed be recovered from the resulting compressed measurements. Numerical simulations show that our sensing systems outperform the state of the art with respect to maximum absolute error (MAE) and have comparable performance with respect to precision and recall, without any need for ad-hoc parameter tuning.
Few studies have addressed the simultaneous detection of smoke and flame in fires, because their different physical natures lead to uncertain fluid patterns. In this study, we collect a large image data set and re-label it for multi-label image classification so that smoke and flame can be identified simultaneously. The generalization ability of the detection model is challenged by moving fluid objects of uncertain shape, such as fire and smoke, by their non-compact nature, and by complex, highly variable backgrounds. To address this, we propose a data augmentation method based on random image stitching that applies resizing, deformation, position variation, and background alteration to broaden the view of the learner. Moreover, we propose a self-learning data augmentation method that uses the class activation map to extract highly trustworthy regions as a new source of positive examples, further strengthening the augmentation. Because data augmentation and the detection model are performed iteratively and reinforce each other, both modules improve in an evolutionary manner. Experiments show that the proposed method effectively improves the generalization performance of the model for concurrent smoke and fire detection.
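The random image stitching idea can be illustrated with a minimal sketch. The abstract does not specify how the stitching is implemented, so everything here is an assumption made for illustration: images are plain 2D lists of pixel values, resizing uses nearest-neighbor sampling, and the helper names `resize_nearest` and `random_stitch` are hypothetical.

```python
import random


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbor resize of a 2D list image (illustrative helper)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def random_stitch(foreground, fg_labels, background, bg_labels, rng):
    """Paste a randomly resized foreground onto a background image.

    Height and width are drawn independently, so the patch changes both
    size and aspect ratio (a crude stand-in for resizing and deforming).
    The paste position is random, and a different background image can be
    supplied on every call (background altering).
    """
    bg_h, bg_w = len(background), len(background[0])
    new_h = rng.randint(1, bg_h)  # randint bounds are inclusive
    new_w = rng.randint(1, bg_w)
    patch = resize_nearest(foreground, new_h, new_w)

    top = rng.randint(0, bg_h - new_h)
    left = rng.randint(0, bg_w - new_w)

    stitched = [row[:] for row in background]  # leave the input untouched
    for r in range(new_h):
        stitched[top + r][left:left + new_w] = patch[r]

    # Multi-label target: a class is present if either source image has it.
    labels = {name: fg_labels.get(name, 0) | bg_labels.get(name, 0)
              for name in ("smoke", "flame")}
    return stitched, labels


rng = random.Random(0)
smoke_patch = [[9, 9], [9, 9]]
flame_scene = [[0] * 4 for _ in range(4)]
image, target = random_stitch(
    smoke_patch, {"smoke": 1, "flame": 0},
    flame_scene, {"smoke": 0, "flame": 1}, rng)
```

The stitched sample carries both labels, which is what makes it usable as a multi-label training example; a real pipeline would operate on image tensors and blend the patch rather than overwrite pixels.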
In this paper, we propose the CodeRetriever model, which combines unimodal and bimodal contrastive learning to train function-level code semantic representations, specifically for the code search task. For unimodal contrastive learning, we design a semantic-guided method to build positive code pairs based on the documentation and function name. For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build text-code pairs. Both contrastive objectives can fully leverage the large-scale code corpus for pre-training. Experimental results on several public benchmarks (e.g., CodeSearch and CoSQA) demonstrate the effectiveness of CodeRetriever in the zero-shot setting. By fine-tuning with domain/language-specific downstream data, CodeRetriever achieves new state-of-the-art performance with significant improvement over existing code pre-trained models. We will make the code, model checkpoint, and constructed datasets publicly available.
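A contrastive objective over code-code or text-code pairs can be sketched as follows. The abstract does not state the exact loss, so this example assumes the widely used InfoNCE form with in-batch negatives and cosine similarity; the function name `info_nce_loss` and the temperature value are illustrative choices, not details taken from CodeRetriever.

```python
import math


def info_nce_loss(queries, keys, temperature=0.05):
    """In-batch contrastive (InfoNCE) loss over paired embeddings.

    queries[i] and keys[i] form a positive pair (for example, a docstring
    and its function, or two semantically similar functions). Every other
    keys[j] in the batch acts as a negative for queries[i].
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]

    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity to every key, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # negative log-probability of the positive
    return total / len(q)


aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The same function covers both settings described above: in the unimodal case both inputs are code embeddings, while in the bimodal case the queries are text embeddings and the keys are code embeddings.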
Neural Architecture Search (NAS) has been widely adopted to design accurate and efficient image classification models. However, applying NAS to a new computer vision task still requires a huge amount of effort. This is because 1) previous NAS research has over-prioritized image classification while largely ignoring other tasks; 2) many NAS works focus on optimizing task-specific components that cannot be favorably transferred to other tasks; and 3) existing NAS methods are typically designed to be "proxyless" and require significant effort to be integrated with each new task's training pipelines. To tackle these challenges, we propose FBNetV5, a NAS framework that can search for neural architectures for a variety of vision tasks with much reduced computational cost and human effort. Specifically, we design 1) a search space that is simple yet inclusive and transferable; 2) a multitask search process that is disentangled from the target tasks' training pipelines; and 3) an algorithm to simultaneously search for architectures for multiple tasks with a computational cost agnostic to the number of tasks. We evaluate the proposed FBNetV5 on three fundamental vision tasks -- image classification, object detection, and semantic segmentation. Models searched by FBNetV5 in a single run of search have outperformed the previous state-of-the-art in all three tasks: image classification (e.g., +1.3% ImageNet top-1 accuracy under the same FLOPs as compared to FBNetV3), semantic segmentation (e.g., +1.8% higher ADE20K val. mIoU than SegFormer with 3.6x fewer FLOPs), and object detection (e.g., +1.1% COCO val. mAP with 1.2x fewer FLOPs as compared to YOLOX).
Current dense text retrieval models face two typical challenges. First, they adopt a siamese dual-encoder architecture to encode query and document independently for fast indexing and searching, while neglecting the finer-grained term-wise interactions. This results in sub-optimal recall performance. Second, they rely heavily on a negative sampling technique to build up the negative documents in their contrastive loss. To address these challenges, we present Adversarial Retriever-Ranker (AR2), which consists of a dual-encoder retriever plus a cross-encoder ranker. The two models are jointly optimized according to a minimax adversarial objective: the retriever learns to retrieve negative documents to cheat the ranker, while the ranker learns to rank a collection of candidates including both the ground-truth and the retrieved ones, and also provides progressive direct feedback to the dual-encoder retriever. Through this adversarial game, the retriever gradually produces harder negative documents to train a better ranker, whereas the cross-encoder ranker provides progressive feedback to improve the retriever. We evaluate AR2 on three benchmarks. Experimental results show that AR2 consistently and significantly outperforms existing dense retriever methods and achieves new state-of-the-art results on all of them. This includes improvements on Natural Questions R@5 to 77.9% (+2.1%), TriviaQA R@5 to 78.2% (+1.4), and MS-MARCO MRR@10 to 39.5% (+1.3%). We will make our code, models, and data publicly available.
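The reported metrics, R@5 and MRR@10, can be computed along the following lines. This sketch assumes that R@k is the fraction of queries with at least one relevant document among the top k results (the usual top-k retrieval accuracy for open-domain QA) and that MRR@k uses the reciprocal rank of the first relevant document within the top k. The function names and the toy document IDs are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any relevant document appears in the top k results, else 0.0."""
    return float(any(doc in relevant_ids for doc in ranked_ids[:k]))


def mrr_at_k(ranked_ids, relevant_ids, k):
    """Reciprocal rank of the first relevant document in the top k, else 0.0."""
    for rank, doc in enumerate(ranked_ids[:k], start=1):
        if doc in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_over_queries(metric, runs, k):
    """Average a per-query metric over (ranking, relevant set) pairs."""
    return sum(metric(ranking, relevant, k) for ranking, relevant in runs) / len(runs)


# Toy rankings for two queries; the second query never retrieves its answer.
runs = [
    (["d3", "d1", "d2"], {"d1"}),
    (["d4", "d5", "d6"], {"d9"}),
]
print(mean_over_queries(recall_at_k, runs, 5))  # 0.5
print(mean_over_queries(mrr_at_k, runs, 10))    # 0.25
```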
In this paper, we introduce a two-level attention schema, Poolingformer, for long document modeling. Its first level uses a smaller sliding window pattern to aggregate information from neighbors. Its second level employs a larger window to increase receptive fields with pooling attention to reduce both computational cost and memory consumption. We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA. Experimental results show that Poolingformer sits atop three official leaderboards measured by F1, outperforming previous state-of-the-art models by 1.9 points (79.8 vs. 77.9) on NQ long answer, 1.9 points (79.5 vs. 77.6) on TyDi QA passage answer, and 1.6 points (67.6 vs. 66.0) on TyDi QA minimal answer. We further evaluate Poolingformer on a long sequence summarization task. Experimental results on the arXiv benchmark continue to demonstrate its superior performance.
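The two-level pattern can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Poolingformer implementation: it uses a single head with no learned projections, mean pooling as the pooling operator, a residual sum to combine the two levels, and hypothetical parameter names (`small_window`, `large_window`, `pool_size`).

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale
                       for key in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def mean_pool(vectors, size):
    """Average consecutive groups of `size` vectors (last group may be shorter)."""
    pooled = []
    for start in range(0, len(vectors), size):
        group = vectors[start:start + size]
        pooled.append([sum(col) / len(group) for col in zip(*group)])
    return pooled


def two_level_attention(x, small_window=1, large_window=4, pool_size=2):
    n = len(x)
    # Level 1: each token attends to its neighbors inside a small window.
    level1 = []
    for i in range(n):
        lo, hi = max(0, i - small_window), min(n, i + small_window + 1)
        level1.append(attend(x[i], x[lo:hi], x[lo:hi]))
    # Level 2: each token attends to pooled summaries of a larger window,
    # so it scores fewer keys than full attention over that window would.
    out = []
    for i in range(n):
        lo, hi = max(0, i - large_window), min(n, i + large_window + 1)
        pooled = mean_pool(level1[lo:hi], pool_size)
        context = attend(level1[i], pooled, pooled)
        out.append([a + b for a, b in zip(level1[i], context)])
    return out


tokens = [[float(i), 1.0] for i in range(8)]
outputs = two_level_attention(tokens)
print(len(outputs), len(outputs[0]))  # 8 2
```

With `pool_size=2`, the second level scores roughly half as many keys per token as unpooled attention over the same large window, which is the source of the cost and memory savings described above.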
An approach to reduce motion artifacts in Quantitative Susceptibility Mapping (QSM) using deep learning is proposed. We use an affine motion model with randomly created motion profiles to simulate motion-corrupted QSM images. Each simulated QSM image is paired with its motion-free reference to train a neural network using supervised learning. The trained network is tested on unseen simulated motion-corrupted QSM images, in healthy volunteers, and in Parkinson's disease patients. The results show that motion artifacts, such as ringing and ghosting, are successfully suppressed.
Quantitative imaging in MRI usually involves the acquisition and reconstruction of a series of images at multiple echo times, which may require more scan time and specialized reconstruction techniques compared to conventional qualitative imaging. In this work, we focus on optimizing the acquisition and reconstruction process of the multi-echo gradient echo pulse sequence for quantitative susceptibility mapping, an important quantitative imaging method in MRI. A multi-echo sampling pattern optimization block extended from LOUPE-ST is proposed to optimize the k-space sampling patterns along echoes. In addition, a recurrent temporal feature fusion block is proposed and inserted into a backbone deep ADMM network to capture the signal evolution along echo time during reconstruction. Experiments show that both blocks help improve multi-echo image reconstruction performance.