Recent diffusion models provide a promising zero-shot solution to noisy linear inverse problems without retraining for specific inverse problems. In this paper, we propose the first unified interpretation for existing zero-shot methods from the perspective of approximating the conditional posterior mean for the reverse diffusion process of conditional sampling. We reveal that recent methods are equivalent to making isotropic Gaussian approximations to intractable posterior distributions over clean images given diffused noisy images, differing only in the handcrafted design of the isotropic posterior covariances. Inspired by this finding, we propose a general plug-and-play posterior covariance optimization based on maximum likelihood estimation to improve recent methods. To achieve optimal posterior covariance without retraining, we provide general solutions based on two approaches specifically designed to leverage pre-trained models with and without reverse covariances. Experimental results demonstrate that the proposed methods significantly enhance the overall performance of recent methods or their robustness to hyperparameters. Code is available at https://github.com/xypeng9903/k-diffusion-inverse-problems
In diffusion models, deviations from a straight generative flow are a common issue, resulting in semantic inconsistencies and suboptimal generations. To address this challenge, we introduce 'Non-Cross Diffusion', an innovative approach in generative modeling for learning ordinary differential equation (ODE) models. Our methodology strategically incorporates an ascending dimension of input to effectively connect points sampled from two distributions with uncrossed paths. This design is pivotal in ensuring enhanced semantic consistency throughout the inference process, which is especially critical for applications reliant on consistent generative flows, including various distillation methods and deterministic sampling, which are fundamental in image editing and interpolation tasks. Our empirical results demonstrate the effectiveness of Non-Cross Diffusion, showing a substantial reduction in semantic inconsistencies at different inference steps and a notable enhancement in the overall performance of diffusion models.
Echocardiogram video segmentation plays an important role in cardiac disease diagnosis. This paper studies unsupervised domain adaptation (UDA) for echocardiogram video segmentation, where the goal is to generalize a model trained on the source domain to other unlabelled target domains. Existing UDA segmentation methods are not suitable for this task because they do not model local information and the cyclical consistency of the heartbeat. In this paper, we introduce a newly collected CardiacUDA dataset and a novel GraphEcho method for cardiac structure segmentation. Our GraphEcho comprises two innovative modules, the Spatial-wise Cross-domain Graph Matching (SCGM) module and the Temporal Cycle Consistency (TCC) module, which utilize prior knowledge of echocardiogram videos, i.e., consistent cardiac structure across patients and centers and the heartbeat cyclical consistency, respectively. These two modules can better align global and local features from source and target domains, improving UDA segmentation results. Experimental results show that our GraphEcho outperforms existing state-of-the-art UDA segmentation methods. This work will lay a new and solid cornerstone for cardiac structure segmentation from echocardiogram videos. Code and dataset are available at: https://github.com/xmed-lab/GraphEcho
Cardiac structure segmentation from echocardiogram videos plays a crucial role in diagnosing heart disease. The combination of multi-view echocardiogram data is essential to enhance the accuracy and robustness of automated methods. However, due to the visual disparity of the data, deriving cross-view context information remains a challenging task, and unsophisticated fusion strategies can even lower performance. In this study, we propose a novel Global-Local fusion (GL-Fusion) network that jointly utilizes multi-view information globally and locally to improve the accuracy of echocardiogram analysis. Specifically, a Multi-view Global-based Fusion Module (MGFM) is proposed to extract global context information and to explore the cyclic relationship of different heartbeat cycles in an echocardiogram video. Additionally, a Multi-view Local-based Fusion Module (MLFM) is designed to extract correlations of cardiac structures from different views. Furthermore, we collect a multi-view echocardiogram video dataset (MvEVD) to evaluate our method. Our method achieves an 82.29% average dice score, which demonstrates a 7.83% improvement over the baseline method, and outperforms other existing state-of-the-art methods. To our knowledge, this is the first exploration of a multi-view method for echocardiogram video segmentation. Code available at: https://github.com/xmed-lab/GL-Fusion
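For readers unfamiliar with the metric reported above, the Dice score between a predicted and a ground-truth binary mask can be sketched as follows. This is a generic illustration and is not taken from the GL-Fusion repository; the function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions, and how scores are averaged across cardiac structures or frames in the paper is not stated here.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(2.0 * intersection / denom)

# Hypothetical 2x2 masks: one overlapping pixel, 2 predicted, 1 true.
pred = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
score = dice_score(pred, target)  # 2 * 1 / (2 + 1)
```

An average Dice score like the one quoted would then be a mean of such per-mask values, under whatever averaging scheme the authors use.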
The photonic neural network (PNN) is a remarkable analog artificial intelligence (AI) accelerator that computes with photons instead of electrons, offering low latency, high energy efficiency, and high parallelism. However, existing training approaches cannot address the extensive accumulation of systematic errors in large-scale PNNs, resulting in a significant decrease in model performance in physical systems. Here, we propose dual adaptive training (DAT), which allows the PNN model to adapt to substantial systematic errors and preserve its performance during deployment. By introducing systematic error prediction networks with task-similarity joint optimization, DAT achieves a high-similarity mapping between the PNN numerical models and physical systems, as well as highly accurate gradient calculations during dual backpropagation training. We validated the effectiveness of DAT by using diffractive PNNs and interference-based PNNs on image classification tasks. DAT successfully trained large-scale PNNs under major systematic errors and preserved model classification accuracies comparable to those of error-free systems. The results further demonstrated its superior performance over state-of-the-art in situ training approaches. DAT provides critical support for constructing large-scale PNNs to achieve advanced architectures and can be generalized to other types of AI systems with analog computing errors.
It is promising to solve linear inverse problems by unfolding iterative algorithms (e.g., the iterative shrinkage thresholding algorithm (ISTA)) as deep neural networks (DNNs) with learnable parameters. However, existing ISTA-based unfolded algorithms restrict the network architectures for iterative updates to a partial weight coupling structure to guarantee convergence. In this paper, we propose hybrid ISTA to unfold ISTA with both pre-computed and learned parameters by incorporating free-form DNNs (i.e., DNNs with arbitrary feasible and reasonable network architectures), while ensuring theoretical convergence. We first develop HCISTA to improve the efficiency and flexibility of classical ISTA (with pre-computed parameters) without compromising the convergence rate in theory. Furthermore, the DNN-based hybrid algorithm is generalized to popular variants of learned ISTA, dubbed HLISTA, to enable a free architecture of learned parameters with a guarantee of linear convergence. To the best of our knowledge, this paper is the first to provide a convergence-provable framework that enables free-form DNNs in ISTA-based unfolded algorithms. The framework is general enough to endow arbitrary DNNs for solving linear inverse problems with convergence guarantees. Extensive experiments demonstrate that hybrid ISTA can reduce the reconstruction error with an improved convergence rate in the tasks of sparse recovery and compressive sensing.
* 109 pages, 16 figures; this is a draft and the final version has been
accepted by TPAMI (DOI: 10.1109/TPAMI.2022.3172214)
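As background for the hybrid ISTA abstract above, classical ISTA for the LASSO objective 0.5 * ||Ax - y||^2 + lam * ||x||_1 can be sketched as below. This is the textbook algorithm with a pre-computed step size, not HCISTA or HLISTA; the function names, the fixed iteration count, and the zero initialization are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise shrinkage: the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, y, lam, n_iter=200):
    """Classical ISTA with step size 1/L, where L = ||A||_2^2."""
    # L is the Lipschitz constant of the gradient of 0.5 * ||Ax - y||^2.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                   # gradient of the data term
        x = soft_threshold(x - grad / L, lam / L)  # gradient step, then shrink
    return x

# With A = I the minimizer is simply the soft-thresholded measurement.
x_hat = ista(np.eye(3), np.array([3.0, -0.5, 1.0]), lam=1.0)
```

Unfolded variants replace the pre-computed quantities in the loop (such as the step size and threshold) with learned parameters, one set per iteration, which is the setting the abstract's convergence guarantees address.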
Photonic neural networks perform brain-inspired computations using photons instead of electrons and can achieve substantially improved computing performance. However, existing architectures can only handle data with regular structures, e.g., images or videos, and fail to generalize to graph-structured data beyond Euclidean space, e.g., social networks or document co-citation networks. Here, we propose an all-optical graph representation learning architecture, termed diffractive graph neural network (DGNN), based on integrated diffractive photonic computing units (DPUs) to address this limitation. Specifically, DGNN optically encodes node attributes into strip optical waveguides, which are transformed by DPUs and aggregated by on-chip optical couplers to extract their feature representations. Each DPU comprises successive passive layers of metalines to modulate the electromagnetic optical field via diffraction, where the metaline structures are learnable parameters shared across graph nodes. DGNN captures complex dependencies among the node neighborhoods and eliminates the nonlinear transition functions during the light-speed optical message passing over graph structures. We demonstrate the use of DGNN-extracted features for node-level and graph-level classification tasks with benchmark databases and achieve superior performance. Our work opens up a new direction for designing application-specific integrated photonic circuits for high-efficiency processing of large-scale graph data structures using deep learning.
Large-scale real-world e-commerce platforms usually contain various recommendation scenarios (domains) to meet the demands of diverse customer groups. Multi-Domain Recommendation (MDR), which aims to jointly improve recommendations on all domains, has attracted increasing attention from practitioners and researchers. Existing MDR methods often employ a shared structure to leverage reusable features for all domains and several specific parts to capture domain-specific information. However, data from different domains may conflict with each other and cause shared parameters to stay at a compromised position on the optimization landscape, which could degrade the overall performance. Although the specific parameters are learned separately for each domain, they can easily overfit on data-sparse domains. Furthermore, data distribution differs across domains, making it challenging to develop a general model that can be applied to all circumstances. To address these problems, we propose a novel model-agnostic learning method, namely MAMDR, for multi-domain recommendation. Specifically, we first propose a Domain Negotiation (DN) strategy to alleviate the conflict between domains and learn better shared parameters. Then, we develop a Domain Regularization (DR) scheme to improve the generalization ability of specific parameters by learning from other domains. Finally, we integrate these components into a unified framework and present MAMDR, which can be applied to any model structure to perform multi-domain recommendation. Extensive experiments on various real-world datasets and online applications demonstrate both the effectiveness and generalizability of MAMDR.
* This paper has been submitted to KDD 2022 ADS Track