
Xi Zhang


Robust Ranking Explanations

Jul 08, 2023
Chao Chen, Chenghua Guo, Guixiang Ma, Ming Zeng, Xi Zhang, Sihong Xie

Figures 1–4 for Robust Ranking Explanations

Robust explanations of machine learning models are critical to establishing human trust in the models. Due to limited cognitive capacity, most humans can only interpret the top few salient features. It is therefore critical to make the top salient features robust to adversarial attacks, especially against the more vulnerable gradient-based explanations. Existing defenses measure robustness with $\ell_p$-norms, which offer weaker protection. We define explanation thickness to measure the ranking stability of salient features, and derive tractable surrogate bounds of the thickness to design the \textit{R2ET} algorithm, which efficiently maximizes the thickness and anchors the top salient features. Theoretically, we prove a connection between R2ET and adversarial training. Experiments with a wide spectrum of network architectures and data modalities, including brain networks, demonstrate that R2ET attains higher explanation robustness under stealthy attacks while retaining accuracy.
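The ranking-stability idea behind explanation thickness can be illustrated with a toy margin computation. This is only a rough proxy, not the paper's actual (pairwise, probabilistic) definition, and `ranking_margin` is a hypothetical helper name:

```python
import numpy as np

def ranking_margin(saliency, k):
    """Toy margin between the top-k salient features and the rest.

    A larger margin means an adversary must perturb saliency scores more
    before the top-k set changes -- a crude proxy for ranking stability.
    """
    order = np.argsort(saliency)[::-1]   # feature indices, most salient first
    topk, rest = order[:k], order[k:]
    return saliency[topk].min() - saliency[rest].max()

s = np.array([0.9, 0.1, 0.7, 0.05, 0.6])
margin = ranking_margin(s, k=3)          # gap between 3rd and 4th largest scores
```

A robustness-oriented training objective would then push this gap to stay positive under perturbation, which is the intuition R2ET formalizes with surrogate bounds.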

* Accepted to IMLH (Interpretable ML in Healthcare) workshop at ICML 2023. arXiv admin note: substantial text overlap with arXiv:2212.14106 

Inconsistent Matters: A Knowledge-guided Dual-consistency Network for Multi-modal Rumor Detection

Jun 19, 2023
Mengzhu Sun, Xi Zhang, Jianqiang Ma, Sihong Xie, Yazheng Liu, Philip S. Yu

Figures 1–4 for Inconsistent Matters: A Knowledge-guided Dual-consistency Network for Multi-modal Rumor Detection

Rumor spreaders are increasingly utilizing multimedia content to attract the attention and trust of news consumers. Though quite a few rumor detection models have exploited multi-modal data, they seldom consider the inconsistent semantics between images and texts, and rarely spot the inconsistency between post contents and background knowledge. In addition, they commonly assume the completeness of multiple modalities and thus are incapable of handling missing modalities in real-life scenarios. Motivated by the intuition that rumors in social media are more likely to have inconsistent semantics, a novel Knowledge-guided Dual-consistency Network is proposed to detect rumors with multimedia content. It uses two consistency detection subnetworks to capture inconsistency at the cross-modal level and the content-knowledge level simultaneously. It also enables robust multi-modal representation learning under different missing-visual-modality conditions, using a special token to discriminate between posts with and without the visual modality. Extensive experiments on three public real-world multimedia datasets demonstrate that our framework outperforms state-of-the-art baselines under both complete and incomplete modality conditions. Our code is available at https://github.com/MengzSun/KDCN.
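The special-token trick for missing visual modalities can be sketched in a few lines. The function name `fuse`, the embedding size, and the plain concatenation are illustrative assumptions, not the network's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                  # embedding size, chosen arbitrarily
missing_token = rng.normal(size=D)     # a learned parameter in practice; random here

def fuse(text_emb, img_emb=None):
    # When the visual modality is absent, substitute the dedicated token so
    # downstream layers always receive a fixed-size joint representation,
    # while remaining able to tell image-bearing posts apart.
    visual = img_emb if img_emb is not None else missing_token
    return np.concatenate([text_emb, visual])

t = rng.normal(size=D)
assert fuse(t).shape == (2 * D,)                      # post without an image
assert fuse(t, rng.normal(size=D)).shape == (2 * D,)  # post with an image
```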

* IEEE Transactions on Knowledge and Data Engineering, 2023  

HR-NeuS: Recovering High-Frequency Surface Geometry via Neural Implicit Surfaces

Feb 14, 2023
Erich Liang, Kenan Deng, Xi Zhang, Chun-Kai Wang

Figures 1–4 for HR-NeuS: Recovering High-Frequency Surface Geometry via Neural Implicit Surfaces

Recent advances in neural implicit surfaces for multi-view 3D reconstruction primarily focus on improving large-scale surface reconstruction accuracy, but often produce over-smoothed geometries that lack fine surface details. To address this, we present High-Resolution NeuS (HR-NeuS), a novel neural implicit surface reconstruction method that recovers high-frequency surface geometry while maintaining large-scale reconstruction accuracy. We achieve this by utilizing (i) multi-resolution hash grid encoding rather than positional encoding at high frequencies, which boosts our model's expressiveness of local geometry details; (ii) a coarse-to-fine algorithmic framework that selectively applies surface regularization to coarse geometry without smoothing away fine details; (iii) a coarse-to-fine grid annealing strategy to train the network. We demonstrate through experiments on DTU and BlendedMVS datasets that our approach produces 3D geometries that are qualitatively more detailed and quantitatively of similar accuracy compared to previous approaches.
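The coarse-to-fine grid annealing in (iii) can be sketched with a simple level-weighting schedule of the kind common in the neural-fields literature; the exact HR-NeuS recipe may differ, and `level_weights` is a hypothetical helper:

```python
import numpy as np

def level_weights(step, total_steps, n_levels):
    """Coarse-to-fine annealing: progressively enable finer grid levels.

    Early in training only coarse levels contribute, stabilizing large-scale
    geometry; fine levels fade in later to recover high-frequency detail.
    """
    alpha = n_levels * step / total_steps   # how many levels are "switched on"
    k = np.arange(n_levels)
    return np.clip(alpha - k, 0.0, 1.0)     # level k fades in as alpha passes k

print(level_weights(0, 100, 4))     # [0. 0. 0. 0.]
print(level_weights(50, 100, 4))    # [1. 1. 0. 0.]
print(level_weights(100, 100, 4))   # [1. 1. 1. 1.]
```

Each weight would multiply the feature vector from the corresponding hash-grid resolution before the MLP.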


Dual-layer Image Compression via Adaptive Downsampling and Spatially Varying Upconversion

Feb 13, 2023
Xi Zhang, Xiaolin Wu

Figures 1–4 for Dual-layer Image Compression via Adaptive Downsampling and Spatially Varying Upconversion

Ultra-high-resolution (UHR) images are almost always downsampled to fit the small displays of mobile devices and upsampled to their original resolution when exhibited on very high-resolution displays. This observation motivates us to jointly optimize pairs of downsampling and upsampling operations that are spatially adaptive to image content for maximal rate-distortion performance. In this paper, we propose an adaptive downsampled dual-layer (ADDL) image compression system. In the ADDL system, an image is reduced in resolution by learned content-adaptive downsampling kernels and compressed to form a coded base layer. For decompression, the base layer is decoded and upconverted to the original resolution by a deep upsampling neural network, aided by prior knowledge of the learned adaptive downsampling kernels. We restrict the downsampling kernels to the form of Gabor filters in order to reduce the complexity of filter optimization and the amount of side information the decoder needs for adaptive upsampling. Extensive experiments demonstrate that the proposed ADDL approach of jointly optimized, spatially adaptive downsampling and upconversion outperforms state-of-the-art image compression methods.
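Restricting kernels to the Gabor family means each downsampling kernel is described by a handful of scalars rather than a full weight grid, which is what keeps the decoder-side side information small. A minimal sketch of such a kernel (parameter values and normalization are illustrative choices, not the paper's):

```python
import numpy as np

def gabor_kernel(size=7, sigma=2.0, theta=0.0, lam=4.0, psi=0.0):
    """2D Gabor filter: a Gaussian envelope times a sinusoidal carrier.

    Only (sigma, theta, lam, psi) need to be signaled per kernel, instead
    of size*size free weights.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)     # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam + psi)
    return g / np.abs(g).sum()                     # normalize filter energy

k = gabor_kernel()
assert k.shape == (7, 7)
```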


SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird's-Eye View Representation for 3D Object Detection

Dec 09, 2022
Qi Jiang, Hao Sun, Xi Zhang

Figures 1–4 for SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird's-Eye View Representation for 3D Object Detection

LiDAR and camera are two essential sensors for 3D object detection in autonomous driving. LiDAR provides accurate and reliable 3D geometry information, while the camera provides rich texture with color. Despite the increasing popularity of fusing these two complementary sensors, the challenge remains in how to effectively fuse the 3D LiDAR point cloud with 2D camera images. Recent methods focus on point-level fusion, which paints the LiDAR point cloud with camera features in the perspective view, or bird's-eye view (BEV)-level fusion, which unifies multi-modality features in the BEV representation. In this paper, we rethink these previous fusion strategies and analyze their information loss and influences on geometric and semantic features. We present SemanticBEVFusion to deeply fuse camera features with LiDAR features in a unified BEV representation while maintaining per-modality strengths for 3D object detection. Our method achieves state-of-the-art performance on the large-scale nuScenes dataset, especially for challenging distant objects. The code will be made publicly available.
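The unified-BEV idea can be sketched in a few lines. The feature shapes and the plain channel concatenation are illustrative assumptions; SemanticBEVFusion's actual fusion module is learned:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 4, 4
lidar_bev = rng.random((64, H, W))   # geometric features rasterized from the point cloud
cam_bev = rng.random((64, H, W))     # semantic features lifted from camera images

# Once both modalities live in the same bird's-eye-view grid, fusion reduces
# to combining per-cell feature vectors -- here plain concatenation; real
# detectors follow this with learned convolutions.
fused = np.concatenate([lidar_bev, cam_bev], axis=0)
assert fused.shape == (128, H, W)
```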

* The first two authors contributed equally to this work 

Unsupervised Scene Sketch to Photo Synthesis

Sep 06, 2022
Jiayun Wang, Sangryul Jeon, Stella X. Yu, Xi Zhang, Himanshu Arora, Yu Lou

Figures 1–4 for Unsupervised Scene Sketch to Photo Synthesis

Sketches are an intuitive and powerful mode of visual expression, as they are quickly executed freehand drawings. We present a method for synthesizing realistic photos from scene sketches. Without the need for sketch-photo pairs, our framework learns directly from readily available large-scale photo datasets in an unsupervised manner. To this end, we introduce a standardization module that provides pseudo sketch-photo pairs during training by converting photos and sketches to a standardized domain, i.e., the edge map. The reduced domain gap between sketch and photo also allows us to disentangle them into two components: holistic scene structures and low-level visual styles such as color and texture. Taking advantage of this, we synthesize a photo-realistic image by combining the structure of a sketch with the visual style of a reference photo. Extensive experimental results on perceptual similarity metrics and human perceptual studies show that the proposed method can generate realistic photos with high fidelity from scene sketches and outperforms state-of-the-art photo synthesis baselines. We also demonstrate that our framework facilitates controllable manipulation of photo synthesis by editing the strokes of corresponding sketches, delivering more fine-grained details than previous approaches that rely on region-level editing.
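The standardization idea, mapping both photos and sketches into a shared edge-map domain, can be sketched with a simple gradient-magnitude edge detector. This stands in for the paper's learned module, which this snippet does not reproduce:

```python
import numpy as np

def edge_map(gray):
    """Crude edge map of a grayscale image via central differences.

    Converting both a photo and a sketch through such a map puts them in one
    standardized domain, shrinking the gap between the two distributions.
    """
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]   # horizontal gradient
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]   # vertical gradient
    mag = np.hypot(gx, gy)
    return mag / (mag.max() + 1e-8)            # normalize to [0, 1)

img = np.zeros((8, 8))
img[:, 4:] = 1.0                # synthetic image with one vertical step edge
e = edge_map(img)               # responds only along that edge
```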

* ECCVW 2022  

Heterogeneous Information Network based Default Analysis on Banking Micro and Small Enterprise Users

May 02, 2022
Zheng Zhang, Yingsheng Ji, Jiachen Shen, Xi Zhang, Guangwen Yang

Figures 1–4 for Heterogeneous Information Network based Default Analysis on Banking Micro and Small Enterprise Users

Risk assessment is a substantial problem for financial institutions that has been extensively studied both for its methodological richness and its various practical applications. With the expansion of inclusive finance, recent attention has been paid to micro and small-sized enterprises (MSEs). Compared with large companies, MSEs present a higher exposure to default owing to their insecure financial stability. Conventional efforts learn classifiers from historical data with elaborate feature engineering. However, the main obstacle for MSEs is a severe deficiency in credit-related information, which may degrade prediction performance. Besides, financial activities have diverse explicit and implicit relations, which have not been fully exploited for risk judgment in commercial banks. In particular, observations on real data show that various relationships between company users carry additional power for financial risk analysis. In this paper, we consider a graph of banking data and propose a novel model, HIDAM, for this purpose. Specifically, we incorporate a heterogeneous information network with rich attributes on multi-typed nodes and links to model the business banking scenario. To enhance the feature representation of MSEs, we extract interactive information through meta-paths and fully exploit path information. Furthermore, we devise a hierarchical attention mechanism to learn both the importance of the contents inside each meta-path and the importance of different meta-paths. Experimental results verify that HIDAM outperforms state-of-the-art competitors on real-world banking data.
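The outer (meta-path-level) step of such a hierarchical attention can be sketched as a softmax-weighted combination of per-meta-path embeddings. HIDAM's actual mechanism is hierarchical and learned end-to-end; the function name `attend` and the query-vector scoring here are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())          # shift for numerical stability
    return z / z.sum()

def attend(metapath_embs, query):
    """Score each meta-path embedding against a query vector, then take the
    softmax-weighted sum -- meta-paths more aligned with the query contribute
    more to the fused enterprise representation."""
    scores = metapath_embs @ query
    w = softmax(scores)
    return w @ metapath_embs, w

E = np.array([[1.0, 0.0],            # three meta-path views of one enterprise
              [0.0, 1.0],
              [1.0, 1.0]])
fused, w = attend(E, query=np.array([1.0, 0.0]))
assert np.isclose(w.sum(), 1.0)      # attention weights form a distribution
```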

* Corrected typos 