Jia Liu

Hierarchical Conditional Semi-Paired Image-to-Image Translation For Multi-Task Image Defect Correction On Shopping Websites

Sep 12, 2023
Moyan Li, Jinmiao Fu, Shaoyuan Xu, Huidong Liu, Jia Liu, Bryan Wang

On shopping websites, product images of low quality negatively affect customer experience. Although there is plenty of work on detecting images with various defects, few efforts have been dedicated to correcting those defects at scale. A major challenge is that there are thousands of product types, each with its own specific defects, so building defect-specific models is unscalable. In this paper, we propose a unified Image-to-Image (I2I) translation model to correct multiple defects across different product types. Our model leverages an attention mechanism to hierarchically incorporate high-level defect groups and specific defect types, guiding the network to focus on defect-related image regions. Evaluated on eight public datasets, our model reduces the Frechet Inception Distance (FID) by 24.6% on average compared with MoNCE, the state-of-the-art I2I method. Unlike public data, another practical challenge on shopping websites is that some paired images are of low quality. We therefore design our model to be semi-paired, combining the L1 loss of paired data with the cycle loss of unpaired data. Tested on a shopping website dataset to correct three image defects, our model reduces FID by 63.2% on average compared with WS-I2I, the state-of-the-art semi-paired I2I method.
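The semi-paired objective described above — an L1 term on paired images plus a cycle-consistency term on unpaired images — can be sketched as below. This is a minimal illustration under assumed names (`G` for the source-to-target generator, `F_back` for the reverse generator, `lambda_cyc` for the weighting), not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def semi_paired_loss(G, F_back, x_pair, y_pair, x_unpair, y_unpair, lambda_cyc=10.0):
    """Sketch of a semi-paired I2I objective: L1 supervision on paired data
    plus bidirectional cycle consistency on unpaired data (assumed weighting)."""
    # Paired branch: direct pixel-level supervision.
    l1_term = F.l1_loss(G(x_pair), y_pair)
    # Unpaired branch: translate and translate back, in both directions.
    cyc_term = F.l1_loss(F_back(G(x_unpair)), x_unpair) + \
               F.l1_loss(G(F_back(y_unpair)), y_unpair)
    return l1_term + lambda_cyc * cyc_term
```

In practice this term would sit alongside the model's other components (e.g., the attention-guided translation network); the sketch only covers the paired/unpaired split.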

* 6 pages, 6 figures, 3 tables. To be published in ICIP 2023 

SwinV2DNet: Pyramid and Self-Supervision Compounded Feature Learning for Remote Sensing Images Change Detection

Aug 22, 2023
Dalong Zheng, Zebin Wu, Jia Liu, Zhihui Wei

Among current mainstream change detection networks, the transformer is deficient in capturing accurate low-level details, while the convolutional neural network (CNN) lacks the capacity to understand global information and establish long-range spatial relationships. Meanwhile, neither of the widely used early-fusion and late-fusion frameworks can fully learn complete change features. Therefore, based on Swin Transformer V2 (Swin V2) and VGG16, we propose an end-to-end compounded dense network, SwinV2DNet, to inherit the advantages of both the transformer and the CNN and overcome the shortcomings of existing networks in feature learning. Firstly, it captures the change relationship features through the densely connected Swin V2 backbone and provides the low-level pre-change and post-change features through a CNN branch. Based on these three change features, we accomplish accurate change detection results. Secondly, combining the transformer and the CNN, we propose a mixed feature pyramid (MFP) that provides inter-layer interaction information and intra-layer multi-scale information for complete feature learning. MFP is a plug-and-play module that is experimentally proven to also be effective in other change detection networks. Furthermore, we impose a self-supervision strategy to guide a new CNN branch, which solves the untrainable problem of the CNN branch and provides semantic change information for the encoder features. Compared with other advanced methods, we obtain state-of-the-art (SOTA) change detection scores and fine-grained change maps on four commonly used public remote sensing datasets. The code is available at https://github.com/DalongZ/SwinV2DNet.
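A highly simplified view of the compounded design — a Swin V2 change-relationship feature fused with CNN pre-change and post-change features to predict a change map — might look like the following sketch. The layer sizes and fusion head are assumptions for illustration, not the released SwinV2DNet code.

```python
import torch
import torch.nn as nn

class CompoundChangeHead(nn.Module):
    """Toy head that fuses a transformer change feature with CNN pre-/post-change
    features into a change map (assumed layout, not the paper's module)."""
    def __init__(self, c_trans, c_cnn, n_classes=2):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(c_trans + 2 * c_cnn, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, n_classes, kernel_size=1),
        )

    def forward(self, f_change, f_pre, f_post):
        # The three feature maps are assumed to share the same spatial size.
        return self.fuse(torch.cat([f_change, f_pre, f_post], dim=1))
```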

Topology-Preserving Automatic Labeling of Coronary Arteries via Anatomy-aware Connection Classifier

Jul 22, 2023
Zhixing Zhang, Ziwei Zhao, Dong Wang, Shishuang Zhao, Yuhang Liu, Jia Liu, Liwei Wang

Automatic labeling of coronary arteries is an essential task in the practical diagnosis of cardiovascular diseases. For experienced radiologists, the anatomically predetermined connections are important for labeling artery segments accurately, yet this prior knowledge has barely been explored in previous studies. In this paper, we present a new framework called TopoLab, which explicitly incorporates the anatomical connections into the network design. Specifically, we introduce intra-segment feature aggregation and inter-segment feature interaction strategies for hierarchical segment feature extraction. Moreover, we propose an anatomy-aware connection classifier that classifies each connected segment pair, which effectively exploits the prior topology among arteries of different categories. To validate the effectiveness of our method, we contribute high-quality artery-labeling annotations to the public orCaScore dataset. Experimental results on both the orCaScore dataset and an in-house dataset show that TopoLab achieves state-of-the-art performance.
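The pairwise connection classification described above can be illustrated with a toy classifier that scores every connected segment pair from its features; the feature dimension, MLP, and edge encoding are hypothetical.

```python
import torch
import torch.nn as nn

class PairConnectionClassifier(nn.Module):
    """Toy pairwise connection classifier (illustrative, not TopoLab itself)."""
    def __init__(self, feat_dim, n_labels):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(), nn.Linear(128, n_labels)
        )

    def forward(self, seg_feats, edges):
        # seg_feats: (num_segments, feat_dim) per-segment features.
        # edges: (num_edges, 2) long tensor of connected segment index pairs.
        pair_feats = torch.cat([seg_feats[edges[:, 0]], seg_feats[edges[:, 1]]], dim=1)
        return self.mlp(pair_feats)  # one label distribution per connected pair
```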

* Accepted by MICCAI 2023 

Geometric Pooling: maintaining more useful information

Jun 21, 2023
Hao Xu, Jia Liu, Yang Shen, Kenan Lou, Yanxia Bao, Ruihua Zhang, Shuyue Zhou, Hongsen Zhao, Shuai Wang

Graph pooling plays an important role in graph node classification tasks. Sorting pooling keeps large-value units when pooling graphs of varying sizes. However, by analyzing the statistical characteristics of the activated units after pooling, we found that many of the units dropped by sorting pooling are negative-value units that contain useful information and can contribute considerably to the final decision. To retain more useful information, we propose a novel pooling method, called Geometric Pooling (GP), which keeps distinctive node features with negative values by measuring the similarity of all node features. We explain the effectiveness of GP from the entropy-reduction perspective. Experiments conducted on the TU datasets show the effectiveness of GP: the proposed GP outperforms SOTA graph pooling methods by $1\%\sim5\%$ with fewer parameters.
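One way to picture the selection rule — keeping nodes whose features are distinctive (including negative-valued ones) by comparing all node features — is the sketch below. The dissimilarity score and top-k rule are assumptions, not the paper's exact operator.

```python
import torch
import torch.nn.functional as F

def geometric_pooling_sketch(x, k):
    """x: (num_nodes, feat_dim) node features. Keep the k nodes whose features
    are least similar (on average) to the others; a sketch of the GP idea."""
    sim = F.cosine_similarity(x.unsqueeze(1), x.unsqueeze(0), dim=-1)  # (n, n)
    # Mean similarity to the other nodes, excluding the self-similarity of 1.
    mean_sim = (sim.sum(dim=1) - 1.0) / (x.size(0) - 1)
    distinctiveness = 1.0 - mean_sim
    idx = distinctiveness.topk(k).indices
    return x[idx], idx
```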

* 6 pages, 4 figures 

AdaSelection: Accelerating Deep Learning Training through Data Subsampling

Jun 19, 2023
Minghe Zhang, Chaosheng Dong, Jinmiao Fu, Tianchen Zhou, Jia Liang, Jia Liu, Bo Liu, Michinari Momma, Bryan Wang, Yan Gao, Yi Sun

In this paper, we introduce AdaSelection, an adaptive sub-sampling method that identifies the most informative sub-samples within each minibatch to speed up the training of large-scale deep learning models without sacrificing model performance. Our method flexibly combines an arbitrary number of baseline sub-sampling methods, incorporating method-level importance and intra-method sample-level importance at each iteration. The standard practice of ad-hoc sampling often leads to continuous training with vast amounts of data from production environments. To improve the selection of data instances during the forward and backward passes, we propose recording a constant amount of information per instance from these passes. We demonstrate the effectiveness of our method by testing it across various types of inputs and tasks, including classification tasks on both image and language datasets, as well as regression tasks. Compared with industry-standard baselines, AdaSelection consistently displays superior performance.
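The combination of several baseline sub-samplers with method-level and sample-level importance could be prototyped as below; the softmax weighting and top-k selection are assumptions rather than the paper's algorithm.

```python
import torch

def select_minibatch_subset(per_method_scores, method_logits, k):
    """per_method_scores: list of (batch_size,) sample-importance tensors,
    one per baseline sub-sampling method. method_logits: (num_methods,) tensor.
    Returns indices of the k samples kept for this minibatch (a sketch)."""
    weights = torch.softmax(method_logits, dim=0)          # method-level importance
    scores = torch.stack(per_method_scores, dim=0)         # (num_methods, batch_size)
    combined = (weights.unsqueeze(1) * scores).sum(dim=0)  # mixed sample-level score
    return combined.topk(k).indices
```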

Inflated 3D Convolution-Transformer for Weakly-supervised Carotid Stenosis Grading with Ultrasound Videos

Jun 12, 2023
Xinrui Zhou, Yuhao Huang, Wufeng Xue, Xin Yang, Yuxin Zou, Qilong Ying, Yuanji Zhang, Jia Liu, Jie Ren, Dong Ni

Localization of the narrowest position of the vessel and the corresponding vessel and remnant-vessel delineation in carotid ultrasound (US) are essential for carotid stenosis grading (CSG) in clinical practice. However, this pipeline is time-consuming and difficult due to the ambiguous boundaries of plaque and temporal variation. To automate the procedure, a large number of manual delineations is usually required, which is not only laborious but also unreliable given the annotation difficulty. In this study, we present the first video classification framework for automatic CSG. Our contribution is three-fold. First, to avoid laborious and unreliable annotation, we propose a novel and effective video classification network for weakly-supervised CSG. Second, to ease model training, we adopt an inflation strategy for the network, where pre-trained 2D convolution weights are adapted into their 3D counterparts for an effective warm start. Third, to enhance the feature discrimination of the video, we propose a novel attention-guided multi-dimension fusion (AMDF) transformer encoder to model and integrate global dependencies within and across the spatial and temporal dimensions, in which two lightweight cross-dimensional attention mechanisms are designed. Our approach is extensively validated on a large clinically collected carotid US video dataset, demonstrating state-of-the-art performance compared with strong competitors.
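The inflation step mentioned above — adapting pre-trained 2D convolution weights into a 3D kernel for a warm start — is commonly done, as in I3D, by repeating the 2D kernel along the temporal axis and rescaling so that a temporally constant clip gives the same response. A minimal sketch (the temporal kernel size `t` is an assumption):

```python
import torch

def inflate_conv2d_weight(w2d, t):
    """w2d: (out_c, in_c, kh, kw) pre-trained 2D kernel.
    Returns a (out_c, in_c, t, kh, kw) 3D kernel (I3D-style inflation)."""
    return w2d.unsqueeze(2).repeat(1, 1, t, 1, 1) / t
```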

* Accepted by MICCAI 2023 

Knowing-how & Knowing-that: A New Task for Machine Reading Comprehension of User Manuals

Jun 07, 2023
Hongru Liang, Jia Liu, Weihong Du, Dingnan Jin, Wenqiang Lei, Zujie Wen, Jiancheng Lv

Machine reading comprehension (MRC) of user manuals has huge potential in customer service. However, current methods have trouble answering complex questions. Therefore, we introduce the Knowing-how & Knowing-that task, which requires a model to answer factoid-style, procedure-style, and inconsistent questions about user manuals. We address this task by jointly representing the steps and facts in a graph (TARA), which supports unified inference over various questions. Toward a systematic benchmarking study, we design a heuristic method to automatically parse user manuals into TARAs and build an annotated dataset to test the model's ability to answer real-world questions. Empirical results demonstrate that representing user manuals as TARAs is a desirable solution for the MRC of user manuals. An in-depth investigation of TARA further sheds light on the issues and broader impacts of future representations of user manuals. We hope our work can move the MRC of user manuals to a more complex and realistic stage.
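The joint step-and-fact representation can be pictured with a tiny graph structure; the node types and relation names below are illustrative, not the paper's TARA schema.

```python
from dataclasses import dataclass, field

@dataclass
class ManualGraph:
    """Toy joint graph of procedure steps and facts (illustrative only)."""
    nodes: dict = field(default_factory=dict)   # node_id -> {"type": "step"|"fact", "text": str}
    edges: list = field(default_factory=list)   # (src_id, dst_id, relation)

    def add_node(self, node_id, node_type, text):
        self.nodes[node_id] = {"type": node_type, "text": text}

    def add_edge(self, src, dst, relation):
        self.edges.append((src, dst, relation))

# Hypothetical example: a step and a fact describing its outcome.
g = ManualGraph()
g.add_node("s1", "step", "Hold the reset button for five seconds.")
g.add_node("f1", "fact", "The status LED blinks red while resetting.")
g.add_edge("s1", "f1", "described_by")
```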

Unsupervised Anomaly Detection and Localization of Machine Audio: A GAN-based Approach

Mar 31, 2023
Anbai Jiang, Wei-Qiang Zhang, Yufeng Deng, Pingyi Fan, Jia Liu

Automatic detection of machine anomalies remains challenging for machine learning. We believe the capability of the generative adversarial network (GAN) suits the needs of machine audio anomaly detection, yet this has rarely been investigated in previous work. In this paper, we propose AEGAN-AD, a fully unsupervised approach in which the generator (also an autoencoder) is trained to reconstruct input spectrograms. We point out that the denoising nature of reconstruction degrades its detection capacity. Thus, the discriminator is redesigned to aid the generator during both the training stage and the detection stage. The performance of AEGAN-AD on the dataset of DCASE 2022 Challenge Task 2 demonstrates state-of-the-art results on five machine types. A novel anomaly localization method is also investigated. Source code is available at: www.github.com/jianganbai/AEGAN-AD
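Using the discriminator to aid detection typically amounts to scoring a test clip by both its reconstruction error and the distance between discriminator features of the input and its reconstruction. A hedged sketch with an assumed weighting `alpha`, not the paper's exact score:

```python
import torch

def anomaly_score(x, generator, disc_features, alpha=0.5):
    """x: (batch, 1, freq, time) spectrograms. `generator` reconstructs x;
    `disc_features` maps a spectrogram to a discriminator embedding.
    Returns one anomaly score per item (a sketch; the weighting is assumed)."""
    with torch.no_grad():
        x_rec = generator(x)
        rec_err = (x - x_rec).abs().flatten(1).mean(dim=1)   # reconstruction term
        feat_err = (disc_features(x) - disc_features(x_rec)).abs().flatten(1).mean(dim=1)
    return alpha * rec_err + (1 - alpha) * feat_err
```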

* Accepted by ICASSP 2023 

Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion

Mar 21, 2023
Haisong Liu, Tao Lu, Yihui Xu, Jia Liu, Limin Wang

In this paper, we study the problem of jointly estimating optical flow and scene flow from synchronized 2D and 3D data. Previous methods either employ a complex pipeline that splits the joint task into independent stages, or fuse 2D and 3D information in an "early-fusion" or "late-fusion" manner. Such one-size-fits-all approaches suffer from a dilemma: they fail either to fully utilize the characteristics of each modality or to maximize inter-modality complementarity. To address the problem, we propose a novel end-to-end framework consisting of 2D and 3D branches with multiple bidirectional fusion connections between them at specific layers. Different from previous work, we apply a point-based 3D branch to extract LiDAR features, as it preserves the geometric structure of point clouds. To fuse dense image features and sparse point features, we propose a learnable operator named the bidirectional camera-LiDAR fusion module (Bi-CLFM). We instantiate two types of bidirectional fusion pipelines, one based on the pyramidal coarse-to-fine architecture (dubbed CamLiPWC), and the other based on the recurrent all-pairs field transforms (dubbed CamLiRAFT). On FlyingThings3D, both CamLiPWC and CamLiRAFT surpass all existing methods and achieve up to a 47.9% reduction in 3D end-point error over the best published result. Our best-performing model, CamLiRAFT, achieves an error of 4.26% on the KITTI Scene Flow benchmark, ranking 1st among all submissions with far fewer parameters. Moreover, our methods have strong generalization performance and the ability to handle non-rigid motion. Code is available at https://github.com/MCG-NJU/CamLiFlow.
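One half of the bidirectional fusion — gathering dense image features at projected LiDAR point locations — can be sketched with bilinear sampling. The projection to normalized image coordinates and the reverse direction (scattering point features back onto the image grid) are omitted, and the function below is an assumption rather than the released Bi-CLFM.

```python
import torch
import torch.nn.functional as F

def gather_image_feats_at_points(img_feats, uv_norm):
    """img_feats: (B, C, H, W) dense image features.
    uv_norm: (B, P, 2) projected point coordinates in (x, y), normalized to [-1, 1].
    Returns per-point image features of shape (B, P, C)."""
    grid = uv_norm.unsqueeze(1)                                   # (B, 1, P, 2)
    sampled = F.grid_sample(img_feats, grid, align_corners=True)  # (B, C, 1, P)
    return sampled.squeeze(2).transpose(1, 2)
```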

* arXiv admin note: text overlap with arXiv:2111.10502 

PRECISION: Decentralized Constrained Min-Max Learning with Low Communication and Sample Complexities

Mar 05, 2023
Zhuqing Liu, Xin Zhang, Songtao Lu, Jia Liu

Recently, min-max optimization problems have received increasing attention due to their wide range of applications in machine learning (ML). However, most existing min-max solution techniques are either single-machine or distributed algorithms coordinated by a central server. In this paper, we focus on decentralized min-max optimization for learning with domain constraints, where multiple agents collectively solve a nonconvex-strongly-concave min-max saddle point problem without coordination from any server. Decentralized min-max optimization problems with domain constraints underpin many important ML applications, including multi-agent ML fairness assurance and policy evaluation in multi-agent reinforcement learning. We propose an algorithm called PRECISION (proximal gradient-tracking and stochastic recursive variance reduction) that enjoys a convergence rate of $O(1/T)$, where $T$ is the maximum number of iterations. To further reduce sample complexity, we propose PRECISION$^+$ with an adaptive batch-size technique. We show that the fast $O(1/T)$ convergence of PRECISION and PRECISION$^+$ to an $\epsilon$-stationary point implies $O(\epsilon^{-2})$ communication complexity and $O(m\sqrt{n}\epsilon^{-2})$ sample complexity, where $m$ is the number of agents and $n$ is the size of the dataset at each agent. To our knowledge, this is the first work that achieves $O(\epsilon^{-2})$ in both sample and communication complexities for decentralized min-max learning with domain constraints. Our experiments also corroborate the theoretical results.
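For orientation, a generic decentralized proximal gradient-tracking template (shown for the minimization block only, with the maximization block handled analogously) looks as follows. This is a standard template, not the exact PRECISION recursion, and $v_i^t$ stands for the variance-reduced stochastic gradient estimate.

```latex
% Agent i mixes neighbors' iterates with weights W_{ij}, takes a proximal
% step onto the constraint set X, and tracks the global gradient with y_i.
\begin{align*}
  x_i^{t+1} &= \operatorname{prox}_{\eta,\mathcal{X}}\!\Big(\sum_{j} W_{ij}\, x_j^{t} - \eta\, y_i^{t}\Big),\\
  y_i^{t+1} &= \sum_{j} W_{ij}\, y_j^{t} + v_i^{t+1} - v_i^{t}.
\end{align*}
```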
