Hanwen Liu

XAI-CLASS: Explanation-Enhanced Text Classification with Extremely Weak Supervision

Oct 31, 2023
Daniel Hajialigol, Hanwen Liu, Xuan Wang

Text classification aims to effectively categorize documents into pre-defined categories. Traditional methods for text classification often rely on large amounts of manually annotated training data, making the process time-consuming and labor-intensive. To address this issue, recent studies have focused on weakly-supervised and extremely weakly-supervised settings, which require minimal or no human annotation, respectively. In previous weakly-supervised text classification methods, pseudo-training data is generated by assigning pseudo-labels to documents based on their alignment (e.g., keyword matching) with specific classes. However, these methods ignore the importance of incorporating the explanations of the generated pseudo-labels, i.e., the saliency of individual words, as additional guidance during the text classification training process. To address this limitation, we propose XAI-CLASS, a novel explanation-enhanced extremely weakly-supervised text classification method that incorporates word saliency prediction as an auxiliary task. XAI-CLASS begins by employing a multi-round question-answering process to generate pseudo-training data that promotes the mutual enhancement of class labels and corresponding explanation word generation. This pseudo-training data is then used to train a multi-task framework that simultaneously learns both text classification and word saliency prediction. Extensive experiments on several weakly-supervised text classification datasets show that XAI-CLASS significantly outperforms other weakly-supervised text classification methods. Moreover, experiments demonstrate that XAI-CLASS enhances both model performance and explainability.
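The multi-task setup described in the abstract can be sketched roughly as a shared encoder with two heads, one for the document label and one for per-token word saliency, trained jointly on the pseudo-labels and pseudo-saliency produced by the question-answering rounds. The snippet below is only an assumed illustration of that pattern; the encoder choice, hidden sizes, and loss weighting are placeholders, not the actual XAI-CLASS implementation.

```python
import torch
import torch.nn as nn

class MultiTaskClassifier(nn.Module):
    """Shared encoder with a document-label head and a per-token saliency head.
    Hypothetical sketch; the real XAI-CLASS model and hyperparameters may differ."""

    def __init__(self, vocab_size=30522, hidden=256, num_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.cls_head = nn.Linear(2 * hidden, num_classes)  # text classification
        self.sal_head = nn.Linear(2 * hidden, 1)             # word saliency (per token)

    def forward(self, token_ids):
        h, _ = self.encoder(self.embed(token_ids))           # (B, T, 2H)
        logits = self.cls_head(h.mean(dim=1))                # pool over tokens
        saliency = self.sal_head(h).squeeze(-1)              # (B, T) logits
        return logits, saliency

# Joint loss on pseudo-labels and pseudo-saliency from the QA-generated training data.
model = MultiTaskClassifier()
tokens = torch.randint(0, 30522, (8, 64))
pseudo_label = torch.randint(0, 4, (8,))
pseudo_saliency = torch.randint(0, 2, (8, 64)).float()       # 1 = explanation word
logits, saliency = model(tokens)
loss = nn.functional.cross_entropy(logits, pseudo_label) \
     + 0.5 * nn.functional.binary_cross_entropy_with_logits(saliency, pseudo_saliency)
loss.backward()
```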


TableQAKit: A Comprehensive and Practical Toolkit for Table-based Question Answering

Oct 23, 2023
Fangyu Lei, Tongxu Luo, Pengqi Yang, Weihao Liu, Hanwen Liu, Jiahe Lei, Yiming Huang, Yifan Wei, Shizhu He, Jun Zhao, Kang Liu

Table-based question answering (TableQA) is an important task in natural language processing that requires comprehending tables and employing various reasoning strategies to answer questions. This paper introduces TableQAKit, the first comprehensive toolkit designed specifically for TableQA. The toolkit provides a unified platform that includes plentiful TableQA datasets and integrates popular methods for this task as well as large language models (LLMs). Users can add their own datasets and methods through a user-friendly interface. We also find, to our pleasant surprise, that using the modules in this toolkit achieves new state-of-the-art results on some datasets. Finally, TableQAKit provides an LLM-based TableQA benchmark for evaluating the role of LLMs in TableQA. TableQAKit is open-source with an interactive interface that includes visual operations and comprehensive data for ease of use.

* Work in progress 
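Because the toolkit's interface is not reproduced here, the snippet below only illustrates, in a hypothetical form, what a single TableQA example bundles together (table, question, answer) and how it might be flattened into an LLM prompt; none of these names come from TableQAKit's actual API.

```python
from dataclasses import dataclass

@dataclass
class TableQAExample:
    """Hypothetical container for one TableQA instance (not TableQAKit's API)."""
    header: list[str]
    rows: list[list[str]]
    question: str
    answer: str

def to_prompt(ex: TableQAExample) -> str:
    """Serialize the table and question into a flat prompt for an LLM baseline."""
    table_text = " | ".join(ex.header) + "\n" + "\n".join(" | ".join(r) for r in ex.rows)
    return f"Table:\n{table_text}\n\nQuestion: {ex.question}\nAnswer:"

ex = TableQAExample(
    header=["Country", "Gold"],
    rows=[["Norway", "16"], ["Germany", "12"]],
    question="Which country won the most gold medals?",
    answer="Norway",
)
print(to_prompt(ex))
```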

Back-Projection Pipeline

Jan 25, 2021
Pablo Navarrete Michelini, Hanwen Liu, Yunhua Lu, Xingqun Jiang

We propose a simple extension of residual networks that works simultaneously at multiple resolutions. Our network design is inspired by the iterative back-projection algorithm but targets the more difficult task of learning how to enhance images. Compared to similar approaches, we propose a novel solution that makes back-projections run at multiple resolutions by using a data pipeline workflow. Features are updated at multiple scales in each layer of the network. The update dynamics through these layers include interactions between different resolutions in a way that is causal in scale, and they are represented by a system of ODEs, as opposed to a single ODE in the case of ResNets. The system can be used as a generic multi-resolution approach to enhance images. We test it on several challenging tasks with special focus on super-resolution and raindrop removal. Our results are competitive with the state of the art and show a strong ability of our system to learn both global and local image features.
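A minimal sketch of the cross-scale update described above, assuming two resolutions and simple bilinear/average-pooling projections (the paper's actual operators and channel counts may differ): each scale receives a residual correction computed from the other scale, and stacking such blocks yields a system of coupled updates rather than a single residual chain.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoScaleResidualBlock(nn.Module):
    """Toy cross-scale residual update: each resolution is refined using a
    projection of the other, loosely in the spirit of iterative back-projection."""

    def __init__(self, ch=32):
        super().__init__()
        self.refine_hi = nn.Conv2d(2 * ch, ch, 3, padding=1)
        self.refine_lo = nn.Conv2d(2 * ch, ch, 3, padding=1)

    def forward(self, x_hi, x_lo):
        up = F.interpolate(x_lo, scale_factor=2, mode="bilinear", align_corners=False)
        down = F.avg_pool2d(x_hi, 2)
        # Residual corrections computed from the concatenation of both scales.
        x_hi = x_hi + self.refine_hi(torch.cat([x_hi, up], dim=1))
        x_lo = x_lo + self.refine_lo(torch.cat([x_lo, down], dim=1))
        return x_hi, x_lo

hi = torch.randn(1, 32, 64, 64)
lo = torch.randn(1, 32, 32, 32)
block = TwoScaleResidualBlock()
hi, lo = block(hi, lo)   # stacking such blocks gives the coupled update dynamics
```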


Multi-Grid Back-Projection Networks

Jan 01, 2021
Pablo Navarrete Michelini, Wenbin Chen, Hanwen Liu, Dan Zhu, Xingqun Jiang

Multi-Grid Back-Projection (MGBP) is a fully-convolutional network architecture that can learn to restore images and videos with upscaling artifacts. Using the same strategy as multi-grid partial differential equation (PDE) solvers, this multiscale architecture scales computational complexity efficiently with increasing output resolution. The basic processing block is inspired by the iterative back-projection (IBP) algorithm and constitutes a type of cross-scale residual block with feedback from low-resolution references. The architecture performs on par with state-of-the-art alternatives for regression targets that aim to recover an exact copy of a high-resolution image or video from which only a downscaled image is known. A perceptual quality target aims to create more realistic outputs by introducing artificial changes that can differ from the high-resolution original content as long as they are consistent with the low-resolution input. For this target we propose a strategy that uses noise inputs at different resolution scales to control the amount of artificial detail generated in the output. The noise input controls the amount of innovation that the network uses to create artificial realistic details. The effectiveness of this strategy is shown in benchmarks, and it is explained as a particular strategy to traverse the perception-distortion plane.

* Accepted for publication in IEEE Journal of Selected Topics in Signal Processing (J-STSP). arXiv admin note: text overlap with arXiv:1809.10711 
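The noise-input control can be sketched independently of the full architecture: scale the noise planes injected at each resolution with a single scalar, so that zero noise favors a faithful (low-distortion) output and larger values permit more synthesized detail. The module below is a hypothetical toy version, not the published MGBP network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoiseControlledUpscaler(nn.Module):
    """Hypothetical sketch: inject a scaled noise channel at each scale so the
    amount of synthesized detail can be dialed with one scalar at test time."""

    def __init__(self, ch=32, scales=3):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)
        self.blocks = nn.ModuleList(
            [nn.Conv2d(ch + 1, ch, 3, padding=1) for _ in range(scales)]
        )
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, lr, noise_amp=0.0):
        x = self.head(lr)
        for block in self.blocks:
            x = F.interpolate(x, scale_factor=2, mode="nearest")
            noise = noise_amp * torch.randn(x.shape[0], 1, *x.shape[2:], device=x.device)
            x = torch.relu(block(torch.cat([x, noise], dim=1)))
        return self.tail(x)

model = NoiseControlledUpscaler()
lr = torch.rand(1, 3, 16, 16)
sr_faithful = model(lr, noise_amp=0.0)   # lean toward low distortion
sr_detailed = model(lr, noise_amp=1.0)   # allow more artificial detail
```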

Anchor-Based Spatial-Temporal Attention Convolutional Networks for Dynamic 3D Point Cloud Sequences

Dec 20, 2020
Guangming Wang, Hanwen Liu, Muyao Chen, Yehui Yang, Zhe Liu, Hesheng Wang

Recently, learning-based methods for robot perception from images or videos have developed rapidly, but deep learning methods for dynamic 3D point cloud sequences remain underexplored. With the widespread application of 3D sensors such as LiDAR and depth cameras, efficient and accurate perception of the 3D environment from 3D sequence data is pivotal for autonomous driving and service robots. An Anchor-based Spatial-Temporal Attention Convolution operation (ASTAConv) is proposed in this paper to process dynamic 3D point cloud sequences. The proposed convolution operation builds a regular receptive field around each point by setting several virtual anchors around it. The features of neighboring points are first aggregated to each anchor based on a spatial-temporal attention mechanism. Then, anchor-based sparse 3D convolution is adopted to aggregate the features of these anchors to the core points. The proposed method makes better use of the structured information within the local region and learns spatial-temporal embedding features from dynamic 3D point cloud sequences. Anchor-based Spatial-Temporal Attention Convolutional Neural Networks (ASTACNNs) are then proposed for classification and segmentation and are evaluated on action recognition and semantic segmentation tasks. Experimental results on the MSRAction3D and Synthia datasets demonstrate that our multi-frame fusion strategy achieves higher accuracy than previous state-of-the-art methods.

* 10 pages, 7 figures, under review 
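A rough, assumed illustration of the anchor aggregation step (not the authors' implementation): neighbor features are softly assigned to a set of virtual anchors placed around the core point, with attention weights derived here simply from anchor-to-neighbor distances; the real ASTAConv also uses the temporal dimension and a learned attention mechanism.

```python
import torch

def aggregate_to_anchors(neighbor_xyz, neighbor_feat, core_xyz, anchor_offsets, tau=0.1):
    """Softly assign neighbor features to virtual anchors around a core point.

    neighbor_xyz:   (K, 3) coordinates of neighboring points
    neighbor_feat:  (K, C) their features
    core_xyz:       (3,)   coordinates of the core point
    anchor_offsets: (A, 3) fixed offsets defining virtual anchors around the core
    Returns (A, C) anchor features; a sparse 3D convolution over the anchors
    would then produce the core point's output feature.
    """
    anchors = core_xyz + anchor_offsets                # (A, 3)
    dist = torch.cdist(anchors, neighbor_xyz)          # (A, K) distances
    attn = torch.softmax(-dist / tau, dim=1)           # attention over neighbors
    return attn @ neighbor_feat                        # (A, C)

# Toy usage: 16 neighbors with 8-dim features, 8 corner anchors on a small cube.
neighbor_xyz = torch.rand(16, 3)
neighbor_feat = torch.rand(16, 8)
core_xyz = torch.tensor([0.5, 0.5, 0.5])
anchor_offsets = 0.1 * torch.tensor(
    [[i, j, k] for i in (-1, 1) for j in (-1, 1) for k in (-1, 1)], dtype=torch.float32)
anchor_feat = aggregate_to_anchors(neighbor_xyz, neighbor_feat, core_xyz, anchor_offsets)
print(anchor_feat.shape)  # torch.Size([8, 8])
```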

MGBPv2: Scaling Up Multi-Grid Back-Projection Networks

Sep 27, 2019
Pablo Navarrete Michelini, Wenbin Chen, Hanwen Liu, Dan Zhu

Here we describe our solution for the AIM-2019 Extreme Super-Resolution Challenge, where we won 1st place in perceptual quality (MOS similarity to the ground truth) and achieved 5th place in fidelity (PSNR). To tackle this challenge, we introduce the second generation of MultiGrid BackProjection networks (MGBPv2), whose major modifications make the system scalable and more general than its predecessor. It combines the scalability of the multigrid algorithm and the performance of iterative back-projections. In its original form, MGBP is limited to a small number of parameters due to a strongly recursive structure. In MGBPv2, we make full use of the multigrid recursion from the beginning of the network; we allow different parameters in every module of the network; we simplify the main modules; and finally, we allow adjustment of the number of network features based on the scale of operation. For inference, we introduce an overlapping-patch approach that further allows processing of very large images (e.g. 8K). Our training strategies make use of a multiscale loss, combining distortion and/or perception losses on the output as well as on downscaled output images. The final system can balance between high quality and high performance.

* In ICCV 2019 Workshops. Winner of Perceptual track in AIM Extreme Super-Resolution Challenge 2019. Code available at https://github.com/pnavarre/mgbpv2 
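The overlapping-patch inference idea is easy to sketch on its own: split a very large image into overlapping tiles, run the network on each tile, and blend the overlaps when stitching the result back together. The helper below is a generic, hypothetical version that simply averages overlapping regions and assumes a size-preserving model; the challenge code may blend and scale the output canvas differently.

```python
import torch

def tiled_inference(model, image, tile=256, overlap=32):
    """Run `model` on overlapping tiles of `image` (1, C, H, W) and average overlaps.
    Assumes the model preserves spatial size; a super-resolution model would also
    need the output canvas scaled by the upscaling factor."""
    _, c, h, w = image.shape
    out = torch.zeros_like(image)
    weight = torch.zeros(1, 1, h, w, device=image.device)
    step = tile - overlap
    for top in range(0, h, step):
        for left in range(0, w, step):
            b, r = min(top + tile, h), min(left + tile, w)
            t, l = max(b - tile, 0), max(r - tile, 0)
            with torch.no_grad():
                out[:, :, t:b, l:r] += model(image[:, :, t:b, l:r])
            weight[:, :, t:b, l:r] += 1.0
    return out / weight

# Toy check with an identity "network".
img = torch.rand(1, 3, 600, 600)
restored = tiled_inference(torch.nn.Identity(), img)
print(torch.allclose(restored, img))  # True
```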

A Tour of Convolutional Networks Guided by Linear Interpreters

Aug 14, 2019
Pablo Navarrete Michelini, Hanwen Liu, Yunhua Lu, Xingqun Jiang

Convolutional networks are large linear systems divided into layers and connected by non-linear units. These units are the "articulations" that allow the network to adapt to the input. To understand how a network manages to solve a problem, we must look at the articulated decisions in their entirety. If we could capture the actions of the non-linear units for a particular input, we would be able to replay the whole system back and forth as if it were always linear. It would also reveal the actions of the non-linearities, because the resulting linear system, a Linear Interpreter, depends on the input image. We introduce a hooking layer, called a LinearScope, which allows us to run the network and its linear interpreter in parallel. Its implementation is simple, flexible and efficient. From here we can make many curious inquiries: what do these linear systems look like? When the rows and columns of the transformation matrix are images, what do they look like? What type of basis do these linear transformations rely on? The answers depend on the problems presented, through which we take a tour of some popular architectures used for classification, super-resolution (SR) and image-to-image translation (I2I). For classification we observe that popular networks use a pixel-wise vote-per-class strategy and rely heavily on bias parameters. For SR and I2I we find that CNNs use a wavelet-type basis similar to the human visual system. For I2I we reveal copy-move and template-creation strategies used to generate outputs.

* To appear in ICCV 2019 
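The underlying trick, capturing the on/off decisions of the non-linear units for one input and then replaying the network as a fixed linear (affine) map, can be sketched for a tiny ReLU network. Treat this as an assumed simplification: the actual LinearScope layer is a more general hooking mechanism.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

x = torch.randn(1, 8)

# 1) Capture the ReLU gating pattern ("articulations") for this particular input.
pre_act = net[0](x)
gate = (pre_act > 0).float()          # fixed 0/1 mask for this input

# 2) Replay the network as a linear system with the captured gates frozen.
def frozen_linear(z):
    return net[2](net[0](z) * gate)   # same weights, non-linearity replaced by the mask

# The frozen system reproduces the original output for x exactly ...
print(torch.allclose(net(x), frozen_linear(x)))          # True

# ... and, being linear up to a bias, its action can be read off as a matrix.
bias = frozen_linear(torch.zeros(1, 8))
matrix = torch.stack([frozen_linear(torch.eye(8)[i:i+1]) - bias for i in range(8)])
print(matrix.squeeze(1).shape)        # torch.Size([8, 4]): input-to-output map for x
```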

Convolutional Networks with MuxOut Layers as Multi-rate Systems for Image Upscaling

May 22, 2017
Pablo Navarrete Michelini, Hanwen Liu

We interpret convolutional networks as adaptive filters and combine them with so-called MuxOut layers to efficiently upscale low-resolution images. We formalize this interpretation by deriving the linear and space-variant structure of a convolutional network when its activations are fixed. We introduce general-purpose algorithms to analyze a network and show its overall filter effect at each given location. We use this analysis to evaluate two types of image upscalers: first, deterministic upscalers that target the recovery of details from the original content; and second, a new generation of upscalers that can sample the distribution of upscale aliases (images that share the same downscaled version) that look like real content.
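One generic way to probe the overall filter effect at a given location of a piecewise-linear network is via autograd: with the activation pattern fixed by a particular input, the gradient of a single output pixel with respect to the input is the space-variant filter acting at that pixel. The snippet below is an assumed illustration of this kind of analysis, not the paper's algorithm or the MuxOut layer itself.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# A tiny piecewise-linear conv net standing in for an upscaler's feature path.
net = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1),
)

x = torch.randn(1, 1, 32, 32, requires_grad=True)
y = net(x)

# Effective space-variant filter at output location (16, 16): because the network
# is piecewise linear, the gradient of that single output w.r.t. the input is the
# linear filter applied there (for this input's activation pattern).
y[0, 0, 16, 16].backward()
effective_filter = x.grad[0, 0]
print(effective_filter.shape)                    # torch.Size([32, 32])
print(effective_filter.abs().max().item() > 0)   # support concentrated near (16, 16)
```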
