Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Liang Lin

LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning

May 17, 2022

Zhicheng Yang, Jinghui Qin, Jiaqi Chen, Liang Lin, Xiaodan Liang

Figure 1 for LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning

Figure 2 for LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning

Figure 3 for LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning

Figure 4 for LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning

Abstract:Recently, deep learning models have made great progress in MWP solving on answer accuracy. However, they are uninterpretable since they mainly rely on shallow heuristics to achieve high performance without understanding and reasoning the grounded math logic. To address this issue and make a step towards interpretable MWP solving, we first construct a high-quality MWP dataset named InterMWP which consists of 11,495 MWPs and annotates interpretable logical formulas based on algebraic knowledge as the grounded linguistic logic of each solution equation. Different from existing MWP datasets, our InterMWP benchmark asks for a solver to not only output the solution expressions but also predict the corresponding logical formulas. We further propose a novel approach with logical prompt and interpretation generation, called LogicSolver. For each MWP, our LogicSolver first retrieves some highly-correlated algebraic knowledge and then passes them to the backbone model as prompts to improve the semantic representations of MWPs. With these improved semantic representations, our LogicSolver generates corresponding solution expressions and interpretable knowledge formulas in accord with the generated solution expressions, simultaneously. Experimental results show that our LogicSolver has stronger logical formula-based interpretability than baselines while achieving higher answer accuracy with the help of logical prompts, simultaneously.

Via

Access Paper or Ask Questions

Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution

May 07, 2022

Xiaoqian Xu, Pengxu Wei, Weikai Chen, Mingzhi Mao, Liang Lin, Guanbin Li

Figure 1 for Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution

Figure 2 for Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution

Figure 3 for Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution

Figure 4 for Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution

Abstract:Due to the sophisticated imaging process, an identical scene captured by different cameras could exhibit distinct imaging patterns, introducing distinct proficiency among the super-resolution (SR) models trained on images from different devices. In this paper, we investigate a novel and practical task coded cross-device SR, which strives to adapt a real-world SR model trained on the paired images captured by one camera to low-resolution (LR) images captured by arbitrary target devices. The proposed task is highly challenging due to the absence of paired data from various imaging devices. To address this issue, we propose an unsupervised domain adaptation mechanism for real-world SR, named Dual ADversarial Adaptation (DADA), which only requires LR images in the target domain with available real paired data from a source camera. DADA employs the Domain-Invariant Attention (DIA) module to establish the basis of target model training even without HR supervision. Furthermore, the dual framework of DADA facilitates an Inter-domain Adversarial Adaptation (InterAA) in one branch for two LR input images from two domains, and an Intra-domain Adversarial Adaptation (IntraAA) in two branches for an LR input image. InterAA and IntraAA together improve the model transferability from the source domain to the target. We empirically conduct experiments under six Real to Real adaptation settings among three different cameras, and achieve superior performance compared with existing state-of-the-art approaches. We also evaluate the proposed DADA to address the adaptation to the video camera, which presents a promising research topic to promote the wide applications of real-world super-resolution. Our source code is publicly available at https://github.com/lonelyhope/DADA.git.

Via

Access Paper or Ask Questions

Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism

May 06, 2022

Binbin Yang, Xinchi Deng, Han Shi, Changlin Li, Gengwei Zhang, Hang Xu, Shen Zhao, Liang Lin, Xiaodan Liang

Figure 1 for Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism

Figure 2 for Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism

Figure 3 for Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism

Figure 4 for Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism

Abstract:Continual learning is a challenging real-world problem for constructing a mature AI system when data are provided in a streaming fashion. Despite recent progress in continual classification, the researches of continual object detection are impeded by the diverse sizes and numbers of objects in each image. Different from previous works that tune the whole network for all tasks, in this work, we present a simple and flexible framework for continual object detection via pRotOtypical taSk corrElaTion guided gaTing mechAnism (ROSETTA). Concretely, a unified framework is shared by all tasks while task-aware gates are introduced to automatically select sub-models for specific tasks. In this way, various knowledge can be successively memorized by storing their corresponding sub-model weights in this system. To make ROSETTA automatically determine which experience is available and useful, a prototypical task correlation guided Gating Diversity Controller(GDC) is introduced to adaptively adjust the diversity of gates for the new task based on class-specific prototypes. GDC module computes class-to-class correlation matrix to depict the cross-task correlation, and hereby activates more exclusive gates for the new task if a significant domain gap is observed. Comprehensive experiments on COCO-VOC, KITTI-Kitchen, class-incremental detection on VOC and sequential learning of four tasks show that ROSETTA yields state-of-the-art performance on both task-based and class-based continual object detection.

Via

Access Paper or Ask Questions

Causal Reasoning with Spatial-temporal Representation Learning: A Prospective Study

May 05, 2022

Yang Liu, Yushen Wei, Hong Yan, Guanbin Li, Liang Lin

Figure 1 for Causal Reasoning with Spatial-temporal Representation Learning: A Prospective Study

Figure 2 for Causal Reasoning with Spatial-temporal Representation Learning: A Prospective Study

Figure 3 for Causal Reasoning with Spatial-temporal Representation Learning: A Prospective Study

Figure 4 for Causal Reasoning with Spatial-temporal Representation Learning: A Prospective Study

Abstract:Spatial-temporal representation learning is ubiquitous in various real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing. Due to the emergence of huge amounts of multi-modal heterogeneous spatial/temporal/spatial-temporal data in big data era, the lack of interpretability, robustness, and out-of-distribution generalization are becoming the challenges of the existing visual models. The majority of the existing methods tend to fit the original data/variable distributions and ignore the essential causal relations behind the multi-modal knowledge, which lacks an unified guidance and analysis about why modern spatial-temporal representation learning methods are easily collapse into data bias and have limited generalization and cognitive abilities. Inspired by the strong inference ability of human-level agents, recent years have therefore witnessed great effort in developing causal reasoning paradigms to realize robust representation and model learning with good cognitive ability. In this paper, we conduct a comprehensive review of existing causal reasoning methods for spatial-temporal representation learning, covering fundamental theories, models, and datasets. The limitations of current methods and datasets are also discussed. Moreover, we propose some primary challenges, opportunities, and future research directions for benchmarking causal reasoning algorithms in spatial-temporal representation learning. This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussions, bring to the forefront the urgency of developing novel causal reasoning methods, publicly available benchmarks, and consensus-building standards for reliable spatial-temporal representation learning and related real-world applications more efficiently.

* This work has been submitted to Machine Intelligence Research

Via

Access Paper or Ask Questions

Audio-Visual Contrastive Learning for Self-supervised Action Recognition

Apr 28, 2022

Haoyuan Lan, Yang Liu, Liang Lin

Figure 1 for Audio-Visual Contrastive Learning for Self-supervised Action Recognition

Figure 2 for Audio-Visual Contrastive Learning for Self-supervised Action Recognition

Figure 3 for Audio-Visual Contrastive Learning for Self-supervised Action Recognition

Figure 4 for Audio-Visual Contrastive Learning for Self-supervised Action Recognition

Abstract:The underlying correlation between audio and visual modalities within videos can be utilized to learn supervised information for unlabeled videos. In this paper, we present an end-to-end self-supervised framework named Audio-Visual Contrastive Learning (AVCL), to learn discriminative audio-visual representations for action recognition. Specifically, we design an attention based multi-modal fusion module (AMFM) to fuse audio and visual modalities. To align heterogeneous audio-visual modalities, we construct a novel co-correlation guided representation alignment module (CGRA). To learn supervised information from unlabeled videos, we propose a novel self-supervised contrastive learning module (SelfCL). Furthermore, to expand the existing audio-visual action recognition datasets and better evaluate our framework AVCL, we build a new audio-visual action recognition dataset named Kinetics-Sounds100. Experimental results on Kinetics-Sounds32 and Kinetics-Sounds100 datasets demonstrate the superiority of our AVCL over the state-of-the-art methods on large-scale action recognition benchmark.

Via

Access Paper or Ask Questions

Semantic Representation and Dependency Learning for Multi-Label Image Recognition

Apr 08, 2022

Tao Pu, Lixian Yuan, Hefeng Wu, Tianshui Chen, Ling Tian, Liang Lin

Figure 1 for Semantic Representation and Dependency Learning for Multi-Label Image Recognition

Figure 2 for Semantic Representation and Dependency Learning for Multi-Label Image Recognition

Figure 3 for Semantic Representation and Dependency Learning for Multi-Label Image Recognition

Figure 4 for Semantic Representation and Dependency Learning for Multi-Label Image Recognition

Abstract:Recently many multi-label image recognition (MLR) works have made significant progress by introducing pre-trained object detection models to generate lots of proposals or utilizing statistical label co-occurrence enhance the correlation among different categories. However, these works have some limitations: (1) the effectiveness of the network significantly depends on pre-trained object detection models that bring expensive and unaffordable computation; (2) the network performance degrades when there exist occasional co-occurrence objects in images, especially for the rare categories. To address these problems, we propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category and capture semantic dependency among all categories. Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide model to focus on semantic-aware regions. We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions to regularize the network training. Extensive experiments and comparisons on two popular MLR benchmark datasets (i.e., MS-COCO and Pascal VOC 2007) demonstrate the effectiveness of the proposed framework over current state-of-the-art algorithms.

* 25 pages, 7 figures

Via

Access Paper or Ask Questions

Open Set Domain Adaptation By Novel Class Discovery

Mar 07, 2022

Jingyu Zhuang, Ziliang Chen, Pengxu Wei, Guanbin Li, Liang Lin

Figure 1 for Open Set Domain Adaptation By Novel Class Discovery

Figure 2 for Open Set Domain Adaptation By Novel Class Discovery

Figure 3 for Open Set Domain Adaptation By Novel Class Discovery

Figure 4 for Open Set Domain Adaptation By Novel Class Discovery

Abstract:In Open Set Domain Adaptation (OSDA), large amounts of target samples are drawn from the implicit categories that never appear in the source domain. Due to the lack of their specific belonging, existing methods indiscriminately regard them as a single class unknown. We challenge this broadly-adopted practice that may arouse unexpected detrimental effects because the decision boundaries between the implicit categories have been fully ignored. Instead, we propose Self-supervised Class-Discovering Adapter (SCDA) that attempts to achieve OSDA by gradually discovering those implicit classes, then incorporating them to restructure the classifier and update the domain-adaptive features iteratively. SCDA performs two alternate steps to achieve implicit class discovery and self-supervised OSDA, respectively. By jointly optimizing for two tasks, SCDA achieves the state-of-the-art in OSDA and shows a competitive performance to unearth the implicit target classes.

Via

Access Paper or Ask Questions

Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels

Mar 04, 2022

Tao Pu, Tianshui Chen, Hefeng Wu, Liang Lin

Figure 1 for Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels

Figure 2 for Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels

Figure 3 for Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels

Figure 4 for Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels

Abstract:Training the multi-label image recognition models with partial labels, in which merely some labels are known while others are unknown for each image, is a considerably challenging and practical task. To address this task, current algorithms mainly depend on pre-training classification or similarity models to generate pseudo labels for the unknown labels. However, these algorithms depend on sufficient multi-label annotations to train the models, leading to poor performance especially with low known label proportion. In this work, we propose to blend category-specific representation across different images to transfer information of known labels to complement unknown labels, which can get rid of pre-training models and thus does not depend on sufficient annotations. To this end, we design a unified semantic-aware representation blending (SARB) framework that exploits instance-level and prototype-level semantic representation to complement unknown labels by two complementary modules: 1) an instance-level representation blending (ILRB) module blends the representations of the known labels in an image to the representations of the unknown labels in another image to complement these unknown labels. 2) a prototype-level representation blending (PLRB) module learns more stable representation prototypes for each category and blends the representation of unknown labels with the prototypes of corresponding labels to complement these labels. Extensive experiments on the MS-COCO, Visual Genome, Pascal VOC 2007 datasets show that the proposed SARB framework obtains superior performance over current leading competitors on all known label proportion settings, i.e., with the mAP improvement of 4.6%, 4.%, 2.2% on these three datasets when the known label proportion is 10%. Codes are available at https://github.com/HCPLab-SYSU/HCP-MLR-PL.

* Accepted by AAAI'22

Via

Access Paper or Ask Questions

Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning

Feb 26, 2022

Pengxiang Yan, Ziyi Wu, Mengmeng Liu, Kun Zeng, Liang Lin, Guanbin Li

Figure 1 for Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning

Figure 2 for Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning

Figure 3 for Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning

Figure 4 for Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning

Abstract:Recent advances in deep learning significantly boost the performance of salient object detection (SOD) at the expense of labeling larger-scale per-pixel annotations. To relieve the burden of labor-intensive labeling, deep unsupervised SOD methods have been proposed to exploit noisy labels generated by handcrafted saliency methods. However, it is still difficult to learn accurate saliency details from rough noisy labels. In this paper, we propose to learn saliency from synthetic but clean labels, which naturally has higher pixel-labeling quality without the effort of manual annotations. Specifically, we first construct a novel synthetic SOD dataset by a simple copy-paste strategy. Considering the large appearance differences between the synthetic and real-world scenarios, directly training with synthetic data will lead to performance degradation on real-world scenarios. To mitigate this problem, we propose a novel unsupervised domain adaptive SOD method to adapt between these two domains by uncertainty-aware self-training. Experimental results show that our proposed method outperforms the existing state-of-the-art deep unsupervised SOD methods on several benchmark datasets, and is even comparable to fully-supervised ones.

* Accepted by AAAI2022, code is available at https://github.com/Kinpzz/UDASOD-UPL

Via

Access Paper or Ask Questions

TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning

Jan 05, 2022

Yang Liu, Keze Wang, Lingbo Liu, Haoyuan Lan, Liang Lin

Figure 1 for TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning

Figure 2 for TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning

Figure 3 for TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning

Figure 4 for TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning

Abstract:Video self-supervised learning is a challenging task, which requires significant expressive power from the model to leverage rich spatial-temporal knowledge and generate effective supervisory signals from large amounts of unlabeled videos. However, existing methods fail to increase the temporal diversity of unlabeled videos and ignore elaborately modeling multi-scale temporal dependencies in an explicit way. To overcome these limitations, we take advantage of the multi-scale temporal dependencies within videos and proposes a novel video self-supervised learning framework named Temporal Contrastive Graph Learning (TCGL), which jointly models the inter-snippet and intra-snippet temporal dependencies for temporal representation learning with a hybrid graph contrastive learning strategy. Specifically, a Spatial-Temporal Knowledge Discovering (STKD) module is first introduced to extract motion-enhanced spatial-temporal representations from videos based on the frequency domain analysis of discrete cosine transform. To explicitly model multi-scale temporal dependencies of unlabeled videos, our TCGL integrates the prior knowledge about the frame and snippet orders into graph structures, i.e., the intra-/inter- snippet Temporal Contrastive Graphs (TCG). Then, specific contrastive learning modules are designed to maximize the agreement between nodes in different graph views. To generate supervisory signals for unlabeled videos, we introduce an Adaptive Snippet Order Prediction (ASOP) module which leverages the relational knowledge among video snippets to learn the global context representation and recalibrate the channel-wise features adaptively. Experimental results demonstrate the superiority of our TCGL over the state-of-the-art methods on large-scale action recognition and video retrieval benchmarks.The code is publicly available at https://github.com/YangLiu9208/TCGL.

* This work has been submitted to the IEEE for possible publication. The code is publicly available at https://github.com/YangLiu9208/TCGL. arXiv admin note: substantial text overlap with arXiv:2101.00820

Via

Access Paper or Ask Questions