Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tong Zhang

Nanjing University of Science and Technology, Nanjing, China

Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement

Sep 07, 2023

Jianyu Wen, Chenhao Wu, Tong Zhang, Yixuan Yu, Piotr Swierczynski

Figure 1 for Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement

Figure 2 for Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement

Figure 3 for Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement

Figure 4 for Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement

Abstract:In this paper, we propose a 2-stage low-light image enhancement method called Self-Reference Deep Adaptive Curve Estimation (Self-DACE). In the first stage, we present an intuitive, lightweight, fast, and unsupervised luminance enhancement algorithm. The algorithm is based on a novel low-light enhancement curve that can be used to locally boost image brightness. We also propose a new loss function with a simplified physical model designed to preserve natural images' color, structure, and fidelity. We use a vanilla CNN to map each pixel through deep Adaptive Adjustment Curves (AAC) while preserving the local image structure. Secondly, we introduce the corresponding denoising scheme to remove the latent noise in the darkness. We approximately model the noise in the dark and deploy a Denoising-Net to estimate and remove the noise after the first stage. Exhaustive qualitative and quantitative analysis shows that our method outperforms existing state-of-the-art algorithms on multiple real-world datasets.

Via

Access Paper or Ask Questions

Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning

Sep 05, 2023

Yong Lin, Chen Liu, Chenlu Ye, Qing Lian, Yuan Yao, Tong Zhang

Figure 1 for Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning

Figure 2 for Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning

Figure 3 for Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning

Figure 4 for Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning

Abstract:Modern deep learning heavily relies on large labeled datasets, which often comse with high costs in terms of both manual labeling and computational resources. To mitigate these challenges, researchers have explored the use of informative subset selection techniques, including coreset selection and active learning. Specifically, coreset selection involves sampling data with both input ($\bx$) and output ($\by$), active learning focuses solely on the input data ($\bx$). In this study, we present a theoretically optimal solution for addressing both coreset selection and active learning within the context of linear softmax regression. Our proposed method, COPS (unCertainty based OPtimal Sub-sampling), is designed to minimize the expected loss of a model trained on subsampled data. Unlike existing approaches that rely on explicit calculations of the inverse covariance matrix, which are not easily applicable to deep learning scenarios, COPS leverages the model's logits to estimate the sampling ratio. This sampling ratio is closely associated with model uncertainty and can be effectively applied to deep learning tasks. Furthermore, we address the challenge of model sensitivity to misspecification by incorporating a down-weighting approach for low-density samples, drawing inspiration from previous works. To assess the effectiveness of our proposed method, we conducted extensive empirical experiments using deep neural networks on benchmark datasets. The results consistently showcase the superior performance of COPS compared to baseline methods, reaffirming its efficacy.

Via

Access Paper or Ask Questions

Joint Salient Object Detection and Camouflaged Object Detection via Uncertainty-aware Learning

Jul 10, 2023

Aixuan Li, Jing Zhang, Yunqiu Lv, Tong Zhang, Yiran Zhong, Mingyi He, Yuchao Dai

Abstract:Salient objects attract human attention and usually stand out clearly from their surroundings. In contrast, camouflaged objects share similar colors or textures with the environment. In this case, salient objects are typically non-camouflaged, and camouflaged objects are usually not salient. Due to this inherent contradictory attribute, we introduce an uncertainty-aware learning pipeline to extensively explore the contradictory information of salient object detection (SOD) and camouflaged object detection (COD) via data-level and task-wise contradiction modeling. We first exploit the dataset correlation of these two tasks and claim that the easy samples in the COD dataset can serve as hard samples for SOD to improve the robustness of the SOD model. Based on the assumption that these two models should lead to activation maps highlighting different regions of the same input image, we further introduce a contrastive module with a joint-task contrastive learning framework to explicitly model the contradictory attributes of these two tasks. Different from conventional intra-task contrastive learning for unsupervised representation learning, our contrastive module is designed to model the task-wise correlation, leading to cross-task representation learning. To better understand the two tasks from the perspective of uncertainty, we extensively investigate the uncertainty estimation techniques for modeling the main uncertainties of the two tasks, namely task uncertainty (for SOD) and data uncertainty (for COD), and aiming to effectively estimate the challenging regions for each task to achieve difficulty-aware learning. Experimental results on benchmark datasets demonstrate that our solution leads to both state-of-the-art performance and informative uncertainty estimation.

Via

Access Paper or Ask Questions

Monte Carlo Sampling without Isoperimetry: A Reverse Diffusion Approach

Jul 05, 2023

Xunpeng Huang, Hanze Dong, Yifan Hao, Yian Ma, Tong Zhang

Figure 1 for Monte Carlo Sampling without Isoperimetry: A Reverse Diffusion Approach

Figure 2 for Monte Carlo Sampling without Isoperimetry: A Reverse Diffusion Approach

Figure 3 for Monte Carlo Sampling without Isoperimetry: A Reverse Diffusion Approach

Figure 4 for Monte Carlo Sampling without Isoperimetry: A Reverse Diffusion Approach

Abstract:The efficacy of modern generative models is commonly contingent upon the precision of score estimation along the diffusion path, with a focus on diffusion models and their ability to generate high-quality data samples. This study delves into the potentialities of posterior sampling through reverse diffusion. An examination of the sampling literature reveals that score estimation can be transformed into a mean estimation problem via the decomposition of the transition kernel. By estimating the mean of the auxiliary distribution, the reverse diffusion process can give rise to a novel posterior sampling algorithm, which diverges from traditional gradient-based Markov Chain Monte Carlo (MCMC) methods. We provide the convergence analysis in total variation distance and demonstrate that the isoperimetric dependency of the proposed algorithm is comparatively lower than that observed in conventional MCMC techniques, which justifies the superior performance for high dimensional sampling with error tolerance. Our analytical framework offers fresh perspectives on the complexity of score estimation at various time points, as denoted by the properties of the auxiliary distribution.

Via

Access Paper or Ask Questions

LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models

Jun 21, 2023

Shizhe Diao, Rui Pan, Hanze Dong, Ka Shun Shum, Jipeng Zhang, Wei Xiong, Tong Zhang

Figure 1 for LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models

Figure 2 for LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models

Figure 3 for LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models

Figure 4 for LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models

Abstract:Large foundation models have demonstrated a great ability to achieve general human-level intelligence far beyond traditional approaches. As the technique keeps attracting attention from the AI community, more and more large foundation models have become publically available. However, most of those models exhibit a major deficiency in specialized-task applications, where the step of finetuning is still required for obtaining satisfactory performance. As the number of available models and specialized tasks keeps growing, the job of general finetuning becomes highly nontrivial. In this paper, we take the first step to address this issue. We introduce an extensible and lightweight toolkit, LMFlow, which aims to simplify the finetuning and inference of general large foundation models. LMFlow offers a complete finetuning workflow for a large foundation model to support personalized training with limited computing resources. Furthermore, it supports continuous pretraining, instruction tuning, parameter-efficient finetuning, alignment tuning, and large model inference, along with carefully designed and extensible APIs. This toolkit has been thoroughly tested and is available at https://github.com/OptimalScale/LMFlow.

* 13 pages, 3 figures

Via

Access Paper or Ask Questions

A Universal Semantic-Geometric Representation for Robotic Manipulation

Jun 18, 2023

Tong Zhang, Yingdong Hu, Hanchen Cui, Hang Zhao, Yang Gao

Figure 1 for A Universal Semantic-Geometric Representation for Robotic Manipulation

Figure 2 for A Universal Semantic-Geometric Representation for Robotic Manipulation

Figure 3 for A Universal Semantic-Geometric Representation for Robotic Manipulation

Figure 4 for A Universal Semantic-Geometric Representation for Robotic Manipulation

Abstract:Robots rely heavily on sensors, especially RGB and depth cameras, to perceive and interact with the world. RGB cameras record 2D images with rich semantic information while missing precise spatial information. On the other side, depth cameras offer critical 3D geometry data but capture limited semantics. Therefore, integrating both modalities is crucial for learning representations for robotic perception and control. However, current research predominantly focuses on only one of these modalities, neglecting the benefits of incorporating both. To this end, we present Semantic-Geometric Representation (SGR), a universal perception module for robotics that leverages the rich semantic information of large-scale pre-trained 2D models and inherits the merits of 3D spatial reasoning. Our experiments demonstrate that SGR empowers the agent to successfully complete a diverse range of simulated and real-world robotic manipulation tasks, outperforming state-of-the-art methods significantly in both single-task and multi-task settings. Furthermore, SGR possesses the unique capability to generalize to novel semantic attributes, setting it apart from the other methods.

Via

Access Paper or Ask Questions

Dual Adaptive Representation Alignment for Cross-domain Few-shot Learning

Jun 18, 2023

Yifan Zhao, Tong Zhang, Jia Li, Yonghong Tian

Figure 1 for Dual Adaptive Representation Alignment for Cross-domain Few-shot Learning

Figure 2 for Dual Adaptive Representation Alignment for Cross-domain Few-shot Learning

Figure 3 for Dual Adaptive Representation Alignment for Cross-domain Few-shot Learning

Figure 4 for Dual Adaptive Representation Alignment for Cross-domain Few-shot Learning

Abstract:Few-shot learning aims to recognize novel queries with limited support samples by learning from base knowledge. Recent progress in this setting assumes that the base knowledge and novel query samples are distributed in the same domains, which are usually infeasible for realistic applications. Toward this issue, we propose to address the cross-domain few-shot learning problem where only extremely few samples are available in target domains. Under this realistic setting, we focus on the fast adaptation capability of meta-learners by proposing an effective dual adaptive representation alignment approach. In our approach, a prototypical feature alignment is first proposed to recalibrate support instances as prototypes and reproject these prototypes with a differentiable closed-form solution. Therefore feature spaces of learned knowledge can be adaptively transformed to query spaces by the cross-instance and cross-prototype relations. Besides the feature alignment, we further present a normalized distribution alignment module, which exploits prior statistics of query samples for solving the covariant shifts among the support and query samples. With these two modules, a progressive meta-learning framework is constructed to perform the fast adaptation with extremely few-shot samples while maintaining its generalization capabilities. Experimental evidence demonstrates our approach achieves new state-of-the-art results on 4 CDFSL benchmarks and 4 fine-grained cross-domain benchmarks.

* 13 pages; Accepted by IEEE T-PAMI

Via

Access Paper or Ask Questions

Structure-Sensitive Graph Dictionary Embedding for Graph Classification

Jun 18, 2023

Guangbu Liu, Tong Zhang, Xudong Wang, Wenting Zhao, Chuanwei Zhou, Zhen Cui

Figure 1 for Structure-Sensitive Graph Dictionary Embedding for Graph Classification

Figure 2 for Structure-Sensitive Graph Dictionary Embedding for Graph Classification

Figure 3 for Structure-Sensitive Graph Dictionary Embedding for Graph Classification

Figure 4 for Structure-Sensitive Graph Dictionary Embedding for Graph Classification

Abstract:Graph structure expression plays a vital role in distinguishing various graphs. In this work, we propose a Structure-Sensitive Graph Dictionary Embedding (SS-GDE) framework to transform input graphs into the embedding space of a graph dictionary for the graph classification task. Instead of a plain use of a base graph dictionary, we propose the variational graph dictionary adaptation (VGDA) to generate a personalized dictionary (named adapted graph dictionary) for catering to each input graph. In particular, for the adaptation, the Bernoulli sampling is introduced to adjust substructures of base graph keys according to each input, which increases the expression capacity of the base dictionary tremendously. To make cross-graph measurement sensitive as well as stable, multi-sensitivity Wasserstein encoding is proposed to produce the embeddings by designing multi-scale attention on optimal transport. To optimize the framework, we introduce mutual information as the objective, which further deduces to variational inference of the adapted graph dictionary. We perform our SS-GDE on multiple datasets of graph classification, and the experimental results demonstrate the effectiveness and superiority over the state-of-the-art methods.

Via

Access Paper or Ask Questions

Customizing General-Purpose Foundation Models for Medical Report Generation

Jun 09, 2023

Bang Yang, Asif Raza, Yuexian Zou, Tong Zhang

Figure 1 for Customizing General-Purpose Foundation Models for Medical Report Generation

Figure 2 for Customizing General-Purpose Foundation Models for Medical Report Generation

Figure 3 for Customizing General-Purpose Foundation Models for Medical Report Generation

Figure 4 for Customizing General-Purpose Foundation Models for Medical Report Generation

Abstract:Medical caption prediction which can be regarded as a task of medical report generation (MRG), requires the automatic generation of coherent and accurate captions for the given medical images. However, the scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks capable of harnessing the potential artificial general intelligence power like large language models (LLMs). In this work, we propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs), in computer vision and natural language processing with a specific focus on medical report generation. Specifically, following BLIP-2, a state-of-the-art vision-language pre-training approach, we introduce our encoder-decoder-based MRG model. This model utilizes a lightweight query Transformer to connect two FMs: the giant vision Transformer EVA-ViT-g and a bilingual LLM trained to align with human intentions (referred to as ChatGLM-6B). Furthermore, we conduct ablative experiments on the trainable components of the model to identify the crucial factors for effective transfer learning. Our findings demonstrate that unfreezing EVA-ViT-g to learn medical image representations, followed by parameter-efficient training of ChatGLM-6B to capture the writing styles of medical reports, is essential for achieving optimal results. Our best attempt (PCLmed Team) achieved the 4th and the 2nd, respectively, out of 13 participating teams, based on the BERTScore and ROUGE-1 metrics, in the ImageCLEFmedical Caption 2023 Caption Prediction Task competition.

* 14 pages, 3 figures

Via

Access Paper or Ask Questions

Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories

Jun 08, 2023

Shizhe Diao, Tianyang Xu, Ruijia Xu, Jiawei Wang, Tong Zhang

Figure 1 for Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories

Figure 2 for Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories

Figure 3 for Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories

Figure 4 for Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories

Abstract:Pre-trained language models (PLMs) demonstrate excellent abilities to understand texts in the generic domain while struggling in a specific domain. Although continued pre-training on a large domain-specific corpus is effective, it is costly to tune all the parameters on the domain. In this paper, we investigate whether we can adapt PLMs both effectively and efficiently by only tuning a few parameters. Specifically, we decouple the feed-forward networks (FFNs) of the Transformer architecture into two parts: the original pre-trained FFNs to maintain the old-domain knowledge and our novel domain-specific adapters to inject domain-specific knowledge in parallel. Then we adopt a mixture-of-adapters gate to fuse the knowledge from different domain adapters dynamically. Our proposed Mixture-of-Domain-Adapters (MixDA) employs a two-stage adapter-tuning strategy that leverages both unlabeled data and labeled data to help the domain adaptation: i) domain-specific adapter on unlabeled data; followed by ii) the task-specific adapter on labeled data. MixDA can be seamlessly plugged into the pretraining-finetuning paradigm and our experiments demonstrate that MixDA achieves superior performance on in-domain tasks (GLUE), out-of-domain tasks (ChemProt, RCT, IMDB, Amazon), and knowledge-intensive tasks (KILT). Further analyses demonstrate the reliability, scalability, and efficiency of our method. The code is available at https://github.com/Amano-Aki/Mixture-of-Domain-Adapters.

* ACL 2023

Via

Access Paper or Ask Questions