Dacheng Tao

Boosting Factorization Machines via Saliency-Guided Mixup

Jun 17, 2022
Chenwang Wu, Defu Lian, Yong Ge, Min Zhou, Enhong Chen, Dacheng Tao

Factorization machines (FMs) are widely used in recommender systems due to their adaptability and ability to learn from sparse data. However, for the ubiquitous non-interactive features in sparse data, existing FMs can only estimate the parameters corresponding to these features via the inner product of their embeddings. They cannot learn the direct interactions of these features, which limits the model's expressive power. To this end, we first present MixFM, inspired by Mixup, to generate auxiliary training data to boost FMs. Unlike existing augmentation strategies that require labor costs and expertise to collect additional information such as position and fields, the extra data in MixFM are generated solely by convex combination of the raw samples, without any professional knowledge. More importantly, if the parent samples to be mixed have non-interactive features, MixFM will establish their direct interactions. Second, considering that MixFM may generate redundant or even detrimental instances, we further put forward a novel Factorization Machine powered by Saliency-guided Mixup (denoted as SMFM). Guided by the customized saliency, SMFM can generate more informative neighbor data. Through theoretical analysis, we prove that the proposed methods minimize the upper bound of the generalization error, which has a beneficial effect on enhancing FMs. Significantly, we give the first generalization bound of FM, implying that generalization requires more data and a smaller embedding size under sufficient representation capability. Finally, extensive experiments on five datasets confirm that our approaches are superior to baselines. Besides, the results show that "poisoning" mixed data is likewise beneficial to the FM variants.
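The idea that mixing two samples creates a direct interaction between features that never co-occur can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain second-order FM, reads "non-interactive" as "never active in the same sample", and uses a fixed mixing weight `lam`; all names (`fm_predict`, `mixup`, `V`) are hypothetical.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order FM score: w0 + <w, x> + sum_{i<j} <V[i], V[j]> x_i x_j."""
    linear = w0 + w @ x
    xv = V.T @ x  # shape (k,): per-factor weighted sum of embeddings
    pairwise = 0.5 * (xv @ xv - np.sum((V ** 2).T @ (x ** 2)))
    return linear + pairwise

def mixup(x_a, y_a, x_b, y_b, lam):
    """Convex combination of two samples and of their labels."""
    return lam * x_a + (1 - lam) * x_b, lam * y_a + (1 - lam) * y_b

rng = np.random.default_rng(0)
V = rng.normal(size=(4, 3))          # 4 features, embedding size 3 (illustrative)
w0, w = 0.0, np.zeros(4)             # linear part switched off to isolate the pairwise term

# Feature 0 is active only in x_a, feature 2 only in x_b.
x_a = np.array([1.0, 0.0, 0.0, 0.0])
x_b = np.array([0.0, 0.0, 1.0, 0.0])
x_mix, y_mix = mixup(x_a, 1.0, x_b, 0.0, lam=0.5)

print("pairwise term, parent a:", fm_predict(x_a, w0, w, V))
print("pairwise term, mixed   :", fm_predict(x_mix, w0, w, V))
print("0.25 * <V[0], V[2]>    :", 0.25 * (V[0] @ V[2]))
```

In the parent samples the pairwise term vanishes, so the inner product of `V[0]` and `V[2]` receives no training signal; in the mixed sample it appears with weight `lam * (1 - lam)`.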

A Survey on Gradient Inversion: Attacks, Defenses and Future Directions

Jun 15, 2022
Rui Zhang, Song Guo, Junxiao Wang, Xin Xie, Dacheng Tao

Recent studies have shown that training samples can be recovered from gradients; such attacks are called Gradient Inversion (GradInv) attacks. However, there remains a lack of extensive surveys covering recent advances and thorough analysis of this issue. In this paper, we present a comprehensive survey on GradInv, aiming to summarize the cutting-edge research and broaden the horizons for different domains. First, we propose a taxonomy of GradInv attacks by characterizing existing attacks into two paradigms: iteration- and recursion-based attacks. In particular, we identify some critical ingredients of the iteration-based attacks, including data initialization, model training and gradient matching. Second, we summarize emerging defense strategies against GradInv attacks. We find these approaches focus on three perspectives covering data obscuration, model improvement and gradient protection. Finally, we discuss some promising directions and open problems for further research.
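To make the claim that gradients can leak training data concrete, here is a minimal sketch of one well-known closed-form special case: a single sample passing through a fully connected layer with a bias. It is not taken from the survey and is far simpler than the iteration- and recursion-based attacks it covers; the squared-error loss and the names `layer_gradients` and `recover_input` are assumptions made for illustration.

```python
import numpy as np

def layer_gradients(W, b, x, y):
    """Gradients of L = 0.5 * ||W x + b - y||^2 with respect to W and b."""
    dz = W @ x + b - y            # dL/dz
    return np.outer(dz, x), dz    # dL/dW = dz x^T, dL/db = dz

def recover_input(grad_W, grad_b, eps=1e-12):
    """Recover x from shared gradients using row i of dL/dW = (dL/db)[i] * x."""
    i = int(np.argmax(np.abs(grad_b)))   # pick the best-conditioned row
    if abs(grad_b[i]) < eps:
        raise ValueError("bias gradient is zero; this shortcut cannot recover x")
    return grad_W[i] / grad_b[i]
```

Because each row of the weight gradient is the input scaled by the matching bias gradient, a single division returns the private input exactly. With batches or deeper networks this shortcut no longer applies directly.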

* Accepted by IJCAI-ECAI 2022

APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking

Jun 12, 2022
Yuxiang Yang, Junjie Yang, Yufei Xu, Jing Zhang, Long Lan, Dacheng Tao

Animal pose estimation and tracking (APT) is a fundamental task for detecting and tracking animal keypoints from a sequence of video frames. Previous animal-related datasets focus either on animal tracking or single-frame animal pose estimation, and never on both aspects. The lack of APT datasets hinders the development and evaluation of video-based animal pose estimation and tracking methods, limiting real-world applications, e.g., understanding animal behavior in wildlife conservation. To fill this gap, we take the first step and propose APT-36K, i.e., the first large-scale benchmark for animal pose estimation and tracking. Specifically, APT-36K consists of 2,400 video clips collected and filtered from 30 animal species with 15 frames for each video, resulting in 36,000 frames in total. After manual annotation and careful double-checking, high-quality keypoint and tracking annotations are provided for all the animal instances. Based on APT-36K, we benchmark several representative models on the following three tracks: (1) supervised animal pose estimation on a single frame under intra- and inter-domain transfer learning settings, (2) inter-species domain generalization test for unseen animals, and (3) animal pose estimation with animal tracking. Based on the experimental results, we gain some empirical insights and show that APT-36K provides a valuable animal pose estimation and tracking benchmark, offering new challenges and opportunities for future research. The code and dataset will be made publicly available at https://github.com/pandorgan/APT-36K.

Toward Real-world Single Image Deraining: A New Benchmark and Beyond

Jun 11, 2022
Wei Li, Qiming Zhang, Jing Zhang, Zhen Huang, Xinmei Tian, Dacheng Tao

Single image deraining (SID) in real scenarios has attracted increasing attention in recent years. Due to the difficulty in obtaining real-world rainy/clean image pairs, previous real datasets suffer from low-resolution images, homogeneous rain streaks, limited background variation, and even misalignment of image pairs, resulting in incomplete evaluation of SID methods. To address these issues, we establish a new high-quality dataset named RealRain-1k, consisting of 1,120 high-resolution paired clean and rainy images with low- and high-density rain streaks, respectively. Images in RealRain-1k are automatically generated from a large number of real-world rainy video clips through a simple yet effective rain density-controllable filtering method, and have good properties of high image resolution, background diversity, rain streak variety, and strict spatial alignment. RealRain-1k also provides abundant rain streak layers as a byproduct, enabling us to build a large-scale synthetic dataset named SynRain-13k by pasting the rain streak layers on abundant natural images. Based on them and existing datasets, we benchmark more than 10 representative SID methods on three tracks: (1) fully supervised learning on RealRain-1k, (2) domain generalization to real datasets, and (3) syn-to-real transfer learning. The experimental results (1) show the difference of representative methods in image restoration performance and model complexity, (2) validate the significance of the proposed datasets for model generalization, and (3) provide useful insights on the superiority of learning from diverse domains and shed light on future research on real-world SID. The datasets will be released at https://github.com/hiker-lw/RealRain-1k.

Referring Image Matting

Jun 10, 2022
Jizhizi Li, Jing Zhang, Dacheng Tao

Image matting refers to extracting the accurate foregrounds in the image. Current automatic methods tend to extract all the salient objects in the image indiscriminately. In this paper, we propose a new task named Referring Image Matting (RIM), referring to extracting the meticulous alpha matte of the specific object that can best match the given natural language description. However, prevalent visual grounding methods are all limited to the segmentation level, probably due to the lack of high-quality datasets for RIM. To fill the gap, we establish the first large-scale challenging dataset RefMatte by designing a comprehensive image composition and expression generation engine to produce synthetic images on top of current public high-quality matting foregrounds with flexible logic and re-labelled diverse attributes. RefMatte consists of 230 object categories, 47,500 images, 118,749 expression-region entities, and 474,996 expressions, which can be further extended easily in the future. Besides this, we also construct a real-world test set with manually generated phrase annotations consisting of 100 natural images to further evaluate the generalization of RIM models. We first define the task of RIM in two settings, i.e., prompt-based and expression-based, and then benchmark several representative methods together with specific model designs for image matting. The results provide empirical insights into the limitations of existing methods as well as possible solutions. We believe the new task RIM along with the RefMatte dataset will open new research directions in this area and facilitate future studies. The dataset and code will be made publicly available at https://github.com/JizhiziLi/RIM.
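For readers unfamiliar with alpha mattes, the standard compositing relation is I = alpha * F + (1 - alpha) * B, where F is the foreground, B the background and alpha the per-pixel opacity. The sketch below shows how a synthetic image can be composed from a matting foreground with that relation. Whether the RefMatte engine uses exactly this blend is an assumption, and the function name `composite` is hypothetical.

```python
import numpy as np

def composite(foreground, background, alpha):
    """Blend H x W x 3 images using an H x W alpha matte with values in [0, 1]."""
    if foreground.shape != background.shape or alpha.shape != foreground.shape[:2]:
        raise ValueError("expected two H x W x 3 images and an H x W matte")
    a = alpha[..., None]                       # broadcast the matte over the channels
    return a * foreground + (1.0 - a) * background
```

Matting inverts this relation: given only the composite image (and, in RIM, a language description of the target object), the model estimates alpha.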

* The dataset and code are available at https://github.com/JizhiziLi/RIM

A Relational Intervention Approach for Unsupervised Dynamics Generalization in Model-Based Reinforcement Learning

Jun 09, 2022
Jixian Guo, Mingming Gong, Dacheng Tao

The generalization of model-based reinforcement learning (MBRL) methods to environments with unseen transition dynamics is an important yet challenging problem. Existing methods try to extract environment-specific information $Z$ from past transition segments to make the dynamics prediction model generalizable to different dynamics. However, because environments are not labelled, the extracted information inevitably contains redundant information unrelated to the dynamics in transition segments and thus fails to maintain a crucial property of $Z$: $Z$ should be similar in the same environment and dissimilar in different ones. As a result, the learned dynamics prediction function will deviate from the true one, which undermines the generalization ability. To tackle this problem, we introduce an interventional prediction module to estimate the probability of two estimated $\hat{z}_i, \hat{z}_j$ belonging to the same environment. Furthermore, by utilizing the invariance of $Z$ within a single environment, a relational head is proposed to enforce the similarity between $\hat{Z}$ from the same environment. As a result, the redundant information will be reduced in $\hat{Z}$. We empirically show that $\hat{Z}$ estimated by our method contains less redundant information than that of previous methods, and such $\hat{Z}$ can significantly reduce dynamics prediction errors and improve the performance of model-based RL methods on zero-shot new environments with unseen dynamics. The code for this method is available at https://github.com/CR-Gjx/RIA.

* ICLR2022 accepted paper

Recent Advances for Quantum Neural Networks in Generative Learning

Jun 07, 2022
Jinkai Tian, Xiaoyu Sun, Yuxuan Du, Shanshan Zhao, Qing Liu, Kaining Zhang, Wei Yi, Wanrong Huang, Chaoyue Wang, Xingyao Wu, Min-Hsiu Hsieh, Tongliang Liu, Wenjing Yang, Dacheng Tao

Quantum computers are next-generation devices that hold promise to perform calculations beyond the reach of classical computers. A leading method towards achieving this goal is through quantum machine learning, especially quantum generative learning. Due to the intrinsic probabilistic nature of quantum mechanics, it is reasonable to postulate that quantum generative learning models (QGLMs) may surpass their classical counterparts. As such, QGLMs are receiving growing attention from the quantum physics and computer science communities, where various QGLMs that can be efficiently implemented on near-term quantum machines with potential computational advantages are proposed. In this paper, we review the current progress of QGLMs from the perspective of machine learning. Particularly, we interpret these QGLMs, covering quantum circuit Born machines, quantum generative adversarial networks, quantum Boltzmann machines, and quantum autoencoders, as the quantum extension of classical generative learning models. In this context, we explore their intrinsic relation and their fundamental differences. We further summarize the potential applications of QGLMs in both conventional machine learning tasks and quantum physics. Finally, we discuss the challenges and further research directions for QGLMs.

* The first two authors contributed equally to this work

Understanding deep learning via decision boundary

Jun 03, 2022
Shiye Lei, Fengxiang He, Yancheng Yuan, Dacheng Tao

This paper discovers that neural networks with lower decision boundary (DB) variability have better generalizability. Two new notions, algorithm DB variability and $(\epsilon, \eta)$-data DB variability, are proposed to measure the decision boundary variability from the algorithm and data perspectives. Extensive experiments show significant negative correlations between the decision boundary variability and the generalizability. From the theoretical view, two lower bounds based on algorithm DB variability are proposed and do not explicitly depend on the sample size. We also prove an upper bound of order $\mathcal{O}\left(\frac{1}{\sqrt{m}}+\epsilon+\eta\log\frac{1}{\eta}\right)$ based on data DB variability. The bound is convenient to estimate without the requirement of labels, and does not explicitly depend on the network size, which is usually prohibitively large in deep learning.
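To get a feel for how the three terms of the stated order trade off, the expression inside the big-O can be evaluated directly. This sketch drops all constants hidden by the big-O, so it only indicates scaling rather than the value of the actual bound; it assumes a natural logarithm, and the function name `db_bound_order` is hypothetical.

```python
import math

def db_bound_order(m, epsilon, eta):
    """Evaluate 1/sqrt(m) + epsilon + eta * log(1/eta), with constants omitted."""
    if m <= 0:
        raise ValueError("sample size m must be positive")
    if not 0.0 < eta <= 1.0:
        raise ValueError("eta must lie in (0, 1]")
    return 1.0 / math.sqrt(m) + epsilon + eta * math.log(1.0 / eta)
```

For fixed epsilon and eta, the sample-size term shrinks like 1/sqrt(m), so for large m the expression is dominated by the two variability parameters.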

* 23 pages, 7 figures

Modeling Image Composition for Complex Scene Generation

Jun 02, 2022
Zuopeng Yang, Daqing Liu, Chaoyue Wang, Jie Yang, Dacheng Tao

We present a method that achieves state-of-the-art results on challenging (few-shot) layout-to-image generation tasks by accurately modeling textures, structures and relationships contained in a complex scene. After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) for exploring dependencies of object-to-object, object-to-patch and patch-to-patch. Compared to existing CNN-based and Transformer-based generation models, which entangle modeling at the pixel and patch levels and at the object and patch levels respectively, the proposed focal attention predicts the current patch token by focusing only on the highly related tokens specified by the spatial layout, thereby achieving disambiguation during training. Furthermore, the proposed TwFA largely increases the data efficiency during training; we therefore propose the first few-shot complex scene generation strategy based on the well-trained TwFA. Comprehensive experiments show the superiority of our method, which significantly improves both quantitative metrics and qualitative visual realism with respect to state-of-the-art CNN-based and Transformer-based methods. Code is available at https://github.com/JohnDreamer/TwFA.

* CVPR 2022

DisPFL: Towards Communication-Efficient Personalized Federated Learning via Decentralized Sparse Training

Jun 01, 2022
Rong Dai, Li Shen, Fengxiang He, Xinmei Tian, Dacheng Tao

Personalized federated learning is proposed to handle the data heterogeneity problem amongst clients by learning dedicated, tailored local models for each user. However, existing works are often built in a centralized way, leading to high communication pressure and high vulnerability when a failure or an attack on the central server occurs. In this work, we propose a novel personalized federated learning framework in a decentralized (peer-to-peer) communication protocol named Dis-PFL, which employs personalized sparse masks to customize sparse local models on the edge. To further save the communication and computation cost, we propose a decentralized sparse training technique, which means that each local model in Dis-PFL only maintains a fixed number of active parameters throughout the whole local training and peer-to-peer communication process. Comprehensive experiments demonstrate that Dis-PFL significantly relieves the communication bottleneck for the busiest node among all clients and, at the same time, achieves higher model accuracy with lower computation cost and fewer communication rounds. Furthermore, we demonstrate that our method can easily adapt to heterogeneous local clients with varying computation complexities and achieves better personalized performance.
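The notion of a local model that keeps a fixed number of active parameters during peer-to-peer exchange can be sketched as follows. The abstract does not specify how masks are chosen or how neighbors are aggregated, so both the random mask and the per-coordinate averaging rule below are assumptions for illustration, not the Dis-PFL algorithm; `random_mask` and `masked_average` are hypothetical names.

```python
import numpy as np

def random_mask(n_params, n_active, rng):
    """Boolean mask with exactly n_active True entries."""
    mask = np.zeros(n_params, dtype=bool)
    mask[rng.choice(n_params, size=n_active, replace=False)] = True
    return mask

def masked_average(own_w, own_mask, neighbor_ws, neighbor_masks):
    """Average each coordinate over the models that keep it active,
    then re-apply the client's own mask so its sparsity pattern is unchanged."""
    weights = np.stack([own_w * own_mask] +
                       [w * m for w, m in zip(neighbor_ws, neighbor_masks)])
    masks = np.stack([own_mask] + list(neighbor_masks)).astype(float)
    counts = masks.sum(axis=0)
    avg = weights.sum(axis=0) / np.maximum(counts, 1.0)  # avoid division by zero
    return avg * own_mask
```

Under these assumptions only the active coordinates need to be exchanged, and the number of non-zero parameters per client never exceeds the size of its mask.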

* To be published in ICML2022
