Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dacheng Tao

and Other Contributors

A-FedPD: Aligning Dual-Drift is All Federated Primal-Dual Learning Needs

Sep 27, 2024

Yan Sun, Li Shen, Dacheng Tao

Figure 1 for A-FedPD: Aligning Dual-Drift is All Federated Primal-Dual Learning Needs

Figure 2 for A-FedPD: Aligning Dual-Drift is All Federated Primal-Dual Learning Needs

Figure 3 for A-FedPD: Aligning Dual-Drift is All Federated Primal-Dual Learning Needs

Figure 4 for A-FedPD: Aligning Dual-Drift is All Federated Primal-Dual Learning Needs

Abstract:As a popular paradigm for juggling data privacy and collaborative training, federated learning (FL) is flourishing to distributively process the large scale of heterogeneous datasets on edged clients. Due to bandwidth limitations and security considerations, it ingeniously splits the original problem into multiple subproblems to be solved in parallel, which empowers primal dual solutions to great application values in FL. In this paper, we review the recent development of classical federated primal dual methods and point out a serious common defect of such methods in non-convex scenarios, which we say is a "dual drift" caused by dual hysteresis of those longstanding inactive clients under partial participation training. To further address this problem, we propose a novel Aligned Federated Primal Dual (A-FedPD) method, which constructs virtual dual updates to align global consensus and local dual variables for those protracted unparticipated local clients. Meanwhile, we provide a comprehensive analysis of the optimization and generalization efficiency for the A-FedPD method on smooth non-convex objectives, which confirms its high efficiency and practicality. Extensive experiments are conducted on several classical FL setups to validate the effectiveness of our proposed method.

Via

Access Paper or Ask Questions

MG-Net: Learn to Customize QAOA with Circuit Depth Awareness

Sep 27, 2024

Yang Qian, Xinbiao Wang, Yuxuan Du, Yong Luo, Dacheng Tao

Figure 1 for MG-Net: Learn to Customize QAOA with Circuit Depth Awareness

Figure 2 for MG-Net: Learn to Customize QAOA with Circuit Depth Awareness

Figure 3 for MG-Net: Learn to Customize QAOA with Circuit Depth Awareness

Figure 4 for MG-Net: Learn to Customize QAOA with Circuit Depth Awareness

Abstract:Quantum Approximate Optimization Algorithm (QAOA) and its variants exhibit immense potential in tackling combinatorial optimization challenges. However, their practical realization confronts a dilemma: the requisite circuit depth for satisfactory performance is problem-specific and often exceeds the maximum capability of current quantum devices. To address this dilemma, here we first analyze the convergence behavior of QAOA, uncovering the origins of this dilemma and elucidating the intricate relationship between the employed mixer Hamiltonian, the specific problem at hand, and the permissible maximum circuit depth. Harnessing this understanding, we introduce the Mixer Generator Network (MG-Net), a unified deep learning framework adept at dynamically formulating optimal mixer Hamiltonians tailored to distinct tasks and circuit depths. Systematic simulations, encompassing Ising models and weighted Max-Cut instances with up to 64 qubits, substantiate our theoretical findings, highlighting MG-Net's superior performance in terms of both approximation ratio and efficiency.

* 29 pages, 16 figures

Via

Access Paper or Ask Questions

MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators

Sep 22, 2024

Qingyu Lu, Liang Ding, Kanjian Zhang, Jinxia Zhang, Dacheng Tao

Abstract:Large Language Models (LLMs) have shown significant potential as judges for Machine Translation (MT) quality assessment, providing both scores and fine-grained feedback. Although approaches such as GEMBA-MQM has shown SOTA performance on reference-free evaluation, the predicted errors do not align well with those annotated by human, limiting their interpretability as feedback signals. To enhance the quality of error annotations predicted by LLM evaluators, we introduce a universal and training-free framework, $\textbf{MQM-APE}$, based on the idea of filtering out non-impactful errors by Automatically Post-Editing (APE) the original translation based on each error, leaving only those errors that contribute to quality improvement. Specifically, we prompt the LLM to act as 1) $\textit{evaluator}$ to provide error annotations, 2) $\textit{post-editor}$ to determine whether errors impact quality improvement and 3) $\textit{pairwise quality verifier}$ as the error filter. Experiments show that our approach consistently improves both the reliability and quality of error spans against GEMBA-MQM, across eight LLMs in both high- and low-resource languages. Orthogonal to trained approaches, MQM-APE complements translation-specific evaluators such as Tower, highlighting its broad applicability. Further analysis confirm the effectiveness of each module and offer valuable insights into evaluator design and LLMs selection. The code will be released to facilitate the community.

* Under Review

Via

Access Paper or Ask Questions

Distilling Channels for Efficient Deep Tracking

Sep 18, 2024

Shiming Ge, Zhao Luo, Chunhui Zhang, Yingying Hua, Dacheng Tao

Figure 1 for Distilling Channels for Efficient Deep Tracking

Figure 2 for Distilling Channels for Efficient Deep Tracking

Figure 3 for Distilling Channels for Efficient Deep Tracking

Figure 4 for Distilling Channels for Efficient Deep Tracking

Abstract:Deep trackers have proven success in visual tracking. Typically, these trackers employ optimally pre-trained deep networks to represent all diverse objects with multi-channel features from some fixed layers. The deep networks employed are usually trained to extract rich knowledge from massive data used in object classification and so they are capable to represent generic objects very well. However, these networks are too complex to represent a specific moving object, leading to poor generalization as well as high computational and memory costs. This paper presents a novel and general framework termed channel distillation to facilitate deep trackers. To validate the effectiveness of channel distillation, we take discriminative correlation filter (DCF) and ECO for example. We demonstrate that an integrated formulation can turn feature compression, response map generation, and model update into a unified energy minimization problem to adaptively select informative feature channels that improve the efficacy of tracking moving objects on the fly. Channel distillation can accurately extract good channels, alleviating the influence of noisy channels and generally reducing the number of channels, as well as adaptively generalizing to different channels and networks. The resulting deep tracker is accurate, fast, and has low memory requirements. Extensive experimental evaluations on popular benchmarks clearly demonstrate the effectiveness and generalizability of our framework.

* Published by IEEE TIP 2020

Via

Access Paper or Ask Questions

$\mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding

Sep 09, 2024

Shuai Wang, Liang Ding, Li Shen, Yong Luo, Zheng He, Wei Yu, Dacheng Tao

$Figure 1 for $\mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding$

$Figure 2 for $\mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding$

$Figure 3 for $\mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding$

$Figure 4 for $\mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding$

Abstract:Large language models (LLMs) have shown remarkable capabilities in code generation. However, the effects of hallucinations (e.g., output noise) make it particularly challenging for LLMs to generate high-quality code in one pass. In this work, we propose a simple and effective \textbf{u}ncertainty-aware \textbf{s}elective \textbf{c}ontrastive \textbf{d}ecoding ($\mathbb{USCD}$) mechanism to improve the quality of one-pass code generation in LLMs and reduce the impact of output noise. To be specific, we first elaborately designed a negative prompt (namely lame prompt) to output noise by removing input-output examples from the standard few-shot prompt. Our preliminary study shows that the Jensen-Shannon divergence (JS divergence) between token distribution uncertainty and the output noise is relatively low (approximately $0.25$), indicating their high relevance. Then, we selectively eliminate output noise induced by lame prompts based on the uncertainty of the prediction distribution from the standard prompt. Notably, our proposed plug-and-play mechanism is an inference-only method, enjoying appealing flexibility. Extensive experiments on widely used benchmarks, e.g., HumanEval, MBPP, and MultiPL-E, upon several LLMs (i.e., Inocder-6b, CodeLlama-7b, WizardCoder-15b, StarCoder, and Llama2-7b), demonstrate that our proposed USCD significantly improves one-pass code generation, with an average \textit{pass@$1$} scores increase of 16.59\%. We will release code and data on GitHub.

* 13pages,8 figures

Via

Access Paper or Ask Questions

Joint Input and Output Coordination for Class-Incremental Learning

Sep 09, 2024

Shuai Wang, Yibing Zhan, Yong Luo, Han Hu, Wei Yu, Yonggang Wen, Dacheng Tao

Figure 1 for Joint Input and Output Coordination for Class-Incremental Learning

Figure 2 for Joint Input and Output Coordination for Class-Incremental Learning

Figure 3 for Joint Input and Output Coordination for Class-Incremental Learning

Figure 4 for Joint Input and Output Coordination for Class-Incremental Learning

Abstract:Incremental learning is nontrivial due to severe catastrophic forgetting. Although storing a small amount of data on old tasks during incremental learning is a feasible solution, current strategies still do not 1) adequately address the class bias problem, and 2) alleviate the mutual interference between new and old tasks, and 3) consider the problem of class bias within tasks. This motivates us to propose a joint input and output coordination (JIOC) mechanism to address these issues. This mechanism assigns different weights to different categories of data according to the gradient of the output score, and uses knowledge distillation (KD) to reduce the mutual interference between the outputs of old and new tasks. The proposed mechanism is general and flexible, and can be incorporated into different incremental learning approaches that use memory storage. Extensive experiments show that our mechanism can significantly improve their performance.

* 11 pages, 4 figues. Accepted by IJCAI 2024

Via

Access Paper or Ask Questions

Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal

Sep 04, 2024

Jifeng Hu, Li Shen, Sili Huang, Zhejian Yang, Hechang Chen, Lichao Sun, Yi Chang, Dacheng Tao

Figure 1 for Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal

Figure 2 for Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal

Figure 3 for Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal

Figure 4 for Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal

Abstract:Artificial neural networks, especially recent diffusion-based models, have shown remarkable superiority in gaming, control, and QA systems, where the training tasks' datasets are usually static. However, in real-world applications, such as robotic control of reinforcement learning (RL), the tasks are changing, and new tasks arise in a sequential order. This situation poses the new challenge of plasticity-stability trade-off for training an agent who can adapt to task changes and retain acquired knowledge. In view of this, we propose a rehearsal-based continual diffusion model, called Continual Diffuser (CoD), to endow the diffuser with the capabilities of quick adaptation (plasticity) and lasting retention (stability). Specifically, we first construct an offline benchmark that contains 90 tasks from multiple domains. Then, we train the CoD on each task with sequential modeling and conditional generation for making decisions. Next, we preserve a small portion of previous datasets as the rehearsal buffer and replay it to retain the acquired knowledge. Extensive experiments on a series of tasks show CoD can achieve a promising plasticity-stability trade-off and outperform existing diffusion-based methods and other representative baselines on most tasks.

Via

Access Paper or Ask Questions

Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment

Aug 29, 2024

Liyao Tang, Zhe Chen, Shanshan Zhao, Chaoyue Wang, Dacheng Tao

Figure 1 for Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment

Figure 2 for Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment

Figure 3 for Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment

Figure 4 for Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment

Abstract:Label-efficient segmentation aims to perform effective segmentation on input data using only sparse and limited ground-truth labels for training. This topic is widely studied in 3D point cloud segmentation due to the difficulty of annotating point clouds densely, while it is also essential for cost-effective segmentation on 2D images. Until recently, pseudo-labels have been widely employed to facilitate training with limited ground-truth labels, and promising progress has been witnessed in both the 2D and 3D segmentation. However, existing pseudo-labeling approaches could suffer heavily from the noises and variations in unlabelled data, which would result in significant discrepancies between generated pseudo-labels and current model predictions during training. We analyze that this can further confuse and affect the model learning process, which shows to be a shared problem in label-efficient learning across both 2D and 3D modalities. To address this issue, we propose a novel learning strategy to regularize the pseudo-labels generated for training, thus effectively narrowing the gaps between pseudo-labels and model predictions. More specifically, our method introduces an Entropy Regularization loss and a Distribution Alignment loss for label-efficient learning, resulting in an ERDA learning strategy. Interestingly, by using KL distance to formulate the distribution alignment loss, ERDA reduces to a deceptively simple cross-entropy-based loss which optimizes both the pseudo-label generation module and the segmentation model simultaneously. In addition, we innovate in the pseudo-label generation to make our ERDA consistently effective across both 2D and 3D data modalities for segmentation. Enjoying simplicity and more modality-agnostic pseudo-label generation, our method has shown outstanding performance in fully utilizing all unlabeled data points for training across ...

* Extended version of arXiv:2305.15832; Code at https://github.com/LiyaoTang/ERDA

Via

Access Paper or Ask Questions

Convergent Differential Privacy Analysis for General Federated Learning: the f-DP Perspective

Aug 28, 2024

Yan Sun, Li Shen, Dacheng Tao

Figure 1 for Convergent Differential Privacy Analysis for General Federated Learning: the f-DP Perspective

Figure 2 for Convergent Differential Privacy Analysis for General Federated Learning: the f-DP Perspective

Figure 3 for Convergent Differential Privacy Analysis for General Federated Learning: the f-DP Perspective

Figure 4 for Convergent Differential Privacy Analysis for General Federated Learning: the f-DP Perspective

Abstract:Federated learning (FL) is an efficient collaborative training paradigm extensively developed with a focus on local privacy protection, and differential privacy (DP) is a classical approach to capture and ensure the reliability of local privacy. The powerful cooperation of FL and DP provides a promising learning framework for large-scale private clients, juggling both privacy securing and trustworthy learning. As the predominant algorithm of DP, the noisy perturbation has been widely studied and incorporated into various federated algorithms, theoretically proven to offer significant privacy protections. However, existing analyses in noisy FL-DP mostly rely on the composition theorem and cannot tightly quantify the privacy leakage challenges, which is nearly tight for small numbers of communication rounds but yields an arbitrarily loose and divergent bound under the large communication rounds. This implies a counterintuitive judgment, suggesting that FL may not provide adequate privacy protection during long-term training. To further investigate the convergent privacy and reliability of the FL-DP framework, in this paper, we comprehensively evaluate the worst privacy of two classical methods under the non-convex and smooth objectives based on the f-DP analysis, i.e. Noisy-FedAvg and Noisy-FedProx methods. With the aid of the shifted-interpolation technique, we successfully prove that the worst privacy of the Noisy-FedAvg method achieves a tight convergent lower bound. Moreover, in the Noisy-FedProx method, with the regularization of the proxy term, the worst privacy has a stable constant lower bound. Our analysis further provides a solid theoretical foundation for the reliability of privacy protection in FL-DP. Meanwhile, our conclusions can also be losslessly converted to other classical DP analytical frameworks, e.g. $(\epsilon,\delta)$-DP and R$\acute{\text{e}}$nyi-DP (RDP).

Via

Access Paper or Ask Questions

Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models

Aug 28, 2024

Wenbin Wang, Liang Ding, Minyan Zeng, Xiabin Zhou, Li Shen, Yong Luo, Dacheng Tao

Figure 1 for Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models

Figure 2 for Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models

Figure 3 for Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models

Figure 4 for Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models

Abstract:Multimodal large language models (MLLMs) have experienced significant advancements recently, but still struggle to recognize and interpret intricate details in high-resolution (HR) images effectively. While state-of-the-art (SOTA) MLLMs claim to process images at 4K resolution, existing MLLM benchmarks only support up to 2K, leaving the capabilities of SOTA models on true HR images largely untested. Furthermore, existing methods for enhancing HR image perception in MLLMs rely on computationally expensive visual instruction tuning. To address these limitations, we introduce HR-Bench, the first deliberately designed benchmark to rigorously evaluate MLLM performance on 4K&8K images. Through extensive experiments, we demonstrate that while downsampling HR images leads to vision information loss, leveraging complementary modalities, e.g., text, can effectively compensate for this loss. Building upon this insight, we propose Divide, Conquer and Combine (DC$^2$), a novel training-free framework for enhancing MLLM perception of HR images. DC$^2$ follows a three-staged approach: 1) Divide: recursively partitioning the HR image into patches and merging similar patches to minimize computational overhead, 2) Conquer: leveraging the MLLM to generate accurate textual descriptions for each image patch, and 3) Combine: utilizing the generated text descriptions to enhance the MLLM's understanding of the overall HR image. Extensive experiments show that: 1) the SOTA MLLM achieves 63% accuracy, which is markedly lower than the 87% accuracy achieved by humans on HR-Bench; 2) our DC$^2$ brings consistent and significant improvements (a relative increase of +6% on HR-Bench and +8% on general multimodal benchmarks). The benchmark and code will be released to facilitate the multimodal R&D community.

Via

Access Paper or Ask Questions