Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yongdong Zhang

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

Mar 12, 2024

Tianhao Qi, Shancheng Fang, Yanze Wu, Hongtao Xie, Jiawei Liu, Lang Chen, Qian He, Yongdong Zhang

Figure 1 for DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

Figure 2 for DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

Figure 3 for DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

Figure 4 for DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

Abstract:The diffusion-based text-to-image model harbors immense potential in transferring reference style. However, current encoder-based approaches significantly impair the text controllability of text-to-image models while transferring styles. In this paper, we introduce DEADiff to address this issue using the following two strategies: 1) a mechanism to decouple the style and semantics of reference images. The decoupled feature representations are first extracted by Q-Formers which are instructed by different text descriptions. Then they are injected into mutually exclusive subsets of cross-attention layers for better disentanglement. 2) A non-reconstructive learning method. The Q-Formers are trained using paired images rather than the identical target, in which the reference image and the ground-truth image are with the same style or semantics. We show that DEADiff attains the best visual stylization results and optimal balance between the text controllability inherent in the text-to-image model and style similarity to the reference image, as demonstrated both quantitatively and qualitatively. Our project page is https://tianhao-qi.github.io/DEADiff/.

* Accepted by CVPR 2024

Via

Access Paper or Ask Questions

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

Mar 01, 2024

Mengqi Huang, Zhendong Mao, Mingcong Liu, Qian He, Yongdong Zhang

Figure 1 for RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

Figure 2 for RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

Figure 3 for RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

Figure 4 for RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

Abstract:Text-to-image customization, which aims to synthesize text-driven images for the given subjects, has recently revolutionized content creation. Existing works follow the pseudo-word paradigm, i.e., represent the given subjects as pseudo-words and then compose them with the given text. However, the inherent entangled influence scope of pseudo-words with the given text results in a dual-optimum paradox, i.e., the similarity of the given subjects and the controllability of the given text could not be optimal simultaneously. We present RealCustom that, for the first time, disentangles similarity from controllability by precisely limiting subject influence to relevant parts only, achieved by gradually narrowing real text word from its general connotation to the specific subject and using its cross-attention to distinguish relevance. Specifically, RealCustom introduces a novel "train-inference" decoupled framework: (1) during training, RealCustom learns general alignment between visual conditions to original textual conditions by a novel adaptive scoring module to adaptively modulate influence quantity; (2) during inference, a novel adaptive mask guidance strategy is proposed to iteratively update the influence scope and influence quantity of the given subjects to gradually narrow the generation of the real text word. Comprehensive experiments demonstrate the superior real-time customization ability of RealCustom in the open domain, achieving both unprecedented similarity of the given subjects and controllability of the given text for the first time. The project page is https://corleone-huang.github.io/realcustom/.

* Accepted by CVPR2024

Via

Access Paper or Ask Questions

Alleviating Structural Distribution Shift in Graph Anomaly Detection

Jan 25, 2024

Yuan Gao, Xiang Wang, Xiangnan He, Zhenguang Liu, Huamin Feng, Yongdong Zhang

Figure 1 for Alleviating Structural Distribution Shift in Graph Anomaly Detection

Figure 2 for Alleviating Structural Distribution Shift in Graph Anomaly Detection

Figure 3 for Alleviating Structural Distribution Shift in Graph Anomaly Detection

Figure 4 for Alleviating Structural Distribution Shift in Graph Anomaly Detection

Abstract:Graph anomaly detection (GAD) is a challenging binary classification problem due to its different structural distribution between anomalies and normal nodes -- abnormal nodes are a minority, therefore holding high heterophily and low homophily compared to normal nodes. Furthermore, due to various time factors and the annotation preferences of human experts, the heterophily and homophily can change across training and testing data, which is called structural distribution shift (SDS) in this paper. The mainstream methods are built on graph neural networks (GNNs), benefiting the classification of normals from aggregating homophilous neighbors, yet ignoring the SDS issue for anomalies and suffering from poor generalization. This work solves the problem from a feature view. We observe that the degree of SDS varies between anomalies and normal nodes. Hence to address the issue, the key lies in resisting high heterophily for anomalies meanwhile benefiting the learning of normals from homophily. We tease out the anomaly features on which we constrain to mitigate the effect of heterophilous neighbors and make them invariant. We term our proposed framework as Graph Decomposition Network (GDN). Extensive experiments are conducted on two benchmark datasets, and the proposed framework achieves a remarkable performance boost in GAD, especially in an SDS environment where anomalies have largely different structural distribution across training and testing environments. Codes are open-sourced in https://github.com/blacksingular/wsdm_GDN.

* Accepted to WSDM 2023

Via

Access Paper or Ask Questions

Frequency Domain Modality-invariant Feature Learning for Visible-infrared Person Re-Identification

Jan 04, 2024

Yulin Li, Tianzhu Zhang, Yongdong Zhang

Abstract:Visible-infrared person re-identification (VI-ReID) is challenging due to the significant cross-modality discrepancies between visible and infrared images. While existing methods have focused on designing complex network architectures or using metric learning constraints to learn modality-invariant features, they often overlook which specific component of the image causes the modality discrepancy problem. In this paper, we first reveal that the difference in the amplitude component of visible and infrared images is the primary factor that causes the modality discrepancy and further propose a novel Frequency Domain modality-invariant feature learning framework (FDMNet) to reduce modality discrepancy from the frequency domain perspective. Our framework introduces two novel modules, namely the Instance-Adaptive Amplitude Filter (IAF) module and the Phrase-Preserving Normalization (PPNorm) module, to enhance the modality-invariant amplitude component and suppress the modality-specific component at both the image- and feature-levels. Extensive experimental results on two standard benchmarks, SYSU-MM01 and RegDB, demonstrate the superior performance of our FDMNet against state-of-the-art methods.

* Under review

Via

Access Paper or Ask Questions

Grammatical Error Correction via Mixed-Grained Weighted Training

Nov 23, 2023

Jiahao Li, Quan Wang, Chiwei Zhu, Zhendong Mao, Yongdong Zhang

Figure 1 for Grammatical Error Correction via Mixed-Grained Weighted Training

Figure 2 for Grammatical Error Correction via Mixed-Grained Weighted Training

Figure 3 for Grammatical Error Correction via Mixed-Grained Weighted Training

Figure 4 for Grammatical Error Correction via Mixed-Grained Weighted Training

Abstract:The task of Grammatical Error Correction (GEC) aims to automatically correct grammatical errors in natural texts. Almost all previous works treat annotated training data equally, but inherent discrepancies in data are neglected. In this paper, the inherent discrepancies are manifested in two aspects, namely, accuracy of data annotation and diversity of potential annotations. To this end, we propose MainGEC, which designs token-level and sentence-level training weights based on inherent discrepancies in accuracy and potential diversity of data annotation, respectively, and then conducts mixed-grained weighted training to improve the training effect for GEC. Empirical evaluation shows that whether in the Seq2Seq or Seq2Edit manner, MainGEC achieves consistent and significant performance improvements on two benchmark datasets, demonstrating the effectiveness and superiority of the mixed-grained weighted training. Further ablation experiments verify the effectiveness of designed weights of both granularities in MainGEC.

* EMNLP2023 Findings

Via

Access Paper or Ask Questions

ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences

Nov 23, 2023

Yuanhe Tian, Ruyi Gan, Yan Song, Jiaxing Zhang, Yongdong Zhang

Abstract:Recently, the increasing demand for superior medical services has highlighted the discrepancies in the medical infrastructure. With big data, especially texts, forming the foundation of medical services, there is an exigent need for effective natural language processing (NLP) solutions tailored to the healthcare domain. Conventional approaches leveraging pre-trained models present promising results in this domain and current large language models (LLMs) offer advanced foundation for medical text processing. However, most medical LLMs are trained only with supervised fine-tuning (SFT), even though it efficiently empowers LLMs to understand and respond to medical instructions but is ineffective in learning domain knowledge and aligning with human preference. Another engineering barrier that prevents current medical LLM from better text processing ability is their restricted context length (e.g., 2,048 tokens), making it hard for the LLMs to process long context, which is frequently required in the medical domain. In this work, we propose ChiMed-GPT, a new benchmark LLM designed explicitly for Chinese medical domain, with enlarged context length to 4,096 tokens and undergoes a comprehensive training regime with pre-training, SFT, and RLHF. Evaluations on real-world tasks including information extraction, question answering, and dialogue generation demonstrate ChiMed-GPT's superior performance over general domain LLMs. Furthermore, we analyze possible biases through prompting ChiMed-GPT to perform attitude scales regarding discrimination of patients, so as to contribute to further responsible development of LLMs in the medical domain. The code and model are released at https://github.com/synlp/ChiMed-GPT.

* 17 pages, 3 figures

Via

Access Paper or Ask Questions

On the Calibration of Large Language Models and Alignment

Nov 22, 2023

Chiwei Zhu, Benfeng Xu, Quan Wang, Yongdong Zhang, Zhendong Mao

Abstract:As large language models attract increasing attention and find widespread application, concurrent challenges of reliability also arise at the same time. Confidence calibration, an effective analysis method for gauging the reliability of deep models, serves as a crucial tool for assessing and improving their reliability. However, such investigation has been comparatively underexplored. In this work, we conduct a systematic examination of the calibration of aligned language models throughout the entire construction process, including pretraining and alignment training. At each stage, we investigate how different training settings, such as parameter scales and training data, affect model calibration. To thoroughly assess model calibration, we evaluate models on three most concerned aspects: generation, factuality and understanding. Our work sheds light on whether popular LLMs are well-calibrated and how the training process influences model calibration.

* to be published in findings of EMNLP-2023

Via

Access Paper or Ask Questions

Causality is all you need

Nov 21, 2023

Ning Xu, Yifei Gao, Hongshuo Tian, Yongdong Zhang, An-An Liu

Abstract:In the fundamental statistics course, students are taught to remember the well-known saying: "Correlation is not Causation". Till now, statistics (i.e., correlation) have developed various successful frameworks, such as Transformer and Pre-training large-scale models, which have stacked multiple parallel self-attention blocks to imitate a wide range of tasks. However, in the causation community, how to build an integrated causal framework still remains an untouched domain despite its excellent intervention capabilities. In this paper, we propose the Causal Graph Routing (CGR) framework, an integrated causal scheme relying entirely on the intervention mechanisms to reveal the cause-effect forces hidden in data. Specifically, CGR is composed of a stack of causal layers. Each layer includes a set of parallel deconfounding blocks from different causal graphs. We combine these blocks via the concept of the proposed sufficient cause, which allows the model to dynamically select the suitable deconfounding methods in each layer. CGR is implemented as the stacked networks, integrating no confounder, back-door adjustment, front-door adjustment, and probability of sufficient cause. We evaluate this framework on two classical tasks of CV and NLP. Experiments show CGR can surpass the current state-of-the-art methods on both Visual Question Answer and Long Document Classification tasks. In particular, CGR has great potential in building the "causal" pre-training large-scale model that effectively generalizes to diverse tasks. It will improve the machines' comprehension of causal relationships within a broader semantic space.

Via

Access Paper or Ask Questions

Air-Decoding: Attribute Distribution Reconstruction for Decoding-Time Controllable Text Generation

Nov 02, 2023

Tianqi Zhong, Quan Wang, Jingxuan Han, Yongdong Zhang, Zhendong Mao

Figure 1 for Air-Decoding: Attribute Distribution Reconstruction for Decoding-Time Controllable Text Generation

Figure 2 for Air-Decoding: Attribute Distribution Reconstruction for Decoding-Time Controllable Text Generation

Figure 3 for Air-Decoding: Attribute Distribution Reconstruction for Decoding-Time Controllable Text Generation

Figure 4 for Air-Decoding: Attribute Distribution Reconstruction for Decoding-Time Controllable Text Generation

Abstract:Controllable text generation (CTG) aims to generate text with desired attributes, and decoding-time-based methods have shown promising performance on this task. However, in this paper, we identify the phenomenon of Attribute Collapse for the first time. It causes the fluency of generated text to rapidly decrease when the control strength exceeds a critical value, rendering the text completely unusable. This limitation hinders the effectiveness of decoding methods in achieving high levels of controllability. To address this problem, we propose a novel lightweight decoding framework named Air-Decoding. Its main idea is reconstructing the attribute distributions to balance the weights between attribute words and non-attribute words to generate more fluent text. Specifically, we train prefixes by prefix-tuning to obtain attribute distributions. Then we design a novel attribute distribution reconstruction method to balance the obtained distributions and use the reconstructed distributions to guide language models for generation, effectively avoiding the issue of Attribute Collapse. Experiments on multiple CTG tasks prove that our method achieves a new state-of-the-art control performance.

* Accepted as an EMNLP 2023 main paper

Via

Access Paper or Ask Questions

Promoting Generalization for Exact Solvers via Adversarial Instance Augmentation

Oct 22, 2023

Haoyang Liu, Yufei Kuang, Jie Wang, Xijun Li, Yongdong Zhang, Feng Wu

Figure 1 for Promoting Generalization for Exact Solvers via Adversarial Instance Augmentation

Figure 2 for Promoting Generalization for Exact Solvers via Adversarial Instance Augmentation

Figure 3 for Promoting Generalization for Exact Solvers via Adversarial Instance Augmentation

Figure 4 for Promoting Generalization for Exact Solvers via Adversarial Instance Augmentation

Abstract:Machine learning has been successfully applied to improve the efficiency of Mixed-Integer Linear Programming (MILP) solvers. However, the learning-based solvers often suffer from severe performance degradation on unseen MILP instances -- especially on large-scale instances from a perturbed environment -- due to the limited diversity of training distributions. To tackle this problem, we propose a novel approach, which is called Adversarial Instance Augmentation and does not require to know the problem type for new instance generation, to promote data diversity for learning-based branching modules in the branch-and-bound (B&B) Solvers (AdaSolver). We use the bipartite graph representations for MILP instances and obtain various perturbed instances to regularize the solver by augmenting the graph structures with a learned augmentation policy. The major technical contribution of AdaSolver is that we formulate the non-differentiable instance augmentation as a contextual bandit problem and adversarially train the learning-based solver and augmentation policy, enabling efficient gradient-based training of the augmentation policy. To the best of our knowledge, AdaSolver is the first general and effective framework for understanding and improving the generalization of both imitation-learning-based (IL-based) and reinforcement-learning-based (RL-based) B&B solvers. Extensive experiments demonstrate that by producing various augmented instances, AdaSolver leads to a remarkable efficiency improvement across various distributions.

Via

Access Paper or Ask Questions