Alert button
Picture for Yifei Zhang

Yifei Zhang

Alert button

RoCar: A Relationship Network-based Evaluation Method to Large Language Models

Jul 29, 2023
Ming Wang, Wenfang Wu, Chongyun Gao, Daling Wang, Shi Feng, Yifei Zhang

Figure 1 for RoCar: A Relationship Network-based Evaluation Method to Large Language Models
Figure 2 for RoCar: A Relationship Network-based Evaluation Method to Large Language Models
Figure 3 for RoCar: A Relationship Network-based Evaluation Method to Large Language Models
Figure 4 for RoCar: A Relationship Network-based Evaluation Method to Large Language Models

Large language models (LLMs) have received increasing attention. However, due to the complexity of its capabilities, how to rationally evaluate the capabilities of LLMs is still a task to be solved. We propose the RoCar method, which utilizes the defined basic schemas to randomly construct a task graph and generates natural language evaluation tasks based on the task graph to evaluate the reasoning and memory abilities of LLMs respectively. Due to the very large randomness of the task construction process, it is possible to ensure that none of the LLMs to be tested has directly learned the evaluation tasks, guaranteeing the fairness of the evaluation method.

Viaarxiv icon

Eliminating Lipschitz Singularities in Diffusion Models

Jun 20, 2023
Zhantao Yang, Ruili Feng, Han Zhang, Yujun Shen, Kai Zhu, Lianghua Huang, Yifei Zhang, Yu Liu, Deli Zhao, Jingren Zhou, Fan Cheng

Figure 1 for Eliminating Lipschitz Singularities in Diffusion Models
Figure 2 for Eliminating Lipschitz Singularities in Diffusion Models
Figure 3 for Eliminating Lipschitz Singularities in Diffusion Models
Figure 4 for Eliminating Lipschitz Singularities in Diffusion Models

Diffusion models, which employ stochastic differential equations to sample images through integrals, have emerged as a dominant class of generative models. However, the rationality of the diffusion process itself receives limited attention, leaving the question of whether the problem is well-posed and well-conditioned. In this paper, we uncover a vexing propensity of diffusion models: they frequently exhibit the infinite Lipschitz near the zero point of timesteps. This poses a threat to the stability and accuracy of the diffusion process, which relies on integral operations. We provide a comprehensive evaluation of the issue from both theoretical and empirical perspectives. To address this challenge, we propose a novel approach, dubbed E-TSDM, which eliminates the Lipschitz singularity of the diffusion model near zero. Remarkably, our technique yields a substantial improvement in performance, e.g., on the high-resolution FFHQ dataset ($256\times256$). Moreover, as a byproduct of our method, we manage to achieve a dramatic reduction in the Frechet Inception Distance of other acceleration methods relying on network Lipschitz, including DDIM and DPM-Solver, by over 33$\%$. We conduct extensive experiments on diverse datasets to validate our theory and method. Our work not only advances the understanding of the general diffusion process, but also provides insights for the design of diffusion models.

Viaarxiv icon

InRank: Incremental Low-Rank Learning

Jun 20, 2023
Jiawei Zhao, Yifei Zhang, Beidi Chen, Florian Schäfer, Anima Anandkumar

Figure 1 for InRank: Incremental Low-Rank Learning
Figure 2 for InRank: Incremental Low-Rank Learning
Figure 3 for InRank: Incremental Low-Rank Learning
Figure 4 for InRank: Incremental Low-Rank Learning

The theory of greedy low-rank learning (GLRL) aims to explain the impressive generalization capabilities of deep learning. It proves that stochastic gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training. However, there is a gap between theory and practice since GLRL requires an infinitesimal initialization of the weights, which is not practical due to the fact that it is a saddle point. In this work, we remove the assumption of infinitesimal initialization by focusing on cumulative weight updates. We prove the cumulative weight updates follow an incremental low-rank trajectory for arbitrary orthogonal initialization of weights in a three-layer linear network. Empirically, we demonstrate that our theory holds on a broad range of neural networks (e.g., transformers) and standard training algorithms (e.g., SGD, Adam). However, existing training algorithms do not exploit the low-rank property to improve computational efficiency as the networks are not parameterized in low-rank. To remedy this, we design a new training algorithm Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices while incrementally augmenting their ranks during training. We evaluate InRank on GPT-2, and our results indicate that InRank achieves comparable prediction performance as the full-rank counterpart while requiring at most 33% of the total ranks throughout training. We also propose an efficient version of InRank that achieves a reduction of 20% in total training time and 37% in memory usage when training GPT-medium on WikiText-103 from scratch.

Viaarxiv icon

Cones 2: Customizable Image Synthesis with Multiple Subjects

May 30, 2023
Zhiheng Liu, Yifei Zhang, Yujun Shen, Kecheng Zheng, Kai Zhu, Ruili Feng, Yu Liu, Deli Zhao, Jingren Zhou, Yang Cao

Figure 1 for Cones 2: Customizable Image Synthesis with Multiple Subjects
Figure 2 for Cones 2: Customizable Image Synthesis with Multiple Subjects
Figure 3 for Cones 2: Customizable Image Synthesis with Multiple Subjects
Figure 4 for Cones 2: Customizable Image Synthesis with Multiple Subjects

Synthesizing images with user-specified subjects has received growing attention due to its practical applications. Despite the recent success in single subject customization, existing algorithms suffer from high training cost and low success rate along with increased number of subjects. Towards controllable image synthesis with multiple subjects as the constraints, this work studies how to efficiently represent a particular subject as well as how to appropriately compose different subjects. We find that the text embedding regarding the subject token already serves as a simple yet effective representation that supports arbitrary combinations without any model tuning. Through learning a residual on top of the base embedding, we manage to robustly shift the raw subject to the customized subject given various text conditions. We then propose to employ layout, a very abstract and easy-to-obtain prior, as the spatial guidance for subject arrangement. By rectifying the activations in the cross-attention map, the layout appoints and separates the location of different subjects in the image, significantly alleviating the interference across them. Both qualitative and quantitative experimental results demonstrate our superiority over state-of-the-art alternatives under a variety of settings for multi-subject customization.

Viaarxiv icon

Evaluate What You Can't Evaluate: Unassessable Generated Responses Quality

May 24, 2023
Yongkang Liu, Shi Feng, Daling Wang, Yifei Zhang, Hinrich Schütze

Figure 1 for Evaluate What You Can't Evaluate: Unassessable Generated Responses Quality
Figure 2 for Evaluate What You Can't Evaluate: Unassessable Generated Responses Quality
Figure 3 for Evaluate What You Can't Evaluate: Unassessable Generated Responses Quality
Figure 4 for Evaluate What You Can't Evaluate: Unassessable Generated Responses Quality

LLMs (large language models) such as ChatGPT have shown remarkable language understanding and generation capabilities. Although reference-free evaluators based on LLMs show better human alignment than traditional reference-based evaluators, there are many challenges in using reference-free evaluators based on LLMs. Reference-free evaluators are more suitable for open-ended examples with different semantics responses. But not all examples are open-ended. For closed-ended examples with unique correct semantic response, reference-free evaluators will still consider it high quality when giving a response that is inconsistent with the facts and the semantic of reference. In order to comprehensively evaluate the reliability of evaluators based on LLMs, we construct two adversarial meta-evaluation dialogue generation datasets KdConv-ADV and DSTC7-ADV based on KdConv and DSTC7-AVSD, respectively. Compared to previous meta-evaluation benchmarks, KdConv-ADV and DSTC7-ADV are much more challenging since they requires evaluators to be able to reasonably evaluate closed-ended examples with the help of external knowledge or even its own knowledge. Empirical results show that the ability of LLMs to identify unreasonable responses is insufficient. There are risks in using eference-free evaluators based on LLMs to evaluate the quality of dialogue responses.

* preprint 
Viaarxiv icon

WSFE: Wasserstein Sub-graph Feature Encoder for Effective User Segmentation in Collaborative Filtering

May 08, 2023
Yankai Chen, Yifei Zhang, Menglin Yang, Zixing Song, Chen Ma, Irwin King

Figure 1 for WSFE: Wasserstein Sub-graph Feature Encoder for Effective User Segmentation in Collaborative Filtering
Figure 2 for WSFE: Wasserstein Sub-graph Feature Encoder for Effective User Segmentation in Collaborative Filtering
Figure 3 for WSFE: Wasserstein Sub-graph Feature Encoder for Effective User Segmentation in Collaborative Filtering

Maximizing the user-item engagement based on vectorized embeddings is a standard procedure of recent recommender models. Despite the superior performance for item recommendations, these methods however implicitly deprioritize the modeling of user-wise similarity in the embedding space; consequently, identifying similar users is underperforming, and additional processing schemes are usually required otherwise. To avoid thorough model re-training, we propose WSFE, a model-agnostic and training-free representation encoder, to be flexibly employed on the fly for effective user segmentation. Underpinned by the optimal transport theory, the encoded representations from WSFE present a matched user-wise similarity/distance measurement between the realistic and embedding space. We incorporate WSFE into six state-of-the-art recommender models and conduct extensive experiments on six real-world datasets. The empirical analyses well demonstrate the superiority and generality of WSFE to fuel multiple downstream tasks with diverse underlying targets in recommendation.

Viaarxiv icon

Bipartite Graph Convolutional Hashing for Effective and Efficient Top-N Search in Hamming Space

Apr 01, 2023
Yankai Chen, Yixiang Fang, Yifei Zhang, Irwin King

Figure 1 for Bipartite Graph Convolutional Hashing for Effective and Efficient Top-N Search in Hamming Space
Figure 2 for Bipartite Graph Convolutional Hashing for Effective and Efficient Top-N Search in Hamming Space
Figure 3 for Bipartite Graph Convolutional Hashing for Effective and Efficient Top-N Search in Hamming Space
Figure 4 for Bipartite Graph Convolutional Hashing for Effective and Efficient Top-N Search in Hamming Space

Searching on bipartite graphs is basal and versatile to many real-world Web applications, e.g., online recommendation, database retrieval, and query-document searching. Given a query node, the conventional approaches rely on the similarity matching with the vectorized node embeddings in the continuous Euclidean space. To efficiently manage intensive similarity computation, developing hashing techniques for graph structured data has recently become an emerging research direction. Despite the retrieval efficiency in Hamming space, prior work is however confronted with catastrophic performance decay. In this work, we investigate the problem of hashing with Graph Convolutional Network on bipartite graphs for effective Top-N search. We propose an end-to-end Bipartite Graph Convolutional Hashing approach, namely BGCH, which consists of three novel and effective modules: (1) adaptive graph convolutional hashing, (2) latent feature dispersion, and (3) Fourier serialized gradient estimation. Specifically, the former two modules achieve the substantial retention of the structural information against the inevitable information loss in hash encoding; the last module develops Fourier Series decomposition to the hashing function in the frequency domain mainly for more accurate gradient estimation. The extensive experiments on six real-world datasets not only show the performance superiority over the competing hashing-based counterparts, but also demonstrate the effectiveness of all proposed model components contained therein.

* Accepted by WWW 2023 
Viaarxiv icon

Cones: Concept Neurons in Diffusion Models for Customized Generation

Mar 09, 2023
Zhiheng Liu, Ruili Feng, Kai Zhu, Yifei Zhang, Kecheng Zheng, Yu Liu, Deli Zhao, Jingren Zhou, Yang Cao

Figure 1 for Cones: Concept Neurons in Diffusion Models for Customized Generation
Figure 2 for Cones: Concept Neurons in Diffusion Models for Customized Generation
Figure 3 for Cones: Concept Neurons in Diffusion Models for Customized Generation
Figure 4 for Cones: Concept Neurons in Diffusion Models for Customized Generation

Human brains respond to semantic features of presented stimuli with different neurons. It is then curious whether modern deep neural networks admit a similar behavior pattern. Specifically, this paper finds a small cluster of neurons in a diffusion model corresponding to a particular subject. We call those neurons the concept neurons. They can be identified by statistics of network gradients to a stimulation connected with the given subject. The concept neurons demonstrate magnetic properties in interpreting and manipulating generation results. Shutting them can directly yield the related subject contextualized in different scenes. Concatenating multiple clusters of concept neurons can vividly generate all related concepts in a single image. A few steps of further fine-tuning can enhance the multi-concept capability, which may be the first to manage to generate up to four different subjects in a single image. For large-scale applications, the concept neurons are environmentally friendly as we only need to store a sparse cluster of int index instead of dense float32 values of the parameters, which reduces storage consumption by 90\% compared with previous subject-driven generation methods. Extensive qualitative and quantitative studies on diverse scenarios show the superiority of our method in interpreting and manipulating diffusion models.

Viaarxiv icon

A Survey of Trustworthy Federated Learning with Perspectives on Security, Robustness, and Privacy

Feb 21, 2023
Yifei Zhang, Dun Zeng, Jinglong Luo, Zenglin Xu, Irwin King

Figure 1 for A Survey of Trustworthy Federated Learning with Perspectives on Security, Robustness, and Privacy
Figure 2 for A Survey of Trustworthy Federated Learning with Perspectives on Security, Robustness, and Privacy
Figure 3 for A Survey of Trustworthy Federated Learning with Perspectives on Security, Robustness, and Privacy
Figure 4 for A Survey of Trustworthy Federated Learning with Perspectives on Security, Robustness, and Privacy

Trustworthy artificial intelligence (AI) technology has revolutionized daily life and greatly benefited human society. Among various AI technologies, Federated Learning (FL) stands out as a promising solution for diverse real-world scenarios, ranging from risk evaluation systems in finance to cutting-edge technologies like drug discovery in life sciences. However, challenges around data isolation and privacy threaten the trustworthiness of FL systems. Adversarial attacks against data privacy, learning algorithm stability, and system confidentiality are particularly concerning in the context of distributed training in federated learning. Therefore, it is crucial to develop FL in a trustworthy manner, with a focus on security, robustness, and privacy. In this survey, we propose a comprehensive roadmap for developing trustworthy FL systems and summarize existing efforts from three key aspects: security, robustness, and privacy. We outline the threats that pose vulnerabilities to trustworthy federated learning across different stages of development, including data processing, model training, and deployment. To guide the selection of the most appropriate defense methods, we discuss specific technical solutions for realizing each aspect of Trustworthy FL (TFL). Our approach differs from previous work that primarily discusses TFL from a legal perspective or presents FL from a high-level, non-technical viewpoint.

Viaarxiv icon

oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation

Jan 03, 2023
Jianhui Li, Zhennan Qin, Yijie Mei, Jingze Cui, Yunfei Song, Ciyong Chen, Yifei Zhang, Longsheng Du, Xianhang Cheng, Baihui Jin, Jason Ye, Eric Lin, Dan Lavery

Figure 1 for oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation
Figure 2 for oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation
Figure 3 for oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation
Figure 4 for oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation

With the rapid development of deep learning models and hardware support for dense computing, the deep learning (DL) workload characteristics changed significantly from a few hot spots on compute-intensive operations to a broad range of operations scattered across the models. Accelerating a few compute-intensive operations using the expert-tuned implementation of primitives does not fully exploit the performance potential of AI hardware. Various efforts are made to compile a full deep neural network (DNN) graph. One of the biggest challenges is to achieve end-to-end compilation by generating expert-level performance code for the dense compute-intensive operations and applying compilation optimization at the scope of DNN computation graph across multiple compute-intensive operations. We present oneDNN Graph Compiler, a tensor compiler that employs a hybrid approach of using techniques from both compiler optimization and expert-tuned kernels for high-performance code generation of the deep neural network graph. oneDNN Graph Compiler addresses unique optimization challenges in the deep learning domain, such as low-precision computation, aggressive fusion, optimization for static tensor shapes and memory layout, constant weight optimization, and memory buffer reuse. Experimental results demonstrate up to 2x performance gains over primitives-based optimization for performance-critical DNN computation graph patterns on Intel Xeon Scalable Processors.

* 12 pages excluding reference, 8 figures, 1 table. concurrently submitted to OSDI 2023 
Viaarxiv icon