Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zheng Zhang

Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, School of Artificial Intelligence, Beihang University, China

Tele-FLM Technical Report

Apr 25, 2024

Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang(+10 more)

Abstract:Large language models (LLMs) have showcased profound capabilities in language understanding and generation, facilitating a wide array of applications. However, there is a notable paucity of detailed, open-sourced methodologies on efficiently scaling LLMs beyond 50 billion parameters with minimum trial-and-error cost and computational resources. In this report, we introduce Tele-FLM (aka FLM-2), a 52B open-sourced multilingual large language model that features a stable, efficient pre-training paradigm and enhanced factual judgment capabilities. Tele-FLM demonstrates superior multilingual language modeling abilities, measured by BPB on textual corpus. Besides, in both English and Chinese foundation model evaluation, it is comparable to strong open-sourced models that involve larger pre-training FLOPs, such as Llama2-70B and DeepSeek-67B. In addition to the model weights, we share the core designs, engineering practices, and training details, which we expect to benefit both the academic and industrial communities.

Via

Access Paper or Ask Questions

DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance

Apr 23, 2024

Linxuan Xin, Zheng Zhang, Jinfu Wei, Ge Li, Duan Gao

Abstract:Prior material creation methods had limitations in producing diverse results mainly because reconstruction-based methods relied on real-world measurements and generation-based methods were trained on relatively small material datasets. To address these challenges, we propose DreamPBR, a novel diffusion-based generative framework designed to create spatially-varying appearance properties guided by text and multi-modal controls, providing high controllability and diversity in material generation. Key to achieving diverse and high-quality PBR material generation lies in integrating the capabilities of recent large-scale vision-language models trained on billions of text-image pairs, along with material priors derived from hundreds of PBR material samples. We utilize a novel material Latent Diffusion Model (LDM) to establish the mapping between albedo maps and the corresponding latent space. The latent representation is then decoded into full SVBRDF parameter maps using a rendering-aware PBR decoder. Our method supports tileable generation through convolution with circular padding. Furthermore, we introduce a multi-modal guidance module, which includes pixel-aligned guidance, style image guidance, and 3D shape guidance, to enhance the control capabilities of the material LDM. We demonstrate the effectiveness of DreamPBR in material creation, showcasing its versatility and user-friendliness on a wide range of controllable generation and editing applications.

* 16 pages, 17 figures

Via

Access Paper or Ask Questions

MHLR: Moving Haar Learning Rate Scheduler for Large-scale Face Recognition Training with One GPU

Apr 17, 2024

Xueyuan Gong, Yain-whar Si, Zheng Zhang, Xiaochen Yuan, Ke Wang, Xinyuan Zhang, Cong Lin, Xiaoxiang Liu

Figure 1 for MHLR: Moving Haar Learning Rate Scheduler for Large-scale Face Recognition Training with One GPU

Figure 2 for MHLR: Moving Haar Learning Rate Scheduler for Large-scale Face Recognition Training with One GPU

Figure 3 for MHLR: Moving Haar Learning Rate Scheduler for Large-scale Face Recognition Training with One GPU

Figure 4 for MHLR: Moving Haar Learning Rate Scheduler for Large-scale Face Recognition Training with One GPU

Abstract:Face recognition (FR) has seen significant advancements due to the utilization of large-scale datasets. Training deep FR models on large-scale datasets with multiple GPUs is now a common practice. In fact, computing power has evolved into a foundational and indispensable resource in the area of deep learning. It is nearly impossible to train a deep FR model without holding adequate hardware resources. Recognizing this challenge, some FR approaches have started exploring ways to reduce the time complexity of the fully-connected layer in FR models. Unlike other approaches, this paper introduces a simple yet highly effective approach, Moving Haar Learning Rate (MHLR) scheduler, for scheduling the learning rate promptly and accurately in the training process. MHLR supports large-scale FR training with only one GPU, which is able to accelerate the model to 1/4 of its original training time without sacrificing more than 1% accuracy. More specifically, MHLR only needs $30$ hours to train the model ResNet100 on the dataset WebFace12M containing more than 12M face images with 0.6M identities. Extensive experiments validate the efficiency and effectiveness of MHLR.

Via

Access Paper or Ask Questions

Towards Analyzing and Understanding the Limitations of DPO: A Theoretical Perspective

Apr 06, 2024

Duanyu Feng, Bowen Qin, Chen Huang, Zheng Zhang, Wenqiang Lei

Figure 1 for Towards Analyzing and Understanding the Limitations of DPO: A Theoretical Perspective

Abstract:Direct Preference Optimization (DPO), which derives reward signals directly from pairwise preference data, has shown its effectiveness on aligning Large Language Models (LLMs) with human preferences. Despite its widespread use across various tasks, DPO has been criticized for its sensitivity to the SFT's effectiveness and its hindrance to the learning capacity towards human-preferred responses, leading to less satisfactory performance. To overcome those limitations, the theoretical understanding of DPO are indispensable but still lacking. To this end, we take a step towards theoretically analyzing and understanding the limitations of DPO. Specifically, we provide an analytical framework using the field theory to analyze the optimization process of DPO. By analyzing the gradient vector field of the DPO loss function, we find that the DPO loss function decreases the probability of producing human dispreferred data at a faster rate than it increases the probability of producing preferred data. This provides theoretical insights for understanding the limitations of DPO discovered in the related research experiments, thereby setting the foundation for its improvement.

* Draft version

Via

Access Paper or Ask Questions

Position-Aware Parameter Efficient Fine-Tuning Approach for Reducing Positional Bias in LLMs

Apr 01, 2024

Zheng Zhang, Fan Yang, Ziyan Jiang, Zheng Chen, Zhengyang Zhao, Chengyuan Ma, Liang Zhao, Yang Liu

Abstract:Recent advances in large language models (LLMs) have enhanced their ability to process long input contexts. This development is particularly crucial for tasks that involve retrieving knowledge from an external datastore, which can result in long inputs. However, recent studies show a positional bias in LLMs, demonstrating varying performance depending on the location of useful information within the input sequence. In this study, we conduct extensive experiments to investigate the root causes of positional bias. Our findings indicate that the primary contributor to LLM positional bias stems from the inherent positional preferences of different models. We demonstrate that merely employing prompt-based solutions is inadequate for overcoming the positional preferences. To address this positional bias issue of a pre-trained LLM, we developed a Position-Aware Parameter Efficient Fine-Tuning (PAPEFT) approach which is composed of a data augmentation technique and a parameter efficient adapter, enhancing a uniform attention distribution across the input context. Our experiments demonstrate that the proposed approach effectively reduces positional bias, improving LLMs' effectiveness in handling long context sequences for various tasks that require externally retrieved knowledge.

Via

Access Paper or Ask Questions

PID Control-Based Self-Healing to Improve the Robustness of Large Language Models

Mar 31, 2024

Zhuotong Chen, Zihu Wang, Yifan Yang, Qianxiao Li, Zheng Zhang

Figure 1 for PID Control-Based Self-Healing to Improve the Robustness of Large Language Models

Figure 2 for PID Control-Based Self-Healing to Improve the Robustness of Large Language Models

Figure 3 for PID Control-Based Self-Healing to Improve the Robustness of Large Language Models

Figure 4 for PID Control-Based Self-Healing to Improve the Robustness of Large Language Models

Abstract:Despite the effectiveness of deep neural networks in numerous natural language processing applications, recent findings have exposed the vulnerability of these language models when minor perturbations are introduced. While appearing semantically indistinguishable to humans, these perturbations can significantly reduce the performance of well-trained language models, raising concerns about the reliability of deploying them in safe-critical situations. In this work, we construct a computationally efficient self-healing process to correct undesired model behavior during online inference when perturbations are applied to input data. This is formulated as a trajectory optimization problem in which the internal states of the neural network layers are automatically corrected using a PID (Proportional-Integral-Derivative) control mechanism. The P controller targets immediate state adjustments, while the I and D controllers consider past states and future dynamical trends, respectively. We leverage the geometrical properties of the training data to design effective linear PID controllers. This approach reduces the computational cost to that of using just the P controller, instead of the full PID control. Further, we introduce an analytical method for approximating the optimal control solutions, enhancing the real-time inference capabilities of this controlled system. Moreover, we conduct a theoretical error analysis of the analytic solution in a simplified setting. The proposed PID control-based self-healing is a low cost framework that improves the robustness of pre-trained large language models, whether standard or robustly trained, against a wide range of perturbations. A detailed implementation can be found in:https://github.com/zhuotongchen/PID-Control-Based-Self-Healing-to-Improve-the-Robustness-of-Large-Language-Models.

* Transactions on Machine Learning Research

Via

Access Paper or Ask Questions

EventGround: Narrative Reasoning by Grounding to Eventuality-centric Knowledge Graphs

Mar 30, 2024

Cheng Jiayang, Lin Qiu, Chunkit Chan, Xin Liu, Yangqiu Song, Zheng Zhang

Abstract:Narrative reasoning relies on the understanding of eventualities in story contexts, which requires a wealth of background world knowledge. To help machines leverage such knowledge, existing solutions can be categorized into two groups. Some focus on implicitly modeling eventuality knowledge by pretraining language models (LMs) with eventuality-aware objectives. However, this approach breaks down knowledge structures and lacks interpretability. Others explicitly collect world knowledge of eventualities into structured eventuality-centric knowledge graphs (KGs). However, existing research on leveraging these knowledge sources for free-texts is limited. In this work, we propose an initial comprehensive framework called EventGround, which aims to tackle the problem of grounding free-texts to eventuality-centric KGs for contextualized narrative reasoning. We identify two critical problems in this direction: the event representation and sparsity problems. We provide simple yet effective parsing and partial information extraction methods to tackle these problems. Experimental results demonstrate that our approach consistently outperforms baseline models when combined with graph neural network (GNN) or large language model (LLM) based graph reasoning models. Our framework, incorporating grounded knowledge, achieves state-of-the-art performance while providing interpretable evidence.

Via

Access Paper or Ask Questions

CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network

Mar 28, 2024

Jie Wen, Zheng Zhang, Yong Xu, Bob Zhang, Lunke Fei, Guo-Sen Xie

Figure 1 for CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network

Figure 2 for CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network

Figure 3 for CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network

Figure 4 for CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network

Abstract:In recent years, incomplete multi-view clustering, which studies the challenging multi-view clustering problem on missing views, has received growing research interests. Although a series of methods have been proposed to address this issue, the following problems still exist: 1) Almost all of the existing methods are based on shallow models, which is difficult to obtain discriminative common representations. 2) These methods are generally sensitive to noise or outliers since the negative samples are treated equally as the important samples. In this paper, we propose a novel incomplete multi-view clustering network, called Cognitive Deep Incomplete Multi-view Clustering Network (CDIMC-net), to address these issues. Specifically, it captures the high-level features and local structure of each view by incorporating the view-specific deep encoders and graph embedding strategy into a framework. Moreover, based on the human cognition, i.e., learning from easy to hard, it introduces a self-paced strategy to select the most confident samples for model training, which can reduce the negative influence of outliers. Experimental results on several incomplete datasets show that CDIMC-net outperforms the state-of-the-art incomplete multi-view clustering methods.

* Accepted by IJCAI 2020

Via

Access Paper or Ask Questions

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

Mar 22, 2024

Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He, Hengshuang Zhao

Figure 1 for Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

Figure 2 for Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

Figure 3 for Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

Figure 4 for Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

Abstract:3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance. However, it relies heavily on the quality of the initial point cloud, resulting in blurring and needle-like artifacts in areas with insufficient initializing points. This is mainly attributed to the point cloud growth condition in 3DGS that only considers the average gradient magnitude of points from observable views, thereby failing to grow for large Gaussians that are observable for many viewpoints while many of them are only covered in the boundaries. To this end, we propose a novel method, named Pixel-GS, to take into account the number of pixels covered by the Gaussian in each view during the computation of the growth condition. We regard the covered pixel numbers as the weights to dynamically average the gradients from different views, such that the growth of large Gaussians can be prompted. As a result, points within the areas with insufficient initializing points can be grown more effectively, leading to a more accurate and detailed reconstruction. In addition, we propose a simple yet effective strategy to scale the gradient field according to the distance to the camera, to suppress the growth of floaters near the camera. Extensive experiments both qualitatively and quantitatively demonstrate that our method achieves state-of-the-art rendering quality while maintaining real-time rendering speed, on the challenging Mip-NeRF 360 and Tanks & Temples datasets.

Via

Access Paper or Ask Questions

PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

Mar 21, 2024

Zheng Zhang, Yeyao Ma, Enming Zhang, Xiang Bai

Figure 1 for PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

Figure 2 for PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

Figure 3 for PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

Figure 4 for PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

Abstract:PSALM is a powerful extension of the Large Multi-modal Model (LMM) to address the segmentation task challenges. To overcome the limitation of the LMM being limited to textual output, PSALM incorporates a mask decoder and a well-designed input schema to handle a variety of segmentation tasks. This schema includes images, task instructions, conditional prompts, and mask tokens, which enable the model to generate and classify segmentation masks effectively. The flexible design of PSALM supports joint training across multiple datasets and tasks, leading to improved performance and task generalization. PSALM achieves superior results on several benchmarks, such as RefCOCO/RefCOCO+/RefCOCOg, COCO Panoptic Segmentation, and COCO-Interactive, and further exhibits zero-shot capabilities on unseen tasks, such as open-vocabulary segmentation, generalized referring expression segmentation and video object segmentation, making a significant step towards a GPT moment in computer vision. Through extensive experiments, PSALM demonstrates its potential to transform the domain of image segmentation, leveraging the robust visual understanding capabilities of LMMs as seen in natural language processing. Code and models are available at https://github.com/zamling/PSALM.

Via

Access Paper or Ask Questions