Jun Zhu

Multi-task multi-station earthquake monitoring: An all-in-one seismic Phase picking, Location, and Association Network (PLAN)

Jun 24, 2023
Xu Si, Xinming Wu, Zefeng Li, Shenghou Wang, Jun Zhu

Earthquake monitoring is vital for understanding the physics of earthquakes and assessing seismic hazards. A standard monitoring workflow includes the interrelated and interdependent tasks of phase picking, association, and location. Although deep learning methods have been successfully applied to earthquake monitoring, they mostly address the tasks separately and ignore the geographic relationships among stations. Here, we propose a graph neural network that operates directly on multi-station seismic data and achieves simultaneous phase picking, association, and location. In particular, the inter-station and inter-task physical relationships are built into the network architecture to promote accuracy, interpretability, and physical consistency among cross-station and cross-task predictions. When applied to data from the Ridgecrest and Japan regions, this method showed superior performance over previous deep learning-based phase-picking and localization methods. Overall, our study provides for the first time a prototype self-consistent all-in-one system of simultaneous seismic phase picking, association, and location, which has the potential for next-generation autonomous earthquake monitoring.

* 30 pages, 12 figures, 3 tables

Training Transformers with 4-bit Integers

Jun 22, 2023
Haocheng Xi, Changhao Li, Jianfei Chen, Jun Zhu

Quantizing the activation, weight, and gradient to 4-bit is promising to accelerate neural network training. However, existing 4-bit training methods require custom numerical formats which are not supported by contemporary hardware. In this work, we propose a training method for transformers with all matrix multiplications implemented with the INT4 arithmetic. Training with an ultra-low INT4 precision is challenging. To achieve this, we carefully analyze the specific structures of activation and gradients in transformers to propose dedicated quantizers for them. For forward propagation, we identify the challenge of outliers and propose a Hadamard quantizer to suppress the outliers. For backpropagation, we leverage the structural sparsity of gradients by proposing bit splitting and leverage score sampling techniques to quantize gradients accurately. Our algorithm achieves competitive accuracy on a wide range of tasks including natural language understanding, machine translation, and image classification. Unlike previous 4-bit training methods, our algorithm can be implemented on the current generation of GPUs. Our prototypical linear operator implementation is up to 2.2 times faster than the FP16 counterparts and speeds up the training by up to 35.1%.
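The outlier-suppression idea mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's quantizer: it assumes a symmetric per-tensor quantizer with integer levels in [-7, 7] and a normalized Walsh-Hadamard transform, and the names `fwht` and `quantize_int4` are hypothetical. It only shows why spreading one large outlier across all coordinates before quantizing can lower the reconstruction error.

```python
import math
import random


def fwht(x):
    """Normalized fast Walsh-Hadamard transform (orthogonal, self-inverse)."""
    y = list(x)
    n = len(y)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [v * s for v in y]


def quantize_int4(x):
    """Assumed scheme: symmetric per-tensor quantization to integers in [-7, 7]."""
    max_abs = max(abs(v) for v in x)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = [max(-7, min(7, round(v / scale))) for v in x]
    return q, scale


def dequantize(q, scale):
    return [v * scale for v in q]


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


# Toy activation vector: 63 small values plus a single large outlier.
rng = random.Random(0)
x = [rng.uniform(-0.25, 0.25) for _ in range(63)] + [8.0]

# Direct quantization: the outlier sets the scale, so small values collapse to 0.
q_plain, s_plain = quantize_int4(x)
err_plain = mse(x, dequantize(q_plain, s_plain))

# Quantize in the Hadamard domain, then transform back.
q_had, s_had = quantize_int4(fwht(x))
err_hadamard = mse(x, fwht(dequantize(q_had, s_had)))

print(f"plain INT4 MSE:    {err_plain:.5f}")
print(f"Hadamard INT4 MSE: {err_hadamard:.5f}")
```

Because the normalized transform is orthogonal, the error measured after transforming back equals the quantization error in the transformed domain, so the comparison isolates the effect of the smaller dynamic range.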

* 9 pages, 8 figures

Stabilizing GANs' Training with Brownian Motion Controller

Jun 18, 2023
Tianjiao Luo, Ziyu Zhu, Jianfei Chen, Jun Zhu

The training process of generative adversarial networks (GANs) is unstable and does not converge globally. In this paper, we examine the stability of GANs from the perspective of control theory and propose a universal higher-order noise-based controller called Brownian Motion Controller (BMC). Starting with the prototypical case of Dirac-GANs, we design a BMC to retrieve precisely the same but reachable optimal equilibrium. We theoretically prove that the training process of DiracGANs-BMC is globally exponentially stable and derive bounds on the rate of convergence. Then we extend our BMC to normal GANs and provide implementation instructions on GANs-BMC. Our experiments show that our GANs-BMC effectively stabilizes GANs' training under StyleGANv2-ada frameworks with a faster rate of convergence, a smaller range of oscillation, and better performance in terms of FID score.
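The instability that motivates the controller can be seen in a few lines. The sketch below does not implement BMC. It assumes the usual Dirac-GAN formulation from earlier work on GAN convergence (a scalar generator parameter `theta`, a linear discriminator with slope `psi`, and objective f(theta * psi) with f(t) = -log(1 + exp(-t))), and it runs plain simultaneous gradient updates. The step size and iteration count are arbitrary choices for illustration.

```python
import math


def f_prime(t):
    """Derivative of f(t) = -log(1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(t))


def simultaneous_gradient_step(theta, psi, lr):
    """One simultaneous update on the assumed Dirac-GAN objective f(theta * psi)."""
    g = f_prime(theta * psi)
    new_theta = theta - lr * g * psi  # generator descends the objective
    new_psi = psi + lr * g * theta    # discriminator ascends the objective
    return new_theta, new_psi


theta, psi = 1.0, 1.0
radii = [math.hypot(theta, psi)]  # distance from the equilibrium (0, 0)
for _ in range(500):
    theta, psi = simultaneous_gradient_step(theta, psi, lr=0.1)
    radii.append(math.hypot(theta, psi))

print(f"initial distance from equilibrium: {radii[0]:.3f}")
print(f"final distance from equilibrium:   {radii[-1]:.3f}")
```

Each update multiplies the squared distance from the equilibrium by 1 + lr^2 * f'(theta * psi)^2, which is greater than one, so the iterates spiral outward instead of converging. This is the kind of behavior a stabilizing controller is meant to remove.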

PINNacle: A Comprehensive Benchmark of Physics-Informed Neural Networks for Solving PDEs

Jun 15, 2023
Zhongkai Hao, Jiachen Yao, Chang Su, Hang Su, Ziao Wang, Fanzhi Lu, Zeyu Xia, Yichi Zhang, Songming Liu, Lu Lu, Jun Zhu

While significant progress has been made on Physics-Informed Neural Networks (PINNs), a comprehensive comparison of these methods across a wide range of Partial Differential Equations (PDEs) is still lacking. This study introduces PINNacle, a benchmarking tool designed to fill this gap. PINNacle provides a diverse dataset, comprising over 20 distinct PDEs from various domains including heat conduction, fluid dynamics, biology, and electromagnetics. These PDEs encapsulate key challenges inherent to real-world problems, such as complex geometry, multi-scale phenomena, nonlinearity, and high dimensionality. PINNacle also offers a user-friendly toolbox, incorporating about 10 state-of-the-art PINN methods for systematic evaluation and comparison. We have conducted extensive experiments with these methods, offering insights into their strengths and weaknesses. In addition to providing a standardized means of assessing performance, PINNacle also offers an in-depth analysis to guide future research, particularly in areas such as domain decomposition methods and loss reweighting for handling multi-scale problems and complex geometry. While PINNacle does not guarantee success in all real-world scenarios, it represents a significant contribution to the field by offering a robust, diverse, and comprehensive benchmark suite that will undoubtedly foster further research and development in PINNs.

MultiAdam: Parameter-wise Scale-invariant Optimizer for Multiscale Training of Physics-informed Neural Networks

Jun 05, 2023
Jiachen Yao, Chang Su, Zhongkai Hao, Songming Liu, Hang Su, Jun Zhu

Physics-informed Neural Networks (PINNs) have recently achieved remarkable progress in solving Partial Differential Equations (PDEs) in various fields by minimizing a weighted sum of PDE loss and boundary loss. However, there are several critical challenges in the training of PINNs, including the lack of theoretical frameworks and the imbalance between PDE loss and boundary loss. In this paper, we present an analysis of second-order non-homogeneous PDEs, which are classified into three categories and applicable to various common problems. We also characterize the connections between the training loss and actual error, guaranteeing convergence under mild conditions. The theoretical analysis inspires us to further propose MultiAdam, a scale-invariant optimizer that leverages gradient momentum to parameter-wisely balance the loss terms. Extensive experiment results on multiple problems from different physical domains demonstrate that our MultiAdam solver can improve the predictive accuracy by 1-2 orders of magnitude compared with strong baselines.

NUNO: A General Framework for Learning Parametric PDEs with Non-Uniform Data

May 31, 2023
Songming Liu, Zhongkai Hao, Chengyang Ying, Hang Su, Ze Cheng, Jun Zhu

The neural operator has emerged as a powerful tool in learning mappings between function spaces in PDEs. However, when faced with real-world physical data, which are often highly non-uniformly distributed, it is challenging to use mesh-based techniques such as the FFT. To address this, we introduce the Non-Uniform Neural Operator (NUNO), a comprehensive framework designed for efficient operator learning with non-uniform data. Leveraging a K-D tree-based domain decomposition, we transform non-uniform data into uniform grids while effectively controlling interpolation error, thereby paralleling the speed and accuracy of learning from non-uniform data. We conduct extensive experiments on 2D elasticity, (2+1)D channel flow, and a 3D multi-physics heatsink, which, to our knowledge, marks a novel exploration into 3D PDE problems with complex geometries. Our framework has reduced error rates by up to 60% and enhanced training speeds by 2x to 30x. The code is now available at https://github.com/thu-ml/NUNO.
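A K-D tree-style domain decomposition can be sketched generically as follows. This is not the NUNO algorithm: the splitting rule (median of the widest axis), the leaf-size threshold, and the names `kd_partition` and `bounding_box` are assumptions made for illustration. The actual implementation is in the repository linked above.

```python
import random


def kd_partition(points, max_leaf_size):
    """Recursively split a point cloud at the median of its widest axis."""
    if len(points) <= max_leaf_size:
        return [points]
    dims = len(points[0])
    extents = [
        max(p[d] for p in points) - min(p[d] for p in points)
        for d in range(dims)
    ]
    axis = extents.index(max(extents))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (kd_partition(ordered[:mid], max_leaf_size)
            + kd_partition(ordered[mid:], max_leaf_size))


def bounding_box(points):
    """Axis-aligned box (lower corner, upper corner) enclosing the points."""
    dims = len(points[0])
    lo = tuple(min(p[d] for p in points) for d in range(dims))
    hi = tuple(max(p[d] for p in points) for d in range(dims))
    return lo, hi


# Non-uniform 2D cloud: a dense cluster near the origin plus sparse background.
rng = random.Random(0)
cloud = [(rng.gauss(0.0, 0.05), rng.gauss(0.0, 0.05)) for _ in range(200)]
cloud += [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(56)]

leaves = kd_partition(cloud, max_leaf_size=32)
boxes = [bounding_box(leaf) for leaf in leaves]

for (lo, hi), leaf in zip(boxes, leaves):
    area = (hi[0] - lo[0]) * (hi[1] - lo[1])
    print(f"{len(leaf)} points, box area {area:.4f}")
```

Every leaf holds the same number of points, so boxes are small where the data are dense and large where they are sparse. Within each box the points are closer to uniformly distributed, which is what makes interpolating onto a regular grid per subdomain plausible.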

Preserving Pre-trained Features Helps Calibrate Fine-tuned Language Models

May 30, 2023
Guande He, Jianfei Chen, Jun Zhu

Large pre-trained language models (PLMs) have demonstrated strong performance on natural language understanding (NLU) tasks through fine-tuning. However, fine-tuned models still suffer from overconfident predictions, especially in out-of-domain settings. In this paper, we tackle the problem of calibrating fine-tuned language models. We demonstrate that PLMs are well-calibrated on the masked language modeling task, with robust predictive confidence under domain shift, yet the fine-tuned models fail to retain this property due to catastrophic forgetting, which impacts the calibration on the downstream classification task. In light of these observations, we evaluate the calibration of several methods that preserve pre-trained features and show that preserving pre-trained features can improve the calibration of fine-tuned language models. Among these methods, our proposed method, which encourages the fine-tuned model to learn generative representations with an auxiliary language modeling objective, achieves competitive accuracy and the lowest expected calibration error compared to several strong baselines under both in-domain and out-of-domain settings on three downstream NLU tasks.
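For readers unfamiliar with the metric named above, expected calibration error (ECE) is commonly computed by binning predictions by confidence and averaging the gap between accuracy and mean confidence in each bin, weighted by bin size. The sketch below follows that common definition with 10 equal-width bins. The bin count, the half-open bin edges, and the toy inputs are assumptions and not taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over confidence bins.

    Bins are the half-open intervals [k / n_bins, (k + 1) / n_bins), with a
    confidence of exactly 1.0 placed in the last bin.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Toy data: both models get 6 of 10 predictions right.
outcomes = [True] * 6 + [False] * 4
ece_overconfident = expected_calibration_error([0.95] * 10, outcomes)
ece_calibrated = expected_calibration_error([0.60] * 10, outcomes)

print(f"overconfident model ECE: {ece_overconfident:.2f}")
print(f"calibrated model ECE:    {ece_calibrated:.2f}")
```

The two toy models have identical accuracy, so the difference in ECE comes entirely from how well the reported confidence matches that accuracy.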

* ICLR 2023

Amplification trojan network: Attack deep neural networks by amplifying their inherent weakness

May 28, 2023
Zhanhao Hu, Jun Zhu, Bo Zhang, Xiaolin Hu

Recent works found that deep neural networks (DNNs) can be fooled by adversarial examples, which are crafted by adding adversarial noise to clean inputs. The accuracy of DNNs on adversarial examples decreases as the magnitude of the adversarial noise increases. In this study, we show that DNNs can also be fooled when the noise is very small under certain circumstances. This new type of attack is called Amplification Trojan Attack (ATAttack). Specifically, we use a trojan network to transform the inputs before sending them to the target DNN. This trojan network serves as an amplifier to amplify the inherent weakness of the target DNN. The target DNN, which is infected by the trojan network, performs normally on clean data while being more vulnerable to adversarial examples. Since it only transforms the inputs, the trojan network can hide in DNN-based pipelines, e.g., by infecting the pre-processing procedure of the inputs before sending them to the DNNs. This new type of threat should be considered in developing safe DNNs.

* Published Sep 2022 in Neurocomputing

ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing

May 26, 2023
Min Zhao, Rongzhen Wang, Fan Bao, Chongxuan Li, Jun Zhu

In this paper, we present ControlVideo, a novel method for text-driven video editing. Leveraging the capabilities of text-to-image diffusion models and ControlNet, ControlVideo aims to enhance the fidelity and temporal consistency of videos that align with a given text while preserving the structure of the source video. This is achieved by incorporating additional conditions such as edge maps, fine-tuning the key-frame and temporal attention on the source video-text pair with carefully designed strategies. An in-depth exploration of ControlVideo's design is conducted to inform future research on one-shot tuning video diffusion models. Quantitatively, ControlVideo outperforms a range of competitive baselines in terms of faithfulness and consistency while still aligning with the textual prompt. Additionally, it delivers videos with high visual realism and fidelity w.r.t. the source content, demonstrating flexibility in utilizing controls containing varying degrees of source video information, and the potential for multiple control combinations. The project page is available at https://ml.cs.tsinghua.edu.cn/controlvideo/.
