
Yiran Chen

Electrical and Computer Engineering Department, Duke University, Durham, NC, USA

Advancing Real-time Pandemic Forecasting Using Large Language Models: A COVID-19 Case Study

Apr 10, 2024

Hongru Du, Jianan Zhao, Yang Zhao, Shaochong Xu, Xihong Lin, Yiran Chen, Lauren M. Gardner, Hao Frank Yang

Abstract: Forecasting the short-term spread of an ongoing disease outbreak is a formidable challenge due to the complexity of contributing factors, some of which can be characterized through interlinked, multi-modality variables such as epidemiological time series data, viral biology, population demographics, and the intersection of public policy and human behavior. Existing forecasting model frameworks struggle with the multifaceted nature of relevant data and with robust translation of results, which hinders their performance and the provision of actionable insights for public health decision-makers. Our work introduces PandemicLLM, a novel framework with multi-modal Large Language Models (LLMs) that reformulates real-time forecasting of disease spread as a text reasoning problem, with the ability to incorporate real-time, complex, non-numerical information that was previously unattainable in traditional forecasting models. This approach, through a unique AI-human cooperative prompt design and time series representation learning, encodes multi-modal data for LLMs. The model is applied to the COVID-19 pandemic, trained to utilize textual public health policies, genomic surveillance, spatial, and epidemiological time series data, and subsequently tested across all 50 states of the U.S. Empirically, PandemicLLM is shown to be a high-performing pandemic forecasting framework that effectively captures the impact of emerging variants and can provide timely and accurate predictions. The proposed PandemicLLM opens avenues for incorporating various pandemic-related data in heterogeneous formats and exhibits performance benefits over existing models. This study illuminates the potential of adapting LLMs and representation learning to enhance pandemic forecasting, illustrating how AI innovations can strengthen pandemic responses and crisis management in the future.

* 35 pages, 10 figures
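The abstract describes recasting a numeric forecasting task as text reasoning over mixed inputs. The minimal sketch below shows only the general idea of serializing numbers and policy text into one prompt. It assumes a hypothetical `StateWeek` record and a hypothetical five-way trend label set. It is not PandemicLLM's prompt design. It also writes the time series as plain text, whereas the abstract states that time series are encoded through representation learning.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical record for one state and one week; field names are illustrative
# and are not PandemicLLM's actual schema.
@dataclass
class StateWeek:
    state: str
    hospitalizations: List[int]  # recent weekly counts, oldest first
    policy_text: str             # free-text summary of current public health policies
    dominant_variant: str        # genomic surveillance summary


# Assumed categorical targets: the forecast is posed as choosing one label.
TREND_LABELS = ["substantial decrease", "moderate decrease", "stable",
                "moderate increase", "substantial increase"]


def build_prompt(record: StateWeek) -> str:
    """Serialize mixed numeric and textual inputs into a single text prompt."""
    series = ", ".join(str(v) for v in record.hospitalizations)
    options = "; ".join(TREND_LABELS)
    return (
        f"State: {record.state}\n"
        f"Weekly hospitalizations (oldest to newest): {series}\n"
        f"Current policies: {record.policy_text}\n"
        f"Dominant variant: {record.dominant_variant}\n"
        f"Question: Over the next week, which trend is most likely: {options}?\n"
        "Answer:"
    )
```

Restricting the answer to a fixed label set turns the forecast into a classification that an LLM can be fine-tuned on; whether PandemicLLM uses exactly these categories is not stated in the abstract.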

Peeking Behind the Curtains of Residual Learning

Feb 13, 2024

Tunhou Zhang, Feng Yan, Hai Li, Yiran Chen

Abstract: The utilization of residual learning has become widespread in deep and scalable neural nets. However, the fundamental principles that contribute to the success of residual learning remain elusive, thus hindering effective training of plain nets with depth scalability. In this paper, we peek behind the curtains of residual learning by uncovering the "dissipating inputs" phenomenon that leads to convergence failure in plain neural nets: the input is gradually compromised through plain layers due to non-linearities, resulting in challenges of learning feature representations. We theoretically demonstrate how plain neural nets degenerate the input to random noise and emphasize the significance of a residual connection that maintains a better lower bound of surviving neurons as a solution. With our theoretical discoveries, we propose "The Plain Neural Net Hypothesis" (PNNH) that identifies the internal path across non-linear layers as the most critical part in residual learning, and establishes a paradigm to support the training of deep plain neural nets devoid of residual connections. We thoroughly evaluate PNNH-enabled CNN architectures and Transformers on popular vision benchmarks, showing on-par accuracy, up to 0.3% higher training throughput, and 2x better parameter efficiency compared to ResNets and vision Transformers.

* arXiv preprint
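As a rough illustration of why an identity shortcut protects units from being zeroed, the toy simulation below compares a plain ReLU stack with a residual one. It assumes random He-initialized layers without biases and counts nonzero activations as a crude stand-in for "surviving neurons". It is not the paper's analysis and does not implement PNNH.

```python
import numpy as np


def surviving_fraction(h: np.ndarray) -> float:
    """Fraction of units that are nonzero (a crude 'surviving neuron' proxy)."""
    return float(np.mean(h != 0))


def run(depth: int, width: int, residual: bool, seed: int = 0) -> float:
    """Push a random input through `depth` random ReLU layers; report survivors."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(width)  # random input vector
    for _ in range(depth):
        # He-style Gaussian initialization for a bias-free ReLU layer.
        w = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
        f = np.maximum(w @ h, 0.0)  # ReLU(W h)
        # Plain layer: h <- ReLU(W h). Residual layer: h <- h + ReLU(W h).
        h = h + f if residual else f
    return surviving_fraction(h)
```

In the plain stack each ReLU zeroes roughly half the units at every layer, while with `h + ReLU(W h)` a unit can only become zero through exact cancellation, so nearly all units stay active (for example, compare `run(30, 256, residual=True)` with `run(30, 256, residual=False)`).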

Group Distributionally Robust Dataset Distillation with Risk Minimization

Feb 07, 2024

Saeed Vahidian, Mingyu Wang, Jianyang Gu, Vyacheslav Kungurtsev, Wei Jiang, Yiran Chen

Abstract: Dataset distillation (DD) has emerged as a widely adopted technique for crafting a synthetic dataset that captures the essential information of a training dataset, facilitating the training of accurate neural models. Its applications span various domains, including transfer learning, federated learning, and neural architecture search. The most popular methods for constructing the synthetic data rely on matching the convergence properties of training the model with the synthetic dataset and with the training dataset. However, the training dataset should be regarded as an auxiliary target, in the same sense that the training set is an approximate substitute for the population distribution, and the latter is the data of interest. Yet despite the popularity of DD, one aspect remains unexplored: its relationship to generalization, particularly across uncommon subgroups. That is, how can we ensure that a model trained on the synthetic dataset performs well when faced with samples from regions with low population density? Here, the representativeness and coverage of the dataset matter more at inference than the guaranteed training error. Drawing inspiration from distributionally robust optimization, we introduce an algorithm that combines clustering with the minimization of a risk measure on the loss to conduct DD. We provide a theoretical rationale for our approach and demonstrate its effective generalization and robustness across subgroups through numerical experiments.
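The abstract combines clustering with minimization of a risk measure on the loss but does not name the measure. The sketch below assumes conditional value-at-risk (CVaR) over per-cluster mean losses and uses a plain k-means for the clustering step. The function names and the choice of CVaR are illustrative assumptions. In a full pipeline the per-sample losses would presumably come from a model trained on the synthetic set, which is also an assumption.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain k-means; returns a cluster label for every row of x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        # Assign each sample to its nearest center (squared Euclidean distance).
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels


def cluster_losses(losses, labels):
    """Mean loss of each non-empty cluster."""
    losses = np.asarray(losses, dtype=float)
    labels = np.asarray(labels)
    return np.array([losses[labels == j].mean() for j in np.unique(labels)])


def cvar(values, alpha):
    """Discrete CVaR: mean of the worst ceil(alpha * n) values."""
    v = np.sort(np.asarray(values, dtype=float))[::-1]
    m = max(1, int(np.ceil(alpha * len(v))))
    return float(v[:m].mean())


def robust_objective(features, losses, k, alpha):
    """Risk-averse objective: CVaR over per-cluster mean losses."""
    labels = kmeans(features, k)
    return cvar(cluster_losses(losses, labels), alpha)
```

Setting `alpha = 1` recovers the plain average over clusters, while a small `alpha` focuses the objective on the worst-served clusters, which is the regime that targets low-density subgroups.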

PANDA: Architecture-Level Power Evaluation by Unifying Analytical and Machine Learning Solutions

Dec 14, 2023

Qijun Zhang, Shiyu Li, Guanglei Zhou, Jingyu Pan, Chen-Chia Chang, Yiran Chen, Zhiyao Xie

Abstract: Power efficiency is a critical design objective in modern microprocessor design. To evaluate the impact of architecture-level design decisions, an accurate yet efficient architecture-level power model is desired. However, widely adopted data-independent analytical power models like McPAT and Wattch have been criticized for their unreliable accuracy. While some machine learning (ML) methods have been proposed for architecture-level power modeling, they rely on sufficient known designs for training and perform poorly when the number of available designs is limited, which is typically the case in realistic scenarios. In this work, we derive a general formulation that unifies existing architecture-level power models. Based on the formulation, we propose PANDA, an innovative architecture-level solution that combines the advantages of analytical and ML power models. It achieves unprecedentedly high accuracy on unknown new designs even when very few designs are available for training, which is a common challenge in practice. Besides being an excellent power model, it can also predict area, performance, and energy accurately. PANDA further supports power prediction for unknown new technology nodes. In our experiments, besides validating the superior performance and the wide range of functionalities of PANDA, we also propose an application scenario in which PANDA is shown to identify high-performance design configurations under a given power constraint.

* IEEE/ACM International Conference on Computer-Aided Design (ICCAD) 2023

EDALearn: A Comprehensive RTL-to-Signoff EDA Benchmark for Democratized and Reproducible ML for EDA Research

Dec 04, 2023

Jingyu Pan, Chen-Chia Chang, Zhiyao Xie, Yiran Chen

Abstract: The application of Machine Learning (ML) in Electronic Design Automation (EDA) for Very Large-Scale Integration (VLSI) design has garnered significant research attention. Despite the requirement for extensive datasets to build effective ML models, most studies are limited to smaller, internally generated datasets due to the lack of comprehensive public resources. In response, we introduce EDALearn, the first holistic, open-source benchmark suite specifically for ML tasks in EDA. This benchmark suite presents an end-to-end flow from synthesis to physical implementation, enriching data collection across various stages. It fosters reproducibility and promotes research into ML transferability across different technology nodes. Accommodating a wide range of VLSI design instances and sizes, our benchmark aptly represents the complexity of contemporary VLSI designs. Additionally, we provide an in-depth data analysis, enabling users to fully comprehend the attributes and distribution of our data, which is essential for creating efficient ML models. Our contributions aim to encourage further advances in the ML-EDA domain.

* 8 pages

Unlocking the Potential of Federated Learning: The Symphony of Dataset Distillation via Deep Generative Latents

Dec 03, 2023

Yuqi Jia, Saeed Vahidian, Jingwei Sun, Jianyi Zhang, Vyacheslav Kungurtsev, Neil Zhenqiang Gong, Yiran Chen

Abstract: Data heterogeneity presents significant challenges for federated learning (FL). Recently, dataset distillation techniques have been introduced and performed at the client level in an attempt to mitigate some of these challenges. In this paper, we propose a highly efficient FL dataset distillation framework on the server side, significantly reducing both the computational and communication demands on local devices while enhancing the clients' privacy. Unlike previous strategies that perform dataset distillation on local devices and upload synthetic data to the server, our technique enables the server to leverage prior knowledge from pre-trained deep generative models to synthesize essential data representations from a heterogeneous model architecture. This process allows local devices to train smaller surrogate models while enabling the training of a larger global model on the server, effectively minimizing resource utilization. We substantiate our claim with a theoretical analysis, demonstrating the asymptotic resemblance of the process to the hypothetical ideal of completely centralized training on a heterogeneous dataset. Empirical evidence from our comprehensive experiments indicates our method's superiority, delivering an accuracy enhancement of up to 40% over non-dataset-distillation techniques in highly heterogeneous FL contexts and surpassing existing dataset-distillation methods by 18%. In addition to the high accuracy, our framework converges faster than the baselines because, rather than training on several sets of heterogeneous data distributions, the server trains on a multi-modal distribution. Our code is available at https://github.com/FedDG23/FedDG-main.git

Efficient Dataset Distillation via Minimax Diffusion

Nov 27, 2023

Jianyang Gu, Saeed Vahidian, Vyacheslav Kungurtsev, Haonan Wang, Wei Jiang, Yang You, Yiran Chen

Abstract: Dataset distillation reduces the storage and computational consumption of training a network by generating a small surrogate dataset that encapsulates rich information of the original large-scale one. However, previous distillation methods heavily rely on the sample-wise iterative optimization scheme. As the images-per-class (IPC) setting or image resolution grows larger, the necessary computation demands overwhelming time and resources. In this work, we incorporate generative diffusion techniques for computing the surrogate dataset. Observing that key factors for constructing an effective surrogate dataset are representativeness and diversity, we design additional minimax criteria in the generative training to enhance these facets for the generated images of diffusion models. We present a theoretical model of the process as hierarchical diffusion control, demonstrating the flexibility of the diffusion process to target these criteria without jeopardizing the faithfulness of the sample to the desired distribution. The proposed method achieves state-of-the-art validation performance while demanding far fewer computational resources. Under the 100-IPC setting on ImageWoof, our method requires less than one-twentieth the distillation time of previous methods, yet yields even better performance. Source code is available at https://github.com/vimar-gu/MinimaxDiffusion.
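One plausible reading of "minimax criteria" for representativeness and diversity is sketched below on embedding vectors. The representativeness term raises the minimum similarity between each generated sample and a batch of real samples. The diversity term lowers the maximum similarity to previously generated samples held in a memory. Both definitions, and the use of cosine similarity, are assumptions for illustration. How such terms would enter diffusion training is not shown.

```python
import numpy as np


def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def representativeness_loss(gen, real):
    """Minimax term (assumed form): pull each generated embedding toward the
    real sample it is LEAST similar to, i.e. maximize the minimum similarity."""
    return float(-cosine_sim(gen, real).min(axis=1).mean())


def diversity_loss(gen, memory):
    """Minimax term (assumed form): push each generated embedding away from the
    stored embedding it is MOST similar to, i.e. minimize the maximum similarity."""
    return float(cosine_sim(gen, memory).max(axis=1).mean())
```

Working on the worst case (min or max) rather than the average means a single poorly covered real sample, or a single near-duplicate in memory, is enough to produce a training signal.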

SD-NAE: Generating Natural Adversarial Examples with Stable Diffusion

Nov 21, 2023

Yueqian Lin, Jingyang Zhang, Yiran Chen, Hai Li

Abstract: Robustly evaluating deep learning image classifiers is challenging due to some limitations of standard datasets. Natural Adversarial Examples (NAEs), arising naturally from the environment and capable of deceiving classifiers, are instrumental in identifying vulnerabilities in trained models. Existing work collects such NAEs by filtering from a huge set of real images, a process that is passive and lacks control. In this work, we propose to actively synthesize NAEs with the state-of-the-art Stable Diffusion model. Specifically, our method formulates a controlled optimization process, where we perturb the token embedding that corresponds to a specified class to synthesize NAEs. The generation is guided by the gradient of the loss from the target classifier so that the created image closely mimics the ground-truth class yet fools the classifier. Named SD-NAE (Stable Diffusion for Natural Adversarial Examples), our innovative method is effective in producing valid and useful NAEs, which is demonstrated through a meticulously designed experiment. Our work thereby provides a valuable method for obtaining challenging evaluation data, which in turn can potentially advance the development of more robust deep learning models. Code is available at https://github.com/linyueqian/SD-NAE.

DACBERT: Leveraging Dependency Agreement for Cost-Efficient Bert Pretraining

Nov 08, 2023

Martin Kuo, Jianyi Zhang, Yiran Chen

Abstract: Building on the cost-efficient pretraining advancements brought about by Crammed BERT, we further enhance its performance and interpretability by introducing a novel pretrained model, Dependency Agreement Crammed BERT (DACBERT), and its two-stage pretraining framework, Dependency Agreement Pretraining. This framework, grounded in linguistic theories, seamlessly weaves syntactic and semantic information into the pretraining process. The first stage employs four dedicated submodels to capture representative dependency agreements at the chunk level, effectively converting these agreements into embeddings. The second stage uses these refined embeddings, in tandem with conventional BERT embeddings, to guide the pretraining of the rest of the model. Evaluated on the GLUE benchmark, our DACBERT demonstrates notable improvement across various tasks, surpassing Crammed BERT by 3.13% in the RTE task and by 2.26% in the MRPC task. Furthermore, our method boosts the average GLUE score by 0.83%, underscoring its significant potential. The pretraining process can be efficiently executed on a single GPU within a 24-hour cycle, requiring no supplementary computational resources or longer pretraining compared with Crammed BERT. Extensive studies further illuminate our approach's instrumental role in bolstering the interpretability of pretrained language models for natural language understanding tasks.

DistDNAS: Search Efficient Feature Interactions within 2 Hours

Nov 01, 2023

Tunhou Zhang, Wei Wen, Igor Fedorov, Xi Liu, Buyun Zhang, Fangqiu Han, Wen-Yen Chen, Yiping Han, Feng Yan, Hai Li (+1 more)

Abstract: Search efficiency and serving efficiency are two major axes in building feature interactions and expediting the model development process in recommender systems. On large-scale benchmarks, searching for the optimal feature interaction design incurs extensive cost due to the sequential workflow on the large volume of data. In addition, fusing interactions of various sources, orders, and mathematical operations introduces potential conflicts and additional redundancy into recommender models, leading to sub-optimal trade-offs between performance and serving cost. In this paper, we present DistDNAS as a neat solution for swift and efficient feature interaction design. DistDNAS proposes a supernet that incorporates interaction modules of varying orders and types as a search space. To optimize search efficiency, DistDNAS distributes the search and aggregates the choice of optimal interaction modules across varying data dates, achieving over 25x speed-up and reducing search cost from 2 days to 2 hours. To optimize serving efficiency, DistDNAS introduces a differentiable cost-aware loss to penalize the selection of redundant interaction modules, enhancing the efficiency of discovered feature interactions in serving. We extensively evaluate the best models crafted by DistDNAS on the 1 TB Criteo Terabyte dataset. Experimental evaluations demonstrate a 0.001 AUC improvement and 60% FLOPs savings over current state-of-the-art CTR models.
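A differentiable cost-aware loss of the kind the abstract mentions can be sketched with a softmax relaxation over candidate interaction modules, where the penalty is the expected FLOPs under the relaxed choice. The sketch assumes a single choice slot, hypothetical per-module FLOP counts, and averaging of architecture logits across data dates as the aggregation rule. None of these details are specified in the abstract.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()


def cost_aware_loss(task_loss, arch_logits, module_flops, lam):
    """Task loss plus lam * expected FLOPs under the softmax-relaxed module choice."""
    p = softmax(arch_logits)
    expected_cost = float(p @ np.asarray(module_flops, dtype=float))
    return task_loss + lam * expected_cost


def aggregate_choices(logits_per_date):
    """Assumed aggregation: average the architecture logits searched independently
    on different data dates, then keep the highest-scoring module."""
    mean_logits = np.mean(np.asarray(logits_per_date, dtype=float), axis=0)
    return int(np.argmax(mean_logits))
```

Because the expected cost is differentiable in `arch_logits`, gradient descent on the combined loss shifts probability mass away from expensive modules unless they reduce the task loss enough to justify their cost.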
