Yang Zhang

University of Science and Technology of China

Optimization and Application of Cloud-based Deep Learning Architecture for Multi-Source Data Prediction

Oct 16, 2024

Yang Zhang, Fa Wang, Xin Huang, Xintao Li, Sibei Liu, Hansong Zhang

Abstract: This study develops a cloud-based deep learning system for early prediction of diabetes, leveraging the distributed computing capabilities of the AWS cloud platform and deep learning technologies to achieve efficient and accurate risk assessment. The system utilizes EC2 p3.8xlarge GPU instances to accelerate model training, reducing training time by 93.2% while maintaining a prediction accuracy of 94.2%. With an automated data processing and model training pipeline built using Apache Airflow, the system can complete end-to-end updates within 18.7 hours. In clinical applications, the system demonstrates a prediction accuracy of 89.8%, sensitivity of 92.3%, and specificity of 95.1%. Early interventions based on predictions lead to a 37.5% reduction in diabetes incidence among the target population. The system's high performance and scalability provide strong support for large-scale diabetes prevention and management, showcasing significant public health value.

* 6 Pages, 5 Figures, 3 Tables. The final version will be published in the proceedings of the IEEE conference
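
The abstract above reports accuracy, sensitivity, and specificity. As a reminder of how these three metrics relate, here is a minimal sketch that computes them from confusion-matrix counts. The function name `classification_metrics` and the counts are hypothetical illustrations and are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts.

    tp/fn: positive cases predicted positive/negative.
    tn/fp: negative cases predicted negative/positive.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),      # true positive rate (recall)
        "specificity": tn / (tn + fp),      # true negative rate
    }


# Made-up counts for illustration only (not data from the paper).
metrics = classification_metrics(tp=45, fp=10, tn=190, fn=5)
print(metrics)
```

With these made-up counts the function gives a sensitivity of 0.90, a specificity of 0.95, and an accuracy of 0.94.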

Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming

Oct 15, 2024

Yilun Hao, Yang Zhang, Chuchu Fan

Abstract: While large language models (LLMs) have recently demonstrated strong potential in solving planning problems, there is a trade-off between flexibility and complexity. LLMs, as zero-shot planners themselves, are still not capable of directly generating valid plans for complex planning problems such as multi-constraint or long-horizon tasks. On the other hand, many frameworks aiming to solve complex planning problems often rely on task-specific preparatory efforts, such as task-specific in-context examples and pre-defined critics/verifiers, which limits their cross-task generalization capability. In this paper, we tackle these challenges by observing that the core of many planning problems lies in optimization problems: searching for the optimal solution (best plan) with goals subject to constraints (preconditions and effects of decisions). With LLMs' commonsense, reasoning, and programming capabilities, this opens up the possibility of a universal LLM-based approach to planning problems. Inspired by this observation, we propose LLMFP, a general-purpose framework that leverages LLMs to capture key information from planning problems and formally formulate and solve them as optimization problems from scratch, with no task-specific examples needed. We apply LLMFP to 9 planning problems, ranging from multi-constraint decision making to multi-step planning problems, and demonstrate that LLMFP achieves on average 83.7% and 86.8% optimal rate across 9 tasks for GPT-4o and Claude 3.5 Sonnet, significantly outperforming the best baseline (direct planning with OpenAI o1-preview) with 37.6% and 40.7% improvements. We also validate components of LLMFP with ablation experiments and analyze the underlying success and failure reasons.

* 50 pages, 25 figures, 7 tables

A Hitchhiker's Guide to Scaling Law Estimation

Oct 15, 2024

Leshem Choshen, Yang Zhang, Jacob Andreas

Abstract: Scaling laws predict the loss of a target machine learning model by extrapolating from easier-to-train models with fewer parameters or smaller training sets. This provides an efficient way for practitioners and researchers alike to compare pretraining decisions involving optimizers, datasets, and model architectures. Despite the widespread use of scaling laws to model the dynamics of language model training, there has been little work on understanding how to best estimate and interpret them. We collect (and release) a large-scale dataset containing losses and downstream evaluations for 485 previously published pretrained models. We use these to estimate more than 1000 scaling laws, then derive a set of best practices for estimating scaling laws in new model families. We find that fitting scaling laws to intermediate checkpoints of training runs (and not just their final losses) substantially improves accuracy, and that, all else equal, estimates of performance are generally most accurate when derived from other models of similar sizes. However, because there is a significant degree of variability across model seeds, training multiple small models is sometimes more useful than training a single large one. Moreover, while different model families differ in scaling behavior, they are often similar enough that a target model's behavior can be predicted from a single model with the same architecture, along with scaling parameter estimates derived from other model families.
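
To make the idea of extrapolating from smaller models concrete, here is a minimal sketch that fits a pure power law, loss = a * size^(-b), by least squares in log-log space and then predicts the loss of a larger model. The functional form (with no irreducible-loss term), the function names, and the synthetic data are all assumptions for illustration; the paper's own fitting procedure may differ.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-b) via linear regression on log(size), log(loss)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals -b in the power law
    intercept = mean_y - slope * mean_x  # equals log(a)
    return math.exp(intercept), -slope


def predict_loss(a, b, size):
    """Evaluate the fitted power law at a new model size."""
    return a * size ** (-b)


# Synthetic, noise-free losses from a known law (a=5.0, b=0.05); not real data.
small_sizes = [1e7, 3e7, 1e8, 3e8]
small_losses = [5.0 * s ** -0.05 for s in small_sizes]

a, b = fit_power_law(small_sizes, small_losses)
predicted = predict_loss(a, b, 1e9)  # extrapolate to a larger model
print(a, b, predicted)
```

Because the synthetic data is noise-free, the fit recovers the generating parameters up to floating-point error. With real training runs, the seed-to-seed variability mentioned in the abstract would make the estimates noisier.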

Tracing Human Stress from Physiological Signals using UWB Radar

Oct 14, 2024

Jia Xu, Teng Xiao, Pin Lv, Zhe Chen, Chao Cai, Yang Zhang, Zehui Xiong

Abstract: Stress tracing is an important research domain that supports many applications, such as health care and stress management, and its most closely related works are derived from stress detection. However, these existing works do not adequately address two important challenges facing stress detection. First, most of these studies involve asking users to wear physiological sensors to detect their stress states, which has a negative impact on the user experience. Second, these studies have failed to effectively utilize multimodal physiological signals, which results in less satisfactory detection results. This paper formally defines the stress tracing problem, which emphasizes the continuous detection of human stress states. A novel deep stress tracing method, named DST, is presented. Note that DST proposes tracing human stress based on physiological signals collected by a noncontact ultrawideband radar, which is more friendly to users when collecting their physiological signals. In DST, a signal extraction module is first carefully designed to robustly extract multimodal physiological signals from the raw RF data of the radar, even in the presence of body movement. Afterward, a multimodal fusion module is proposed in DST to ensure that the extracted multimodal physiological signals can be effectively fused and utilized. Extensive experiments are conducted on three real-world datasets, including one self-collected dataset and two public datasets. Experimental results show that the proposed DST method significantly outperforms all the baselines in terms of tracing human stress states. On average, DST provides a 6.31% increase in detection accuracy on all datasets, compared with the best baselines.

* 19 pages, 11 figures

Task-agnostic Pre-training and Task-guided Fine-tuning for Versatile Diffusion Planner

Sep 30, 2024

Chenyou Fan, Chenjia Bai, Zhao Shan, Haoran He, Yang Zhang, Zhen Wang

Abstract: Diffusion models have demonstrated their capabilities in modeling trajectories of multi-tasks. However, existing multi-task planners or policies typically rely on task-specific demonstrations via multi-task imitation, or require task-specific reward labels to facilitate policy optimization via Reinforcement Learning (RL). To address these challenges, we aim to develop a versatile diffusion planner that can leverage large-scale inferior data that contains task-agnostic sub-optimal trajectories, with the ability to adapt quickly to specific tasks. In this paper, we propose SODP, a two-stage framework that leverages Sub-Optimal data to learn a Diffusion Planner, which is generalizable for various downstream tasks. Specifically, in the pre-training stage, we train a foundation diffusion planner that extracts general planning capabilities by modeling the versatile distribution of multi-task trajectories, which can be sub-optimal and have wide data coverage. Then for downstream tasks, we adopt RL-based fine-tuning with task-specific rewards to quickly refine the diffusion planner, which aims to generate action sequences with higher task-specific returns. Experimental results from multi-task domains including Meta-World and Adroit demonstrate that SODP outperforms state-of-the-art methods with only a small amount of data for reward-guided fine-tuning.

Offline Signature Verification Based on Feature Disentangling Aided Variational Autoencoder

Sep 29, 2024

Hansong Zhang, Jiangjian Guo, Kun Li, Yang Zhang, Yimei Zhao

Abstract: Offline handwritten signature verification systems are used to verify the identity of individuals by recognizing their handwritten signature images as genuine signatures or forgeries. The main tasks of signature verification systems include extracting features from signature images and training a classifier for classification. The challenges of these tasks are twofold. First, genuine signatures and skilled forgeries are highly similar in their appearances, resulting in a small inter-class distance. Second, the instances of skilled forgeries are often unavailable when signature verification models are being trained. To tackle these problems, this paper proposes a new signature verification method. It is the first model that employs a variational autoencoder (VAE) to extract features directly from signature images. To make the features more discriminative, it improves the traditional VAEs by introducing a new loss function for feature disentangling. In addition, it relies on SVM (Support Vector Machine) for classification according to the extracted features. Extensive experiments are conducted on two public datasets, MCYT-75 and GPDS-synthetic, where the proposed method significantly outperformed 13 representative offline signature verification methods. The achieved improvement in distinctive datasets indicates the robustness and great potential of the developed system in real application.

MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning

Sep 26, 2024

Tieyuan Chen, Huabin Liu, Tianyao He, Yihang Chen, Chaofan Gan, Xiao Ma, Cheng Zhong, Yang Zhang, Yingxue Wang, Hui Lin(+1 more)

Abstract: Video causal reasoning aims to achieve a high-level understanding of video content from a causal perspective. However, current video reasoning tasks are limited in scope, primarily executed in a question-answering paradigm and focusing on short videos containing only a single event and simple causal relationships, lacking comprehensive and structured causality analysis for videos with multiple events. To fill this gap, we introduce a new task and dataset, Multi-Event Causal Discovery (MECD). It aims to uncover the causal relationships between events distributed chronologically across long videos. Given visual segments and textual descriptions of events, MECD requires identifying the causal associations between these events to derive a comprehensive, structured event-level video causal diagram explaining why and how the final result event occurred. To address MECD, we devise a novel framework inspired by the Granger Causality method, using an efficient mask-based event prediction model to perform an Event Granger Test, which estimates causality by comparing the predicted result event when premise events are masked versus unmasked. Furthermore, we integrate causal inference techniques such as front-door adjustment and counterfactual inference to address challenges in MECD like causality confounding and illusory causality. Experiments validate the effectiveness of our framework in providing causal relationships in multi-event videos, outperforming GPT-4o and VideoLLaVA by 5.7% and 4.1%, respectively.

* Accepted at NeurIPS 2024 as a spotlight paper

Investigating Layer Importance in Large Language Models

Sep 22, 2024

Yang Zhang, Yanfei Dong, Kenji Kawaguchi

Abstract: Large language models (LLMs) have gained increasing attention due to their prominent ability to understand and process texts. Nevertheless, LLMs largely remain opaque. The lack of understanding of LLMs has obstructed their deployment in safety-critical scenarios and hindered the development of better models. In this study, we advance the understanding of LLMs by investigating the significance of individual layers in LLMs. We propose an efficient sampling method to faithfully evaluate the importance of layers using Shapley values, a widely used explanation framework in feature attribution and data valuation. In addition, we conduct layer ablation experiments to assess the performance degradation resulting from the exclusion of specific layers. Our findings reveal the existence of cornerstone layers, wherein certain early layers can exhibit a dominant contribution over others. Removing one cornerstone layer leads to a drastic collapse of the model performance, often reducing it to random guessing. Conversely, removing non-cornerstone layers results in only marginal performance changes. This study identifies cornerstone layers in LLMs and underscores their critical role for future research.
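
As a rough illustration of estimating Shapley values by sampling, the sketch below uses generic Monte Carlo permutation sampling: each "layer" is credited with its average marginal contribution over random orderings. This is a textbook estimator, not necessarily the paper's sampling method, and `toy_score` is a hypothetical stand-in for evaluating a model with only a subset of layers active.

```python
import random


def shapley_by_permutation(players, value_fn, num_samples=2000, seed=0):
    """Estimate Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        prev = value_fn(frozenset(coalition))
        for p in order:
            coalition.add(p)
            cur = value_fn(frozenset(coalition))
            totals[p] += cur - prev  # marginal contribution of p in this ordering
            prev = cur
    return {p: total / num_samples for p, total in totals.items()}


def toy_score(active_layers):
    """Hypothetical score: layer 0 acts as a cornerstone layer.

    Without layer 0 the score collapses to 0.0; with it, every other
    active layer adds a small increment.
    """
    if 0 not in active_layers:
        return 0.0
    return 0.4 + 0.1 * (len(active_layers) - 1)


layers = [0, 1, 2, 3]
estimates = shapley_by_permutation(layers, toy_score)
print(estimates)
```

In this toy setup the estimate for layer 0 dominates the others, mirroring the cornerstone-layer behavior the abstract describes. The estimates always sum to the full-model score minus the empty-model score, since that holds for every sampled ordering.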

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

Sep 13, 2024

Ye Bai, Haonan Chen, Jitong Chen, Zhuo Chen, Yi Deng, Xiaohong Dong, Lamtharn Hantrakul, Weituo Hao, Qingqing Huang, Zhongyi Huang(+28 more)

Abstract: We introduce Seed-Music, a suite of music generation systems capable of producing high-quality music with fine-grained style control. Our unified framework leverages both auto-regressive language modeling and diffusion approaches to support two key music creation workflows: controlled music generation and post-production editing. For controlled music generation, our system enables vocal music generation with performance controls from multi-modal inputs, including style descriptions, audio references, musical scores, and voice prompts. For post-production editing, it offers interactive tools for editing lyrics and vocal melodies directly in the generated audio. We encourage readers to listen to demo audio examples at https://team.doubao.com/seed-music.

* Seed-Music technical report, 20 pages, 5 figures

Q-value Regularized Decision ConvFormer for Offline Reinforcement Learning

Sep 12, 2024

Teng Yan, Zhendong Ruan, Yaobang Cai, Yu Han, Wenxian Li, Yang Zhang

Abstract: As a data-driven paradigm, offline reinforcement learning (Offline RL) has been formulated as sequence modeling, where the Decision Transformer (DT) has demonstrated exceptional capabilities. Unlike previous reinforcement learning methods that fit value functions or compute policy gradients, DT adjusts the autoregressive model based on the expected returns, past states, and actions, using a causally masked Transformer to output the optimal action. However, due to the inconsistency between the sampled returns within a single trajectory and the optimal returns across multiple trajectories, it is challenging to set an expected return to output the optimal action and stitch together suboptimal trajectories. Decision ConvFormer (DC) is easier to understand in the context of modeling RL trajectories within a Markov Decision Process compared to DT. We propose the Q-value Regularized Decision ConvFormer (QDC), which combines the understanding of RL trajectories by DC and incorporates a term that maximizes action values using dynamic programming methods during training. This ensures that the expected returns of the sampled actions are consistent with the optimal returns. QDC achieves excellent performance on the D4RL benchmark, outperforming or approaching the optimal level in all tested environments. It particularly demonstrates outstanding competitiveness in trajectory stitching capability.
