Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chen

Cherise

Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis

Feb 19, 2025

Jiahao Gai, Hao, Chen, Zhican Wang, Hongyu Zhou, Wanru Zhao, Nicholas Lane, Hongxiang Fan

Abstract:Recent advances in code generation have illuminated the potential of employing large language models (LLMs) for general-purpose programming languages such as Python and C++, opening new opportunities for automating software development and enhancing programmer productivity. The potential of LLMs in software programming has sparked significant interest in exploring automated hardware generation and automation. Although preliminary endeavors have been made to adopt LLMs in generating hardware description languages (HDLs), several challenges persist in this direction. First, the volume of available HDL training data is substantially smaller compared to that for software programming languages. Second, the pre-trained LLMs, mainly tailored for software code, tend to produce HDL designs that are more error-prone. Third, the generation of HDL requires a significantly higher number of tokens compared to software programming, leading to inefficiencies in cost and energy consumption. To tackle these challenges, this paper explores leveraging LLMs to generate High-Level Synthesis (HLS)-based hardware design. Although code generation for domain-specific programming languages is not new in the literature, we aim to provide experimental results, insights, benchmarks, and evaluation infrastructure to investigate the suitability of HLS over low-level HDLs for LLM-assisted hardware design generation. To achieve this, we first finetune pre-trained models for HLS-based hardware generation, using a collected dataset with text prompts and corresponding reference HLS designs. An LLM-assisted framework is then proposed to automate end-to-end hardware code generation, which also investigates the impact of chain-of-thought and feedback loops promoting techniques on HLS-design generation. Limited by the timeframe of this research, we plan to evaluate more advanced reasoning models in the future.

* Paper accepted by ASP-DAC'25

Via

Access Paper or Ask Questions

An Efficient Outlier Detection Algorithm for Data Streaming

Jan 02, 2025

Rui Hu, Luc, Chen, Yiwei Wang

Abstract:The nature of modern data is increasingly real-time, making outlier detection crucial in any data-related field, such as finance for fraud detection and healthcare for monitoring patient vitals. Traditional outlier detection methods, such as the Local Outlier Factor (LOF) algorithm, struggle with real-time data due to the need for extensive recalculations with each new data point, limiting their application in real-time environments. While the Incremental LOF (ILOF) algorithm has been developed to tackle the challenges of online anomaly detection, it remains computationally expensive when processing large streams of data points, and its detection performance may degrade after a certain threshold of points have streamed in. In this paper, we propose a novel approach to enhance the efficiency of LOF algorithms for online anomaly detection, named the Efficient Incremental LOF (EILOF) algorithm. The EILOF algorithm only computes the LOF scores of new points without altering the LOF scores of existing data points. Although exact LOF scores have not yet been computed for the existing points in the new algorithm, datasets often contain noise, and minor deviations in LOF score calculations do not necessarily degrade detection performance. In fact, such deviations can sometimes enhance outlier detection. We systematically tested this approach on both simulated and real-world datasets, demonstrating that EILOF outperforms ILOF as the volume of streaming data increases across various scenarios. The EILOF algorithm not only significantly reduces computational costs, but also systematically improves detection accuracy when the number of additional points increases compared to the ILOF algorithm.

* 12 pages, 10 figures

Via

Access Paper or Ask Questions

Slicing Vision Transformer for Flexible Inference

Dec 06, 2024

Yitian Zhang, Huseyin Coskun, Xu Ma, Huan Wang, Ke Ma, Xi, Chen, Derek Hao Hu, Yun Fu

Abstract:Vision Transformers (ViT) is known for its scalability. In this work, we target to scale down a ViT to fit in an environment with dynamic-changing resource constraints. We observe that smaller ViTs are intrinsically the sub-networks of a larger ViT with different widths. Thus, we propose a general framework, named Scala, to enable a single network to represent multiple smaller ViTs with flexible inference capability, which aligns with the inherent design of ViT to vary from widths. Concretely, Scala activates several subnets during training, introduces Isolated Activation to disentangle the smallest sub-network from other subnets, and leverages Scale Coordination to ensure each sub-network receives simplified, steady, and accurate learning objectives. Comprehensive empirical validations on different tasks demonstrate that with only one-shot training, Scala learns slimmable representation without modifying the original ViT structure and matches the performance of Separate Training. Compared with the prior art, Scala achieves an average improvement of 1.6% on ImageNet-1K with fewer parameters.

* Accepted by NeurIPS 2024

Via

Access Paper or Ask Questions

Foundation Model for Composite Materials and Microstructural Analysis

Nov 10, 2024

Ting-Ju Wei, Chuin-Shan, Chen

Abstract:The rapid advancement of machine learning has unlocked numerous opportunities for materials science, particularly in accelerating the design and analysis of materials. However, a significant challenge lies in the scarcity and high cost of obtaining high-quality materials datasets. In other fields, such as natural language processing, foundation models pre-trained on large datasets have achieved exceptional success in transfer learning, effectively leveraging latent features to achieve high performance on tasks with limited data. Despite this progress, the concept of foundation models remains underexplored in materials science. Here, we present a foundation model specifically designed for composite materials. Our model is pre-trained on a dataset of short-fiber composites to learn robust latent features. During transfer learning, the MMAE accurately predicts homogenized stiffness, with an R2 score reaching as high as 0.959 and consistently exceeding 0.91, even when trained on limited data. These findings validate the feasibility and effectiveness of foundation models in composite materials. We anticipate extending this approach to more complex three-dimensional composite materials, polycrystalline materials, and beyond. Moreover, this framework enables high-accuracy predictions even when experimental data are scarce, paving the way for more efficient and cost-effective materials design and analysis.

Via

Access Paper or Ask Questions

Radio-Based Passive Target Tracking by a Mobile Receiver with Unknown Transmitter Position

Nov 07, 2024

Ke Xu, Rui Zhang, He, Chen

Figure 1 for Radio-Based Passive Target Tracking by a Mobile Receiver with Unknown Transmitter Position

Figure 2 for Radio-Based Passive Target Tracking by a Mobile Receiver with Unknown Transmitter Position

Figure 3 for Radio-Based Passive Target Tracking by a Mobile Receiver with Unknown Transmitter Position

Figure 4 for Radio-Based Passive Target Tracking by a Mobile Receiver with Unknown Transmitter Position

Abstract:In this paper, we propose a radio-based passive target tracking algorithm using multipath measurements, including the angle of arrival and relative distance. We focus on a scenario in which a mobile receiver continuously receives radio signals from a transmitter located at an unknown position. The receiver utilizes multipath measurements extracted from the received signal to jointly localize the transmitter and the scatterers over time, with scatterers comprising a moving target and stationary objects that can reflect signals within the environment. We develop a comprehensive probabilistic model for the target tracking problem, incorporating the localization of the transmitter and scatterers, the identification of false alarms and missed detections in the measurements, and the association between scatterers and measurements. We employ a belief propagation approach to compute the posterior distributions of the positions of the scatterers and the transmitter. Additionally, we introduce a particle implementation for the belief propagation method. Simulation results demonstrate that our proposed algorithm outperforms existing benchmark methods in terms of target tracking accuracy.

Via

Access Paper or Ask Questions

Age of Information-Oriented Probabilistic Link Scheduling for Device-to-Device Networks

Oct 26, 2024

Lixin Wang, Qian Wang, He, Chen, Shidong Zhou

Figure 1 for Age of Information-Oriented Probabilistic Link Scheduling for Device-to-Device Networks

Figure 2 for Age of Information-Oriented Probabilistic Link Scheduling for Device-to-Device Networks

Figure 3 for Age of Information-Oriented Probabilistic Link Scheduling for Device-to-Device Networks

Figure 4 for Age of Information-Oriented Probabilistic Link Scheduling for Device-to-Device Networks

Abstract:This paper focuses on optimizing the long-term average age of information (AoI) in device-to-device (D2D) networks through age-aware link scheduling. The problem is naturally formulated as a Markov decision process (MDP). However, finding the optimal policy for the formulated MDP in its original form is challenging due to the intertwined AoI dynamics of all D2D links. To address this, we propose an age-aware stationary randomized policy that determines the probability of scheduling each link in each time slot based on the AoI of all links and the statistical channel state information among all transceivers. By employing the Lyapunov optimization framework, our policy aims to minimize the Lyapunov drift in every time slot. Nonetheless, this per-slot minimization problem is nonconvex due to cross-link interference in D2D networks, posing significant challenges for real-time decision-making. After analyzing the permutation equivariance property of the optimal solutions to the per-slot problem, we apply a message passing neural network (MPNN), a type of graph neural network that also exhibits permutation equivariance, to optimize the per-slot problem in an unsupervised learning manner. Simulation results demonstrate the superior performance of the proposed age-aware stationary randomized policy over baselines and validate the scalability of our method.

* 8 pages, 7 figures, accepted by IEEE WiOpt24

Via

Access Paper or Ask Questions

Robust Guided Diffusion for Offline Black-Box Optimization

Oct 01, 2024

Can, Chen, Christopher Beckham, Zixuan Liu, Xue Liu, Christopher Pal

Figure 1 for Robust Guided Diffusion for Offline Black-Box Optimization

Figure 2 for Robust Guided Diffusion for Offline Black-Box Optimization

Figure 3 for Robust Guided Diffusion for Offline Black-Box Optimization

Figure 4 for Robust Guided Diffusion for Offline Black-Box Optimization

Abstract:Offline black-box optimization aims to maximize a black-box function using an offline dataset of designs and their measured properties. Two main approaches have emerged: the forward approach, which learns a mapping from input to its value, thereby acting as a proxy to guide optimization, and the inverse approach, which learns a mapping from value to input for conditional generation. (a) Although proxy-free~(classifier-free) diffusion shows promise in robustly modeling the inverse mapping, it lacks explicit guidance from proxies, essential for generating high-performance samples beyond the training distribution. Therefore, we propose \textit{proxy-enhanced sampling} which utilizes the explicit guidance from a trained proxy to bolster proxy-free diffusion with enhanced sampling control. (b) Yet, the trained proxy is susceptible to out-of-distribution issues. To address this, we devise the module \textit{diffusion-based proxy refinement}, which seamlessly integrates insights from proxy-free diffusion back into the proxy for refinement. To sum up, we propose \textit{\textbf{R}obust \textbf{G}uided \textbf{D}iffusion for Offline Black-box Optimization}~(\textbf{RGD}), combining the advantages of proxy~(explicit guidance) and proxy-free diffusion~(robustness) for effective conditional generation. RGD achieves state-of-the-art results on various design-bench tasks, underscoring its efficacy. Our code is at https://anonymous.4open.science/r/RGD-27A5/README.md.

* 21 pages

Via

Access Paper or Ask Questions

Bilingual Adaptation of Monolingual Foundation Models

Jul 13, 2024

Gurpreet Gosal, Yishi Xu, Gokul Ramakrishnan, Rituraj Joshi, Avraham Sheinin, Zhiming, Chen, Biswajit Mishra, Natalia Vassilieva, Joel Hestness(+10 more)

Figure 1 for Bilingual Adaptation of Monolingual Foundation Models

Figure 2 for Bilingual Adaptation of Monolingual Foundation Models

Figure 3 for Bilingual Adaptation of Monolingual Foundation Models

Figure 4 for Bilingual Adaptation of Monolingual Foundation Models

Abstract:We present an efficient method for adapting a monolingual Large Language Model (LLM) to another language, addressing challenges of catastrophic forgetting and tokenizer limitations. We focus this study on adapting Llama 2 to Arabic. Our two-stage approach begins with expanding the vocabulary and training only the embeddings matrix, followed by full model continual pretraining on a bilingual corpus. By continually pretraining on a mix of Arabic and English corpora, the model retains its proficiency in English while acquiring capabilities in Arabic. Our approach results in significant improvements in Arabic and slight enhancements in English, demonstrating cost-effective cross-lingual transfer. We also perform extensive ablations on embedding initialization techniques, data mix ratios, and learning rates and release a detailed training recipe.

Via

Access Paper or Ask Questions

Physics-Informed Geometric Operators to Support Surrogate, Dimension Reduction and Generative Models for Engineering Design

Jul 10, 2024

Shahroz Khan, Zahid Masood, Muhammad Usama, Konstantinos Kostas, Panagiotis Kaklis, Wei, Chen

Abstract:In this work, we propose a set of physics-informed geometric operators (GOs) to enrich the geometric data provided for training surrogate/discriminative models, dimension reduction, and generative models, typically employed for performance prediction, dimension reduction, and creating data-driven parameterisations, respectively. However, as both the input and output streams of these models consist of low-level shape representations, they often fail to capture shape characteristics essential for performance analyses. Therefore, the proposed GOs exploit the differential and integral properties of shapes--accessed through Fourier descriptors, curvature integrals, geometric moments, and their invariants--to infuse high-level intrinsic geometric information and physics into the feature vector used for training, even when employing simple model architectures or low-level parametric descriptions. We showed that for surrogate modelling, along with the inclusion of the notion of physics, GOs enact regularisation to reduce over-fitting and enhance generalisation to new, unseen designs. Furthermore, through extensive experimentation, we demonstrate that for dimension reduction and generative models, incorporating the proposed GOs enriches the training data with compact global and local geometric features. This significantly enhances the quality of the resulting latent space, thereby facilitating the generation of valid and diverse designs. Lastly, we also show that GOs can enable learning parametric sensitivities to a great extent. Consequently, these enhancements accelerate the convergence rate of shape optimisers towards optimal solutions.

Via

Access Paper or Ask Questions

Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference

May 28, 2024

Hao, Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan

Figure 1 for Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference

Figure 2 for Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference

Figure 3 for Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference

Figure 4 for Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference

Abstract:The auto-regressive decoding of Large Language Models (LLMs) results in significant overheads in their hardware performance. While recent research has investigated various speculative decoding techniques for multi-token generation, these efforts have primarily focused on improving processing speed such as throughput. Crucially, they often neglect other metrics essential for real-life deployments, such as memory consumption and training cost. To overcome these limitations, we propose a novel parallel prompt decoding that requires only $0.0002$% trainable parameters, enabling efficient training on a single A100-40GB GPU in just 16 hours. Inspired by the human natural language generation process, $PPD$ approximates outputs generated at future timesteps in parallel by using multiple prompt tokens. This approach partially recovers the missing conditional dependency information necessary for multi-token generation, resulting in up to a 28% higher acceptance rate for long-range predictions. Furthermore, we present a hardware-aware dynamic sparse tree technique that adaptively optimizes this decoding scheme to fully leverage the computational capacities on different GPUs. Through extensive experiments across LLMs ranging from MobileLlama to Vicuna-13B on a wide range of benchmarks, our approach demonstrates up to 2.49$\times$ speedup and maintains a minimal runtime memory overhead of just $0.0004$%. More importantly, our parallel prompt decoding can serve as an orthogonal optimization for synergistic integration with existing speculative decoding, showing up to $1.22\times$ further speed improvement. Our code is available at https://github.com/hmarkc/parallel-prompt-decoding.

* The code for this implementation is available at https://github.com/hmarkc/parallel-prompt-decoding

Via

Access Paper or Ask Questions