Yu Cao

Katie

HiVeGen -- Hierarchical LLM-based Verilog Generation for Scalable Chip Design

Dec 06, 2024

Jinwei Tang, Jiayin Qin, Kiran Thorat, Chen Zhu-Tian, Yu Cao, Yang (Katie) Zhao, Caiwen Ding

Abstract:With Large Language Models (LLMs) recently demonstrating impressive proficiency in code generation, it is promising to extend their abilities to Hardware Description Language (HDL). However, LLMs tend to generate single HDL code blocks rather than hierarchical structures for hardware designs, leading to hallucinations, particularly in complex designs like Domain-Specific Accelerators (DSAs). To address this, we propose HiVeGen, a hierarchical LLM-based Verilog generation framework that decomposes generation tasks into LLM-manageable hierarchical submodules. HiVeGen further harnesses the advantages of such hierarchical structures by integrating automatic Design Space Exploration (DSE) into hierarchy-aware prompt generation, introducing weight-based retrieval to enhance code reuse, and enabling real-time human-computer interaction to lower error-correction cost, significantly improving the quality of generated designs.

Knowledge-Based Deep Learning for Time-Efficient Inverse Dynamics

Dec 06, 2024

Shuhao Ma, Yu Cao, Ian D. Robertson, Chaoyang Shi, Jindong Liu, Zhi-Qiang Zhang

Abstract:Accurate understanding of muscle activation and muscle forces plays an essential role in neuro-rehabilitation and musculoskeletal disorder treatments. Computational musculoskeletal modeling has been widely used as a powerful non-invasive tool to estimate them through inverse dynamics using static optimization, but the inherent computational complexity results in time-consuming analysis. In this paper, we propose a knowledge-based deep learning framework for time-efficient inverse dynamic analysis, which can predict muscle activation and muscle forces from joint kinematic data directly while not requiring any label information during model training. The Bidirectional Gated Recurrent Unit (BiGRU) neural network is selected as the backbone of our model due to its proficient handling of time-series data. Prior physical knowledge from forward dynamics and pre-selected inverse dynamics based physiological criteria are integrated into the loss function to guide the training of neural networks. Experimental validations on two datasets, including one benchmark upper limb movement dataset and one self-collected lower limb movement dataset from six healthy subjects, are performed. The experimental results have shown that the selected BiGRU architecture outperforms other neural network models when trained using our specifically designed loss function, which illustrates the effectiveness and robustness of the proposed framework.

* 10 pages, 8 figures, Journal paper

PAPL-SLAM: Principal Axis-Anchored Monocular Point-Line SLAM

Oct 16, 2024

Guanghao Li, Yu Cao, Qi Chen, Yifan Yang, Jian Pu

Abstract:In point-line SLAM systems, the utilization of line structural information and the optimization of lines are two significant problems. The former is usually addressed through structural regularities, while the latter typically involves using minimal parameter representations of lines in optimization. However, treating these two steps separately discards the constraints that each could impose on the other. We anchor lines with similar directions to a principal axis and optimize them with $n+2$ parameters for $n$ lines, solving both problems together. Our method considers scene structural information and can be easily extended to different world hypotheses while significantly reducing the number of line parameters to be optimized, enabling rapid and accurate mapping and tracking. To further enhance the system's robustness and avoid mismatches, we model the line-axis probabilistic data association and provide algorithms for axis creation, updating, and optimization. Additionally, considering that most real-world scenes conform to the Atlanta World hypothesis, we provide a structural line detection strategy based on vertical priors and vanishing points. Experimental results and ablation studies on various indoor and outdoor datasets demonstrate the effectiveness of our system.

* 8 pages, 4 figures
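
As a rough illustration of the parameter saving described in the PAPL-SLAM abstract above, the sketch below compares the number of optimization variables in two cases. In the first, each of n 3D lines is optimized independently with a minimal 4-parameter representation (a 3D line has four degrees of freedom). In the second, the lines are anchored to one principal axis and use the n+2 count stated in the abstract. The function names are hypothetical and the code only does the arithmetic; it is not the authors' implementation.

```python
def independent_line_params(n_lines: int) -> int:
    """Variables when every 3D line is optimized on its own.

    A 3D line has 4 degrees of freedom, so a minimal
    representation needs 4 parameters per line.
    """
    return 4 * n_lines


def axis_anchored_params(n_lines: int) -> int:
    """Variables when n lines share one principal axis.

    Uses the n + 2 count quoted in the abstract; how the
    parameters are split is not modeled here.
    """
    return n_lines + 2


if __name__ == "__main__":
    for n in (1, 10, 100):
        full = independent_line_params(n)
        anchored = axis_anchored_params(n)
        print(f"n={n:>3}: independent={full:>3}, axis-anchored={anchored:>3}")
```

For a group of 100 lines with similar directions, this gives 102 variables instead of 400, which is the kind of reduction the abstract credits for faster mapping and tracking.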

Computer-aided Colorization State-of-the-science: A Survey

Oct 03, 2024

Yu Cao, Xin Duan, Xiangqiao Meng, P. Y. Mok, Ping Li, Tong-Yee Lee

Abstract:This paper reviews published research in the field of computer-aided colorization technology. We argue that the colorization task originated in computer graphics, prospered with the introduction of computer vision, and is now tending toward a fusion of vision and graphics, so we put forward our own taxonomy and organize the whole paper chronologically. We extend the existing reconstruction-based colorization evaluation techniques, arguing that aesthetic assessment of colored images should be introduced so that colorization more closely satisfies human visual requirements and emotions. We perform the colorization aesthetic assessment on seven representative unconditional colorization models and discuss the difference between our assessment and the existing reconstruction-based metrics. Finally, this paper identifies unresolved issues and proposes fruitful areas for future research and development. The project associated with this survey is available at https://github.com/DanielCho-HK/Colorization.

Hand1000: Generating Realistic Hands from Text with Only 1,000 Images

Sep 04, 2024

Haozhuo Zhang, Bin Zhu, Yu Cao, Yanbin Hao

Abstract:Text-to-image generation models have achieved remarkable advancements in recent years, aiming to produce realistic images from textual descriptions. However, these models often struggle with generating anatomically accurate representations of human hands. The resulting images frequently exhibit issues such as incorrect numbers of fingers, unnatural twisting or interlacing of fingers, or blurred and indistinct hands. These issues stem from the inherent complexity of hand structures and the difficulty in aligning textual descriptions with precise visual depictions of hands. To address these challenges, we propose a novel approach named Hand1000 that enables the generation of realistic hand images with a target gesture using only 1,000 training samples. The training of Hand1000 is divided into three stages. The first stage aims to enhance the model's understanding of hand anatomy by using a pre-trained hand gesture recognition model to extract gesture representations. The second stage further optimizes the text embedding by incorporating the extracted hand gesture representation, to improve alignment between the textual descriptions and the generated hand images. The third stage utilizes the optimized embedding to fine-tune the Stable Diffusion model to generate realistic hand images. In addition, we construct the first publicly available dataset specifically designed for text-to-hand image generation. Starting from an existing hand gesture recognition dataset, we adopt advanced image captioning models and LLaMA3 to generate high-quality textual descriptions enriched with detailed gesture information. Extensive experiments demonstrate that Hand1000 significantly outperforms existing models in producing anatomically correct hand images while faithfully representing other details in the text, such as faces, clothing, and colors.

* Project page https://haozhuo-zhang.github.io/Hand1000-project-page/

Adaptify: A Refined Adaptation Scheme for Frame Classification in Atrophic Gastritis Videos

Aug 17, 2024

Zinan Xiong, Shuijiao Chen, Yizhe Zhang, Yu Cao, Benyuan Liu, Xiaowei Liu

Abstract:Atrophic gastritis is a significant risk factor for developing gastric cancer. Machine learning algorithms can improve the likelihood of accurately detecting atrophic gastritis. Nevertheless, when a trained model is applied in real-life circumstances, its output is often not consistently reliable. In this paper, we propose Adaptify, an adaptation scheme in which the model assimilates knowledge from its own classification decisions. Our approach keeps the primary model constant while simultaneously running and updating an auxiliary model. By integrating the knowledge gleaned by the auxiliary model into the primary model and merging their outputs, we observe a notable improvement in output stability and consistency compared to relying solely on either the primary model or the auxiliary model.

* ISBI 2024 Proceeding

Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion

Jul 09, 2024

Yu Cao, Shaogang Gong

Abstract:In the field of Few-Shot Image Generation (FSIG) using Deep Generative Models (DGMs), accurately estimating the distribution of the target domain with minimal samples poses a significant challenge. This requires a method that can capture both the broad diversity and the true characteristics of the target domain distribution. We present Conditional Relaxing Diffusion Inversion (CRDI), an innovative 'training-free' approach designed to enhance distribution diversity in synthetic image generation. Distinct from conventional methods, CRDI does not rely on fine-tuning based on only a few samples. Instead, it focuses on reconstructing each target image instance and expanding diversity through few-shot learning. The approach begins by identifying a Sample-wise Guidance Embedding (SGE) for the diffusion model, which serves a purpose analogous to the explicit latent codes in certain Generative Adversarial Network (GAN) models. Subsequently, a scheduler progressively introduces perturbations to the SGE, thereby augmenting diversity. Comprehensive experiments demonstrate that our method surpasses GAN-based reconstruction techniques and equals state-of-the-art (SOTA) FSIG methods in performance. Additionally, it effectively mitigates overfitting and catastrophic forgetting, common drawbacks of fine-tuning approaches.

Enhancing Zero-Shot Facial Expression Recognition by LLM Knowledge Transfer

May 29, 2024

Zengqun Zhao, Yu Cao, Shaogang Gong, Ioannis Patras

Abstract:Current facial expression recognition (FER) models are often designed in a supervised learning manner and are thus constrained by the lack of large-scale facial expression images with high-quality annotations. Consequently, these models often fail to generalize well, performing poorly on images unseen during training. Vision-language-based zero-shot models demonstrate promising potential for addressing such challenges. However, these models lack task-specific knowledge and are therefore not optimized for the nuances of recognizing facial expressions. To bridge this gap, this work proposes a novel method, Exp-CLIP, to enhance zero-shot FER by transferring task knowledge from large language models (LLMs). Specifically, based on the pre-trained vision-language encoders, we incorporate a projection head designed to map the initial joint vision-language space into a space that captures representations of facial actions. To train this projection head for subsequent zero-shot predictions, we propose to align the projected visual representations with task-specific semantic meanings derived from the LLM encoder, and a text instruction-based strategy is employed to customize the LLM knowledge. Given unlabelled facial data and efficient training of the projection head, Exp-CLIP achieves superior zero-shot results to the CLIP models and several other large vision-language models (LVLMs) on seven in-the-wild FER datasets. The code and pre-trained models are available at https://github.com/zengqunzhao/Exp-CLIP.

Weak Generative Sampler to Efficiently Sample Invariant Distribution of Stochastic Differential Equation

May 29, 2024

Zhiqiang Cai, Yu Cao, Yuanfei Huang, Xiang Zhou

Abstract:Sampling invariant distributions from an Ito diffusion process presents a significant challenge in stochastic simulation. Traditional numerical solvers for stochastic differential equations require both a fine step size and a lengthy simulation period, resulting in both biased and correlated samples. Current deep learning-based methods solve the stationary Fokker-Planck equation to determine the invariant probability density function in the form of deep neural networks, but they generally do not directly address the problem of sampling from the computed density function. In this work, we introduce a framework that employs a weak generative sampler (WGS) to directly generate independent and identically distributed (iid) samples induced by a transformation map derived from the stationary Fokker-Planck equation. Our proposed loss function is based on the weak form of the Fokker-Planck equation, integrating normalizing flows to characterize the invariant distribution and facilitate sample generation from the base distribution. Our randomized test function circumvents the need for mini-max optimization in the traditional weak formulation. Distinct from conventional generative models, our method requires neither the computationally intensive calculation of the Jacobian determinant nor the invertibility of the transformation map. A crucial component of our framework is the adaptively chosen family of test functions in the form of Gaussian kernel functions with centres selected from the generated data samples. Experimental results on several benchmark examples demonstrate the effectiveness of our method, which offers both low computational costs and excellent capability in exploring multiple metastable states.

* 24 pages, 10 figures
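
To make the weak form mentioned in the abstract above more concrete, here is a minimal numerical sketch. It rests on assumptions that are not taken from the paper: a one-dimensional Ornstein-Uhlenbeck process dX = -X dt + sqrt(2) dW, whose invariant distribution is the standard normal, and a single Gaussian kernel test function with a hand-picked centre and width. For samples drawn from the invariant distribution, the sample mean of the generator applied to the test function should be close to zero; for samples from a wrong distribution it should not. There is no normalizing flow and no training here, and the function names are hypothetical. The code only evaluates the kind of residual that a weak-form loss would penalize.

```python
import numpy as np


def generator_on_gaussian_kernel(x, center, width):
    """Apply L phi = -x * phi' + phi'' to a Gaussian kernel phi.

    L is the generator of the assumed SDE dX = -X dt + sqrt(2) dW,
    and phi(x) = exp(-(x - center)^2 / (2 * width^2)).
    """
    z = x - center
    phi = np.exp(-z**2 / (2.0 * width**2))
    dphi = -z / width**2 * phi                           # first derivative
    d2phi = (z**2 / width**4 - 1.0 / width**2) * phi     # second derivative
    return -x * dphi + d2phi


def weak_residual(samples, center=0.0, width=1.0):
    """Monte Carlo estimate of E[L phi(X)] over the given samples."""
    return float(np.mean(generator_on_gaussian_kernel(samples, center, width)))


rng = np.random.default_rng(0)
invariant_samples = rng.normal(0.0, 1.0, size=200_000)  # correct: N(0, 1)
wrong_samples = rng.normal(0.0, 2.0, size=200_000)      # too wide: N(0, 4)

res_invariant = weak_residual(invariant_samples)
res_wrong = weak_residual(wrong_samples)

print(f"residual, invariant samples: {res_invariant:+.4f}")
print(f"residual, wrong samples:     {res_wrong:+.4f}")
```

A sampler trained on such a criterion would combine the squared residuals over many test functions; the abstract describes choosing the kernel centres adaptively from the generated samples, which this sketch does not attempt.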

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

May 20, 2024

Jian Hu, Xibin Wu, Weixun Wang, Xianyu, Dehao Zhang, Yu Cao

Abstract:As large language models (LLMs) continue to grow following scaling laws, reinforcement learning from human feedback (RLHF) has gained significant attention due to its outstanding performance. However, unlike pretraining or fine-tuning a single model, scaling RLHF for training large language models poses coordination challenges across four models. We present OpenRLHF, an open-source framework enabling efficient RLHF scaling. Unlike existing RLHF frameworks that co-locate four models on the same GPUs, OpenRLHF re-designs scheduling for models beyond 70B parameters using Ray, vLLM, and DeepSpeed, improving resource utilization and supporting diverse training approaches. Integrating seamlessly with Hugging Face, OpenRLHF provides an out-of-the-box solution with optimized algorithms and launch scripts, which ensures user-friendliness. OpenRLHF implements RLHF, DPO, rejection sampling, and other alignment techniques. Empowering state-of-the-art LLM development, OpenRLHF's code is available at https://github.com/OpenLLMAI/OpenRLHF.
