Abstract:The increasing size and complexity of Large Language Models (LLMs) pose challenges for their deployment on personal computers and mobile devices. Aggressive post-training model compression is necessary to reduce model size, but it often results in significant accuracy loss. To address this challenge, we propose a novel network pruning technique that combines sparsity above 0.7 with quantization below 8 bits. Our approach compresses prevailing LLMs within a few hours while maintaining a relatively small accuracy loss. In experimental evaluations, our method demonstrates its effectiveness and potential for practical deployment. By making LLMs available on consumer devices, our work can usher in a new era of natural language processing applications with wide-ranging impact.
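As a rough illustration of the compression recipe described above, the following snippet combines magnitude pruning at a target sparsity with sub-8-bit uniform quantization of the surviving weights. It is a minimal sketch of the general idea, not the paper's actual algorithm, and all names and parameters are illustrative.

```python
import torch

def prune_and_quantize(weight: torch.Tensor, sparsity: float = 0.7, bits: int = 4):
    """Magnitude-prune a weight matrix to roughly the target sparsity, then
    uniformly quantize the surviving weights to `bits` bits (illustrative only)."""
    # 1) Magnitude pruning: zero out the smallest-magnitude entries.
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold

    # 2) Symmetric uniform quantization of the remaining weights.
    n_levels = 2 ** (bits - 1) - 1
    scale = weight[mask].abs().max() / n_levels
    q = torch.round(weight / scale).clamp(-n_levels, n_levels)

    return (q * scale) * mask, mask  # de-quantized sparse weights and pruning mask

w = torch.randn(512, 512)                        # stand-in for one LLM weight matrix
w_hat, mask = prune_and_quantize(w, sparsity=0.75, bits=4)
print(f"achieved sparsity: {1 - mask.float().mean().item():.2f}")
```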
Abstract:Cherenkov imaging enables real-time visualization of megavoltage X-ray or electron beam delivery to the patient during Radiation Therapy (RT). Bio-morphological features seen in these images, such as vasculature, are patient-specific signatures that can be used for the positioning verification and motion management essential to precise RT treatment. Until now, however, such biological feature-based tracking has not been systematically exploited, because conventional image processing for feature segmentation is too slow and insufficiently accurate. This study demonstrated the first deep learning framework for this application, achieving video-frame-rate processing. To address the limited annotation of these features in Cherenkov images, a transfer learning strategy was applied. A fundus photography dataset of 20,529 retina image patches with ground-truth vessel annotations was used to pre-train a ResNet segmentation framework. A small Cherenkov dataset (1,483 images from 212 treatment fractions of 19 breast cancer patients) with annotated vasculature masks was then used to fine-tune the model for accurate segmentation. The framework achieved consistent and rapid segmentation of Cherenkov-imaged bio-morphological features, including subcutaneous veins, scars, and pigmented skin, on another 19 patients, with an average Dice score of 0.85 and a processing time of less than 0.7 milliseconds per instance. The model demonstrated strong consistency under input image variations and a marked speed advantage over conventional manual segmentation, laying the foundation for online segmentation and real-time monitoring in a prospective setting.
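To make the transfer-learning step above concrete, here is a minimal fine-tuning sketch using a soft Dice loss (matching the Dice metric reported in the abstract). The `encoder`/`decoder` attributes and the data loader are placeholder assumptions, not the study's actual implementation.

```python
import torch
import torch.nn as nn

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss for binary vessel masks; `pred` is a probability map."""
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

def fine_tune(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-4):
    """Fine-tune a segmentation model pre-trained on fundus images, freezing the
    encoder and updating only the decoder/head (`encoder`/`decoder` are assumed names)."""
    for p in model.encoder.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(model.decoder.parameters(), lr=lr)
    for _ in range(epochs):
        for images, masks in loader:          # small annotated Cherenkov dataset
            probs = torch.sigmoid(model(images))
            loss = dice_loss(probs, masks)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```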
Abstract:Accurate patient positioning is critical for precise radiotherapy dose delivery, as positioning errors can significantly affect treatment outcomes. This study introduces a novel method for tracking loco-regional tissue deformation through Cherenkov image analysis during fractionated breast cancer radiotherapy. The primary goal was to develop and test an algorithm for Cherenkov-based regional position accuracy quantification, specifically for loco-regional deformations, for which radiotherapy currently lacks ideal quantification methods. Blood vessel detection and segmentation were developed in Cherenkov images using a tissue phantom with incremental movements, and later applied to images from fractionated whole-breast radiotherapy in human patients (n=10). A combined rigid and non-rigid registration technique was used to detect inter- and intra-fractional positioning variations. This approach quantified positioning variations in two parts: a global shift from rigid registration and a two-dimensional variation map of loco-regional deformation from non-rigid registration. The methodology was validated using an anthropomorphic chest phantom experiment, in which known treatment couch translations and respiratory motion were simulated to assess inter- and intra-fractional uncertainties, yielding an average accuracy of 0.83 mm for couch translations up to 20 mm. Analysis of clinical Cherenkov data from ten breast cancer patients showed an inter-fraction setup variation of 3.7 ± 2.4 mm relative to the first fraction and loco-regional deformations (95th percentile) of up to 3.3 ± 1.9 mm. This study presents a Cherenkov-based approach to quantify global and local positioning variations, demonstrating its feasibility in addressing loco-regional deformations that conventional imaging techniques fail to capture.
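A conceptual sketch of the two-part quantification described above, using OpenCV as a stand-in for the paper's registration pipeline: phase correlation provides a global (rigid) shift, and dense optical flow on the shift-corrected image provides a loco-regional deformation map. The function choices and parameters are illustrative assumptions, not the study's method.

```python
import cv2
import numpy as np

def global_shift_and_deformation(ref: np.ndarray, frac: np.ndarray):
    """Estimate a global (rigid) shift plus a local deformation map between a
    reference Cherenkov image and a later fraction (both 2-D uint8 grayscale)."""
    # Global shift via phase correlation -- a simple proxy for rigid registration.
    (dx, dy), _ = cv2.phaseCorrelate(np.float32(ref), np.float32(frac))

    # Undo the global shift, then estimate the residual loco-regional deformation
    # with dense optical flow, standing in for the non-rigid registration step.
    m = np.float32([[1, 0, -dx], [0, 1, -dy]])
    aligned = cv2.warpAffine(frac, m, (frac.shape[1], frac.shape[0]))
    flow = cv2.calcOpticalFlowFarneback(ref, aligned, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    deformation = np.hypot(flow[..., 0], flow[..., 1])  # pixels; scale by mm/pixel
    return (dx, dy), deformation
```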
Abstract:Multi-exit networks are a promising architecture for efficient model inference, sharing backbone networks and weights among multiple exits. However, gradient conflict on the shared weights results in sub-optimal accuracy. This paper introduces Deep Feature Surgery (\methodname), which consists of feature partitioning and feature referencing approaches to resolve gradient conflict during the training of multi-exit networks. Feature partitioning separates shared features along the depth axis among all exits to alleviate gradient conflict while still promoting joint optimization for each exit. Feature referencing then enhances multi-scale features for distinct exits across varying depths to improve model accuracy. Furthermore, \methodname~reduces training operations by lowering the complexity of backpropagation. Experimental results on the CIFAR-100 and ImageNet datasets show that \methodname~provides up to a \textbf{50.00\%} reduction in training time and up to a \textbf{6.94\%} improvement in accuracy compared with baseline methods across diverse models and tasks. Budgeted batch classification evaluation on MSDNet demonstrates that DFS uses about $\mathbf{2}\boldsymbol{\times}$ fewer average FLOPs per image to achieve the same classification accuracy as baseline methods on CIFAR-100.
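The toy PyTorch module below sketches one reading of the feature-partitioning idea: shared features are split along the depth (channel) axis, and each exit receives its own slice plus a stop-gradient copy of the other, so the exits no longer push conflicting gradients into the same channels. This is an illustrative simplification, not the paper's implementation, and it omits the feature-referencing step.

```python
import torch
import torch.nn as nn

class TwoExitBlock(nn.Module):
    """Toy two-exit network with depth-wise feature partitioning."""
    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.backbone = nn.Conv2d(3, channels, 3, padding=1)  # shared backbone stage
        self.exit1 = nn.Linear(channels, num_classes)
        self.exit2 = nn.Linear(channels, num_classes)

    def forward(self, x):
        feats = self.backbone(x)                  # shared features (B, C, H, W)
        a, b = feats.chunk(2, dim=1)              # partition along the depth axis
        f1 = torch.cat([a, b.detach()], dim=1)    # exit 1 trains only slice `a`
        f2 = torch.cat([a.detach(), b], dim=1)    # exit 2 trains only slice `b`
        out1 = self.exit1(f1.mean(dim=(2, 3)))    # global average pool + head
        out2 = self.exit2(f2.mean(dim=(2, 3)))
        return out1, out2
```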
Abstract:Customization generation techniques have significantly advanced the synthesis of specific concepts across varied contexts. Multi-concept customization emerges as a particularly challenging task within this domain. Existing approaches often rely on training a fusion matrix over multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image. However, we identify that this straightforward method faces two major challenges: 1) concept confusion, which occurs when the model cannot preserve distinct individual characteristics, and 2) concept vanishing, where the model fails to generate the intended subjects. To address these issues, we introduce LoRA-Composer, a training-free framework designed to seamlessly integrate multiple LoRAs, thereby enhancing the harmony among different concepts within generated images. LoRA-Composer addresses concept vanishing through Concept Injection Constraints, enhancing concept visibility via an expanded cross-attention mechanism. To combat concept confusion, Concept Isolation Constraints are introduced, refining the self-attention computation. Furthermore, Latent Re-initialization is proposed to effectively stimulate concept-specific latents within designated regions. Our extensive testing shows a notable improvement in LoRA-Composer's performance compared to standard baselines, especially when image-based conditions such as Canny edges or pose estimates are removed. Code is released at https://github.com/Young98CN/LoRA\_Composer.
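For context, the naive weight-space fusion that the abstract identifies as the source of concept confusion and vanishing can be sketched in a few lines; LoRA-Composer's training-free attention constraints are not reproduced here, and the shapes and scales below are illustrative.

```python
import torch

def merge_loras(base_weight: torch.Tensor, loras, scales):
    """Naively merge several LoRAs into one base weight matrix:
    W' = W + sum_i scale_i * (B_i @ A_i).  This is the straightforward
    fusion that the abstract argues causes concept confusion/vanishing."""
    merged = base_weight.clone()
    for (A, B), s in zip(loras, scales):      # A: (rank, in), B: (out, rank)
        merged += s * (B @ A)
    return merged

W = torch.randn(768, 768)                                        # base projection
lora_cat = (torch.randn(4, 768) * 0.01, torch.randn(768, 4) * 0.01)
lora_dog = (torch.randn(4, 768) * 0.01, torch.randn(768, 4) * 0.01)
W_multi = merge_loras(W, [lora_cat, lora_dog], scales=[0.8, 0.8])
```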
Abstract:Diffusion models have exhibited impressive prowess in the text-to-image task. Recent methods add image-level controls, e.g., edge and depth maps, to manipulate the generation process together with text prompts and obtain desired images. This controlling process operates globally on the entire image, which limits the flexibility of control regions. In this paper, we introduce a new, simple yet practical task setting: local control. It focuses on controlling specific local areas according to user-defined image conditions, while the remaining areas are conditioned only by the original text prompt. This allows users to control image generation flexibly and in a fine-grained way. However, achieving this goal is non-trivial: naively adding local conditions directly may lead to the local control dominance problem. To mitigate this problem, we propose a training-free method that leverages updates of the noised latents and of the parameters in the cross-attention map during the denoising process to promote concept generation in non-control areas. Moreover, we use feature mask constraints to mitigate the degradation of synthesized image quality caused by information differences inside and outside the local control area. Extensive experiments demonstrate that our method can synthesize high-quality images that adhere to the prompt under local control conditions. Code is available at https://github.com/YibooZhao/Local-Control.
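A highly simplified sketch of the local-control idea: at one denoising step, the latent from a branch that sees the image condition is blended with the latent from a text-only branch using the user's region mask, so the condition influences only the selected area. The paper instead updates noised latents and cross-attention parameters; this masked blend is an illustrative assumption.

```python
import torch

def masked_latent_update(latent_ctrl: torch.Tensor,
                         latent_text: torch.Tensor,
                         region_mask: torch.Tensor) -> torch.Tensor:
    """Keep the locally controlled latent inside the user-defined region and the
    text-only latent outside it (a simplification of the paper's mechanism)."""
    # region_mask: (1, 1, H, W) binary mask at latent resolution.
    return region_mask * latent_ctrl + (1.0 - region_mask) * latent_text

latent_ctrl = torch.randn(1, 4, 64, 64)   # branch conditioned on the edge/depth map
latent_text = torch.randn(1, 4, 64, 64)   # branch conditioned on the text prompt only
mask = torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1.0             # user-defined local control region
blended = masked_latent_update(latent_ctrl, latent_text, mask)
```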
Abstract:Computers calculate transcendental functions by approximating them through the composition of a few limited-precision instructions. For example, an exponential can be calculated with a Taylor series. These approximation methods were developed over the centuries by mathematicians, who emphasized the attainability of arbitrary precision. Computers, however, operate on a few limited-precision types, such as the popular float32. In this study, we show that when aiming for limited precision, existing approximation methods can be outperformed by programs automatically discovered from scratch by a simple evolutionary algorithm. In particular, over the real numbers, our method approximates the exponential function with orders of magnitude more precision for a given number of operations than previous approaches. More practically, over float32 numbers and constrained to less than 1 ULP of error, the same method attains a speedup over baselines by generating code that triggers better XLA/LLVM compilation paths. In other words, in both cases evolution searched a vast space of possible programs, without knowledge of mathematics, and for the first time discovered previously unknown high-precision optimized approximations. We also provide evidence that these results extend beyond the exponential. The ubiquity of transcendental functions suggests that our method has the potential to reduce the cost of scientific computing applications.
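As a minimal illustration of the search setup (not the paper's program representation, which evolves instruction sequences), the following (1+1) evolutionary loop evolves polynomial coefficients to approximate exp on [0, 1] under a max-error objective.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 512)
target = np.exp(xs)

def error(coeffs: np.ndarray) -> float:
    """Maximum absolute error of a polynomial (highest degree first) against exp."""
    return float(np.max(np.abs(np.polyval(coeffs, xs) - target)))

# (1+1) evolution over the coefficients of a degree-5 polynomial.
best = rng.normal(size=6)
best_err = error(best)
for _ in range(20_000):
    child = best + rng.normal(scale=0.05, size=best.shape)  # mutate all coefficients
    child_err = error(child)
    if child_err <= best_err:                                # keep if no worse
        best, best_err = child, child_err

print(f"max |error| after search: {best_err:.2e}")
```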
Abstract:Manually grading structural changes with the modified Stoke Ankylosing Spondylitis Spinal Score (mSASSS) on spinal X-ray imaging is costly and time-consuming due to bone shape complexity and variations in image quality. In this study, we address this challenge by prototyping a two-step auto-grading pipeline, called VertXGradeNet, to automatically predict mSASSS scores for cervical and lumbar vertebral units (VUs) in X-ray spinal imaging. VertXGradeNet takes VUs generated by our previously developed VU extraction pipeline (VertXNet) as input and predicts their mSASSS scores. VertXGradeNet was evaluated on an in-house dataset of lateral cervical and lumbar X-ray images from axial spondyloarthritis patients. Our results show that VertXGradeNet can predict the mSASSS score for each VU even when the data are limited in quantity and imbalanced. Overall, it achieves a balanced accuracy of 0.56 and 0.51 across the four mSASSS scores (0, 1, 2, and 3) on two test datasets. The accuracy of the presented method shows its potential to streamline spinal radiograph reading and thereby reduce the cost of future clinical trials.
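A hedged sketch of how limited, imbalanced grade data is commonly handled, using inverse-frequency class weights and the balanced-accuracy metric reported above; the labels and predictions below are placeholders rather than VertXGradeNet's actual pipeline.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import balanced_accuracy_score

def class_weights(labels: np.ndarray, n_classes: int = 4) -> torch.Tensor:
    """Inverse-frequency weights so rare mSASSS grades contribute equally to the loss."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    w = counts.sum() / (n_classes * np.maximum(counts, 1.0))
    return torch.tensor(w, dtype=torch.float32)

labels = np.random.randint(0, 4, size=200)             # placeholder VU grades (0-3)
criterion = nn.CrossEntropyLoss(weight=class_weights(labels))

# Balanced accuracy (as reported in the abstract) treats every grade equally.
preds = np.random.randint(0, 4, size=200)              # placeholder predictions
print(balanced_accuracy_score(labels, preds))
```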
Abstract:The metaverse, still at the stage of innovation and exploration, faces the dilemma of data collection and the problem of private data leakage during its development, which can seriously hinder its widespread deployment. Fortunately, federated learning (FL) offers a solution to these problems. FL is a distributed machine learning paradigm with privacy-preserving features designed for a large number of edge devices. Federated learning for the metaverse (FL4M) will therefore be a powerful tool, because FL allows edge devices to participate in training tasks locally using their own data, computational power, and model-building capabilities. Applying FL to the metaverse not only protects the data privacy of participants but also reduces the need for high computing power and large memory on servers. To date, there have been many studies on FL and on the metaverse, respectively. In this paper, we review early advances in FL4M, a research direction with great development potential. We first introduce the concepts of the metaverse and FL. We then discuss in detail the convergence of FL with key metaverse technologies, such as big data, communication technology, the Internet of Things, edge computing, blockchain, and extended reality. Finally, we discuss some key challenges and promising directions of FL4M in detail. In summary, we hope that this up-to-date brief survey helps readers better understand FL4M and build a fair, open, and secure metaverse.
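To make the FL paradigm concrete, here is a minimal federated-averaging (FedAvg) sketch: each edge device trains locally and uploads only its model weights, which the server averages in proportion to local dataset size. This is a generic illustration of FL, not tied to any specific metaverse deployment.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Federated averaging: combine locally trained model weights, weighted by
    each client's dataset size; raw data never leaves the device."""
    total = float(sum(client_sizes))
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three edge devices each train locally and upload only their weight vectors.
local_models = [np.random.randn(10) for _ in range(3)]
local_sizes = [1200, 300, 4500]
global_model = fed_avg(local_models, local_sizes)
```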
Abstract:Reliable vertebra annotations are key to performing analysis of spinal X-ray images. However, annotating vertebrae in those images is usually carried out manually due to their complexity (i.e., small structures with varying shapes), making it a costly and tedious process. To accelerate this process, we proposed an ensemble pipeline, VertXNet, that combines two state-of-the-art (SOTA) segmentation models (U-Net and Mask R-CNN) to automatically segment and label vertebrae in spinal X-ray images. Moreover, VertXNet introduces a rule-based approach that robustly infers vertebra labels (by locating 'reference' vertebrae, which are easier to segment than others) for a given spinal X-ray image. We evaluated the proposed pipeline on three spinal X-ray datasets (two internal and one publicly available) and compared its output against vertebrae annotated by radiologists. Our experimental results show that the proposed pipeline outperformed the two SOTA segmentation models on our test dataset (MEASURE 1) with a mean Dice of 0.90, versus a mean Dice of 0.73 for Mask R-CNN and 0.72 for U-Net. To further evaluate the generalization ability of VertXNet, the pre-trained pipeline was directly tested on two additional datasets (PREVENT and NHANES II), and consistent performance was observed, with mean Dice scores of 0.89 and 0.88, respectively. Overall, VertXNet demonstrated significantly improved performance for vertebra segmentation and labeling in spinal X-ray imaging, and evaluation on both in-house clinical trial data and publicly available data further confirmed its generalization ability.
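A small sketch of the rule-based labeling idea: detected vertebra centroids are ordered top-to-bottom and names are propagated outward from a single confidently identified 'reference' vertebra. The abstract does not specify VertXNet's actual rules, so the label sequence and logic below are purely illustrative.

```python
# Cervical label sequence used purely for illustration.
CERVICAL = ["C2", "C3", "C4", "C5", "C6", "C7"]

def propagate_labels(centroids, reference_index, reference_label, names=CERVICAL):
    """Label segmented vertebrae by ordering them top-to-bottom and counting
    away from one confidently identified 'reference' vertebra."""
    order = sorted(range(len(centroids)), key=lambda i: centroids[i][1])  # sort by y
    ref_pos = order.index(reference_index)
    start = names.index(reference_label)
    labels = {}
    for pos, idx in enumerate(order):
        offset = start + (pos - ref_pos)
        labels[idx] = names[offset] if 0 <= offset < len(names) else "unknown"
    return labels

# Example: four centroids (x, y); detection #2 was identified as the reference C2.
print(propagate_labels([(50, 300), (52, 120), (49, 60), (51, 200)], 2, "C2"))
```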