Abstract:Text-to-image (T2I) generation models often struggle with multi-instance synthesis (MIS), where they must accurately depict multiple distinct instances in a single image based on complex prompts detailing individual features. Traditional MIS control methods for UNet architectures like SD v1.5/SDXL fail to adapt to DiT-based models like FLUX and SD v3.5, which rely on integrated attention between image and text tokens rather than text-image cross-attention. To enhance MIS in DiT, we first analyze the mixed attention mechanism in DiT. Our token-wise and layer-wise analysis of attention maps reveals a hierarchical response structure: instance tokens dominate early layers, background tokens in middle layers, and attribute tokens in later layers. Building on this observation, we propose a training-free approach for enhancing MIS in DiT-based models with hierarchical and step-layer-wise attention specialty tuning (AST). AST amplifies key regions while suppressing irrelevant areas in distinct attention maps across layers and steps, guided by the hierarchical structure. This optimizes multimodal interactions by hierarchically decoupling the complex prompts with instance-based sketches. We evaluate our approach using upgraded sketch-based layouts for the T2I-CompBench and customized complex scenes. Both quantitative and qualitative results confirm our method enhances complex layout generation, ensuring precise instance placement and attribute representation in MIS.
Abstract:This paper introduces Fusion PEFT Energy LLM (FPE-LLM), a large language model (LLM) fine-tuned for energy system forecasting using a combination of Prefix and Lora Parameter-Efficient Fine-Tuning (PEFT) methods. FPE-LLM addresses three key challenges in the energy system and LLM fields: 1. Enhancing few-shot learning for handling extreme environmental conditions. FPE-LLM can leverage both textual and time-series data to achieve accurate predictions in few-shot contexts. 2. Reducing dependence on expert input to improve efficiency. FPE-LLM can provide guidance and results on related problems, acting like an expert system. Even non-experts can use FPE-LLM to complete all tasks related to forecasting and its associated tasks. 3. Mitigating hallucination risks through standardized fine-tuning. We validated this through multi-task learning and the self-reasoning characteristics of LLMs. Our research opens the door to fully realizing the intelligent potential of FPE-LLM in the energy forecasting field. With the injection of more knowledge and data, FPE-LLM is expected to replace a significant amount of manual work and contribute to the stability and efficiency of energy forecasting.
Abstract:The 2019-20 Australia bushfire incurred numerous economic losses and significantly affected the operations of power systems. A power station or transmission line can be significantly affected due to bushfires, leading to an increase in operational costs. We study a fundamental but challenging problem of planning the optimal power flow (OPF) for power systems subject to bushfires. Considering the stochastic nature of bushfire spread, we develop a model to capture such dynamics based on Moore's neighborhood model. Under a periodic inspection scheme that reveals the in-situ bushfire status, we propose an online optimization modeling framework that sequentially plans the power flows in the electricity network. Our framework assumes that the spread of bushfires is non-stationary over time, and the spread and containment probabilities are unknown. To meet these challenges, we develop a contextual online learning algorithm that treats the in-situ geographical information of the bushfire as a 'spatial context'. The online learning algorithm learns the unknown probabilities sequentially based on the observed data and then makes the OPF decision accordingly. The sequential OPF decisions aim to minimize the regret function, which is defined as the cumulative loss against the clairvoyant strategy that knows the true model parameters. We provide a theoretical guarantee of our algorithm by deriving a bound on the regret function, which outperforms the regret bound achieved by other benchmark algorithms. Our model assumptions are verified by the real bushfire data from NSW, Australia, and we apply our model to two power systems to illustrate its applicability.
Abstract:Ultrasonic methods have great potential applications to detect and characterize defects in multi-layered bonded composites. However, it remains challenging to quantitatively reconstruct defects, such as disbonds and kissing bonds, that influence the integrity of adhesive bonds and seriously reduce the strength of assemblies. In this work, an ultrasonic method based on the supervised fully convolutional network (FCN) is proposed to quantitatively reconstruct defects hidden in multi-layered bonded composites. In the training process of this method, an FCN establishes a non-linear mapping from measured ultrasonic data to the corresponding velocity models of multi-layered bonded composites. In the predicting process, the trained network obtained from the training process is used to directly reconstruct the velocity models from the new measured ultrasonic data of adhesively bonded composites. The presented FCN-based inversion method can automatically extract useful features in multi-layered composites. Although this method is computationally expensive in the training process, the prediction itself in the online phase takes only seconds. The numerical results show that the FCN-based ultrasonic inversion method is capable to accurately reconstruct ultrasonic velocity models of the high contrast defects, which has great potential for online detection of adhesively bonded composites.