Abstract:In the global drive toward carbon neutrality, deeply coordinated smart energy systems underpin industrial transformation. However, the interdisciplinary, fragmented, and fast-evolving expertise in this domain prevents general-purpose LLMs, which lack domain knowledge and physical-constraint awareness, from delivering precise engineering-aligned inference and generation. To address these challenges, we introduce Helios, a large language model tailored to the smart energy domain, together with a comprehensive suite of resources to advance LLM research in this field. Specifically, we develop Enersys, a multi-agent collaborative framework for end-to-end dataset construction, through which we produce: (1) a smart energy knowledge base, EnerBase, to enrich the model's foundational expertise; (2) an instruction fine-tuning dataset, EnerInstruct, to strengthen performance on domain-specific downstream tasks; and (3) an RLHF dataset, EnerReinforce, to align the model with human preferences and industry standards. Leveraging these resources, Helios undergoes large-scale pretraining, SFT, and RLHF. We also release EnerBench, a benchmark for evaluating LLMs in smart energy scenarios, and demonstrate that our approach significantly enhances domain knowledge mastery, task execution accuracy, and alignment with human preferences.
Abstract:The rapid growth of artificial intelligence is exponentially escalating computational demand, inflating data center energy use and carbon emissions, and spurring rapid deployment of green data centers to relieve resource and environmental stress. Achieving sub-minute orchestration of renewables, storage, and loads, while minimizing PUE and lifecycle carbon intensity, hinges on accurate load forecasting. However, existing methods struggle to address small-sample scenarios caused by cold start, load distortion, multi-source data fragmentation, and distribution shifts in green data centers. We introduce HyperLoad, a cross-modality framework that exploits pre-trained large language models (LLMs) to overcome data scarcity. In the Cross-Modality Knowledge Alignment phase, textual priors and time-series data are mapped to a common latent space, maximizing the utility of prior knowledge. In the Multi-Scale Feature Modeling phase, domain-aligned priors are injected through adaptive prefix-tuning, enabling rapid scenario adaptation, while an Enhanced Global Interaction Attention mechanism captures cross-device temporal dependencies. The public DCData dataset is released for benchmarking. Under both data sufficient and data scarce settings, HyperLoad consistently surpasses state-of-the-art (SOTA) baselines, demonstrating its practicality for sustainable green data center management.




Abstract:As the primary standard protocol for modern cars, the Controller Area Network (CAN) is a critical research target for automotive cybersecurity threats and autonomous applications. As the decoding specification of CAN is a proprietary black-box maintained by Original Equipment Manufacturers (OEMs), conducting related research and industry developments can be challenging without a comprehensive understanding of the meaning of CAN messages. In this paper, we propose a fully automated reverse-engineering system, named ByCAN, to reverse engineer CAN messages. ByCAN outperforms existing research by introducing byte-level clusters and integrating multiple features at both byte and bit levels. ByCAN employs the clustering and template matching algorithms to automatically decode the specifications of CAN frames without the need for prior knowledge. Experimental results demonstrate that ByCAN achieves high accuracy in slicing and labeling performance, i.e., the identification of CAN signal boundaries and labels. In the experiments, ByCAN achieves slicing accuracy of 80.21%, slicing coverage of 95.21%, and labeling accuracy of 68.72% for general labels when analyzing the real-world CAN frames.