



Abstract:The potential of Machine Learning Control (MLC) in HVAC systems is hindered by its opaque nature and inference mechanisms, which are difficult for users and modelers to fully comprehend, ultimately leading to a lack of trust in MLC-based decision-making. To address this challenge, this paper investigates Interpretable Machine Learning (IML), a branch of Machine Learning (ML) that enhances the transparency and understanding of models and their inferences, to improve the credibility of MLC and its industrial application in HVAC systems. Specifically, we developed an innovative framework that combines the principles of Shapley values and the in-context learning feature of Large Language Models (LLMs). While the Shapley values are instrumental in dissecting the contributions of various features in ML models, the LLM provides an in-depth understanding of the rule-based parts of MLC; combining the two, the LLM further packages these insights into a coherent, human-understandable narrative. The paper presents a case study to demonstrate the feasibility of the developed IML framework for model predictive control-based precooling under demand response events in a virtual testbed. The results indicate that the developed framework generates and explains the control signals in accordance with the rule-based rationale.
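
To make the role of Shapley values concrete, the following is a minimal sketch of an exact Shapley computation for a toy model. It is not the framework described in the abstract: the linear model, its weights, the baseline, and the instance are all hypothetical, and "removing" a feature is assumed to mean replacing it with a baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy model (assumption): a linear predictor over three features.
WEIGHTS = [2.0, -1.0, 0.5]
BASELINE = [25.0, 50.0, 0.0]   # values used when a feature is "absent"
INSTANCE = [30.0, 40.0, 1.0]   # the input being explained


def model(x):
    return sum(w * v for w, v in zip(WEIGHTS, x))


def value_fn(subset):
    # Features outside the coalition are replaced by their baseline value.
    x = [INSTANCE[j] if j in subset else BASELINE[j] for j in range(3)]
    return model(x)


phi = shapley_values(value_fn, 3)
```

For a linear model with a baseline, each attribution reduces to the weight times the feature's deviation from the baseline, and the attributions sum to the difference between the prediction for the instance and the prediction for the baseline. Exact enumeration grows exponentially with the number of features, so it is only practical for small examples like this one.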




Abstract:The rapid progression in artificial intelligence has facilitated the emergence of large language models like ChatGPT, offering potential applications extending into specialized engineering modeling, especially physics-based building energy modeling. This paper investigates the innovative integration of large language models with building energy modeling software, focusing specifically on the fusion of ChatGPT with EnergyPlus. A literature review is first conducted to reveal a growing trend of incorporating large language models in engineering modeling, albeit with limited research on their application in building energy modeling. We underscore the potential of large language models in addressing building energy modeling challenges and outline potential applications including 1) simulation input generation, 2) simulation output analysis and visualization, 3) error analysis, 4) co-simulation, 5) simulation knowledge extraction and training, and 6) simulation optimization. Three case studies reveal the transformative potential of large language models in automating and optimizing building energy modeling tasks, underscoring the pivotal role of artificial intelligence in advancing sustainable building practices and energy efficiency. The case studies demonstrate that selecting the right large language model techniques is essential to enhance performance and reduce engineering efforts. Besides direct use of large language models, three specific techniques were utilized: 1) prompt engineering, 2) retrieval-augmented generation, and 3) multi-agent large language models. The findings advocate a multidisciplinary approach in future artificial intelligence research, with implications extending beyond building energy modeling to other specialized engineering modeling domains.
Abstract:Learning performance data (e.g., quiz scores and attempts) is significant for understanding learner engagement and knowledge mastery level. However, the learning performance data collected from Intelligent Tutoring Systems (ITSs) often suffers from sparsity, impacting the accuracy of learner modeling and knowledge assessments. To address this, we introduce the 3DG framework (3-Dimensional tensor for Densification and Generation), a novel approach combining tensor factorization with advanced generative models, including Generative Adversarial Network (GAN) and Generative Pre-trained Transformer (GPT), for enhanced data imputation and augmentation. The framework operates by first representing the data as a three-dimensional tensor, capturing dimensions of learners, questions, and attempts. It then densifies the data through tensor factorization and augments it using Generative AI models, tailored to individual learning patterns identified via clustering. Applied to data from an AutoTutor lesson by the Center for the Study of Adult Literacy (CSAL), the 3DG framework effectively generated scalable, personalized simulations of learning performance. Comparative analysis revealed GAN's superior reliability over GPT-4 in this context, underscoring its potential in addressing data sparsity challenges in ITSs and contributing to the advancement of personalized educational technology.
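
The densification step can be illustrated with a small sketch. The abstract does not specify which factorization is used, so this example assumes a rank-limited CP (CANDECOMP/PARAFAC) model fitted by plain gradient descent on the observed entries only; the function name `cp_densify`, the synthetic scores, and all hyperparameters are hypothetical.

```python
import numpy as np


def cp_densify(tensor, mask, rank=2, steps=3000, lr=0.02, seed=0):
    """Fill missing entries of a 3-D tensor using a CP model fitted on observed cells.

    tensor: learners x questions x attempts, with 0.0 where data is missing.
    mask:   same shape, 1.0 for observed cells and 0.0 for missing ones.
    """
    rng = np.random.default_rng(seed)
    n_i, n_j, n_k = tensor.shape
    A = rng.normal(scale=0.5, size=(n_i, rank))  # learner factors
    B = rng.normal(scale=0.5, size=(n_j, rank))  # question factors
    C = rng.normal(scale=0.5, size=(n_k, rank))  # attempt factors
    for _ in range(steps):
        pred = np.einsum("ir,jr,kr->ijk", A, B, C)
        err = mask * (pred - tensor)  # error on observed cells only
        grad_a = np.einsum("ijk,jr,kr->ir", err, B, C)
        grad_b = np.einsum("ijk,ir,kr->jr", err, A, C)
        grad_c = np.einsum("ijk,ir,jr->kr", err, A, B)
        A -= lr * grad_a
        B -= lr * grad_b
        C -= lr * grad_c
    return np.einsum("ir,jr,kr->ijk", A, B, C)


# Synthetic example (assumption): scores driven by learner ability,
# question easiness, and improvement over attempts.
ability = np.array([0.9, 0.7, 0.5, 0.3])
easiness = np.array([1.0, 0.8, 0.6])
growth = np.array([0.6, 0.8, 1.0])
full = np.einsum("i,j,k->ijk", ability, easiness, growth)

# Hide roughly 30% of the cells to mimic sparse tutoring logs.
mask = (np.random.default_rng(1).random(full.shape) < 0.7).astype(float)
sparse = full * mask

dense = cp_densify(sparse, mask)
```

The returned tensor has a predicted value in every cell, including those that were masked out. In a setting like the one described, the densified tensor would then be passed to the generative models for augmentation; that stage is not sketched here.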




Abstract:Large Language Models (LLMs) generate responses to questions; however, their effectiveness is often hindered by sub-optimal answer quality and occasional failures to provide accurate responses. To address these challenges, a fine-tuning process is employed, involving feedback and examples to refine models. The objective is to enhance AI models through continuous feedback loops, utilizing metrics such as cosine similarity, LLM evaluation, and Rouge-L scores to evaluate the models. Leveraging LLMs like GPT-3.5, GPT4ALL, LLaMA2, and Claude, this approach is benchmarked on financial datasets, including the FinanceBench and RAG Instruct Benchmark Tester Dataset, illustrating the necessity of fine-tuning. The results showcase the capability of fine-tuned models to surpass the accuracy of zero-shot LLMs, providing superior question-answering capabilities. Notably, the combination of fine-tuning the LLM with a process known as Retrieval Augmented Generation (RAG) proves to generate responses with improved accuracy.
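
Two of the metrics named above can be sketched in a few lines. This is an illustrative simplification rather than the evaluation code used in the paper: Rouge-L is computed here as an F1 score over the longest common subsequence of whitespace tokens, and cosine similarity is computed over bag-of-words counts, whereas such evaluations commonly use embedding vectors. The example sentences are made up.

```python
from collections import Counter
from math import sqrt


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l_f1(candidate, reference):
    """Rouge-L as the F1 of LCS-based precision and recall (equal weighting assumed)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


# Hypothetical model answer and reference answer.
answer = "revenue rose 5 percent"
reference = "revenue increased 5 percent"
rouge = rouge_l_f1(answer, reference)
cosine = cosine_similarity(answer, reference)
```

Both scores reward lexical overlap with the reference, which is why they are often paired with an LLM-based judgment that can credit paraphrases.
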
Abstract:In the realm of artificial intelligence, where a vast majority of data is unstructured, obtaining substantial amounts of labeled data to train supervised machine learning models poses a significant challenge. To address this, we delve into few-shot and active learning, where our goal is to improve AI models with human feedback on a few labeled examples. This paper focuses on understanding how a continuous feedback loop can refine models, thereby enhancing their accuracy, recall, and precision through incremental human input. By employing Large Language Models (LLMs) such as GPT-3.5, BERT, and SetFit, we aim to analyze the efficacy of using a limited number of labeled examples to substantially improve model accuracy. We benchmark this approach on the Financial Phrasebank, Banking, Craigslist, Trec, and Amazon Reviews datasets to show that, with just a few labeled examples, we are able to surpass the accuracy of zero-shot large language models and provide enhanced text classification performance. We demonstrate that rather than needing to manually label millions of rows of data, we just need to label a few and the model can effectively predict the rest.
Abstract:Although neural networks (NNs) have been widely applied in various fields and often outperform humans, they still lack interpretability to a certain extent, and humans are unable to intuitively understand their decision logic. This also hinders knowledge interaction between humans and NNs, preventing humans from stepping in to give direct guidance when an NN's decisions go wrong. While recent research in explainable AI has achieved interpretability of NNs from various perspectives, it has not yet provided effective methods for knowledge exchange between humans and NNs. To address this problem, we constructed a two-way interaction interface that uses structured representations of visual concepts and their relationships as the "language" for knowledge exchange between humans and NNs. Specifically, NNs provide intuitive reasoning explanations to humans based on the class-specific structural concepts graph (C-SCG). Conversely, humans can correct the biases present in the C-SCG using their prior knowledge and reasoning ability, and thus provide direct knowledge guidance to NNs through this interface. Experimental validation shows that, based on this interaction interface, NNs can provide humans with easily understandable explanations of the reasoning process. Furthermore, human involvement and prior knowledge can directly and effectively contribute to enhancing the performance of NNs.
Abstract:Current deep neural network algorithms still rely on end-to-end supervised training on data such as image-label pairs, which makes it difficult for traditional algorithms to explain the reasons for their results and makes their prediction logic difficult to understand and analyze. Current algorithms also do not use existing human knowledge, so the resulting models are not in line with the human cognition model and are not well suited to human use. To solve these problems, this work provides a deep neural network training method based on human knowledge, which uses the human cognition model and existing human knowledge information to construct the deep neural network training model. This paper proposes a multi-level hierarchical deep learning algorithm, which is composed of a multi-level hierarchical deep neural network architecture and a multi-level hierarchical deep learning framework. The experimental results show that the proposed algorithm can effectively explain the hidden information of the neural network. The goal of our study is to improve the interpretability of deep neural networks (DNNs) by providing an analysis of the impact of knowledge injection on the classification task. We constructed a knowledge injection dataset with matching knowledge data and image classification data, which serves as the benchmark dataset for the experiments in the paper. Our model shows improvements in the interpretability and classification task performance of hidden layers at different scales.
Abstract:In recent years, the recognition of free-hand sketches has remained a popular task. However, in some special fields such as the military field, free-hand sketches are difficult to sample on a large scale. Common data augmentation and image generation techniques struggle to produce images with various free-hand sketching styles. Therefore, the recognition and segmentation tasks in related fields are limited. In this paper, we propose a novel adversarial generative network that can accurately generate realistic free-hand sketches with various styles. We explore the performance of the model, including using styles randomly sampled from a prior normal distribution to generate images with various free-hand sketching styles, disentangling the painters' styles from known free-hand sketches to generate images with specific styles, and generating images of unknown classes that are not in the training set. We further demonstrate, with qualitative and quantitative evaluations, our advantages in visual quality, content accuracy, and style imitation on SketchIME.
Abstract:Flowcharts and mind maps, collectively known as flowmind, are vital in daily activities, with hand-drawn versions facilitating real-time collaboration. However, there's a growing need to digitize them for efficient processing. Automated conversion methods are essential to overcome manual conversion challenges. Existing sketch recognition methods face limitations in practical situations, being field-specific and lacking digital conversion steps. Our paper introduces the Flowmind2digital method and hdFlowmind dataset to address these challenges. Flowmind2digital, utilizing neural networks and keypoint detection, achieves a record 87.3% accuracy on our dataset, surpassing previous methods by 11.9%. The hdFlowmind dataset, comprising 1,776 annotated flowminds across 22 scenarios, outperforms existing datasets. Additionally, our experiments emphasize the importance of simple graphics, enhancing accuracy by 9.3%.




Abstract:Gesture recognition is an indispensable component of natural and efficient human-computer interaction technology, particularly in desktop-level applications, where it can significantly enhance people's productivity. However, the current gesture recognition community lacks a suitable desktop-level (top-view perspective) dataset for lightweight gesture capture devices. In this study, we have established a dataset named GR4DHCI. What distinguishes this dataset is its inherent naturalness, intuitive characteristics, and diversity. Its primary purpose is to serve as a valuable resource for the development of desktop-level portable applications. GR4DHCI comprises over 7,000 gesture samples and a total of 382,447 frames for both Stereo IR and skeletal modalities. We also address the variances in hand positioning during desktop interactions by incorporating 27 different hand positions into the dataset. Building upon the GR4DHCI dataset, we conducted a series of experimental studies, the results of which demonstrate that the fine-grained classification blocks proposed in this paper can enhance the model's recognition accuracy. Our dataset and experimental findings presented in this paper are anticipated to propel advancements in desktop-level gesture recognition research.