This paper presents InfiniteEn, a multi-source energy harvesting platform designed for the Internet of Batteryless Things (IoBT). InfiniteEn incorporates an efficient energy combiner to combine energy from different harvesting sources. The energy combiner uses capacitor-to-capacitor energy transfer to combine energy from multiple sources and achieves a nominal efficiency of 88\%. In addition to multiplexing different sources, the energy combiner facilitates the estimation of the harvesting rate and the calibration of the capacity of the energy buffer. The energy storage architecture of InfiniteEn employs an array of storage buffers that can be configured on demand to cope with varying energy harvesting rates and load's energy requirements. To address the challenge of tracking the energy state of batteryless devices with minimum energy overhead, this work introduces the concept of a Load Monitoring Module (LMM). InfiniteEn is a load-agnostic platform, meaning that it does not require any prior knowledge of the energy profile of the load to track its energy states. The LMM assists InfiniteEn in tracking the energy state of the load and dynamically modifying the storage buffers to meet the load's energy requirements. Furthermore, the module can detect and signal any abnormalities in the energy consumption pattern of the load caused by a hardware or software defect. Experiments demonstrate that LMM has a response time of less than 11 ms to energy state changes.
Recent text-to-image generation models have demonstrated incredible success in generating images that faithfully follow input prompts. However, the requirement of using words to describe a desired concept provides limited control over the appearance of the generated concepts. In this work, we address this shortcoming by proposing an approach to enable personalization capabilities in existing text-to-image diffusion models. We propose a novel architecture (BootPIG) that allows a user to provide reference images of an object in order to guide the appearance of a concept in the generated images. The proposed BootPIG architecture makes minimal modifications to a pretrained text-to-image diffusion model and utilizes a separate UNet model to steer the generations toward the desired appearance. We introduce a training procedure that allows us to bootstrap personalization capabilities in the BootPIG architecture using data generated from pretrained text-to-image models, LLM chat agents, and image segmentation models. In contrast to existing methods that require several days of pretraining, the BootPIG architecture can be trained in approximately 1 hour. Experiments on the DreamBooth dataset demonstrate that BootPIG outperforms existing zero-shot methods while being comparable with test-time finetuning approaches. Through a user study, we validate the preference for BootPIG generations over existing methods both in maintaining fidelity to the reference object's appearance and aligning with textual prompts.
A near-field holographic multiple-input multiple-output (MIMO) based integrated sensing and communications (ISAC) framework is proposed for both downlink and uplink scenarios, where spherical wave-based model is considered to capture the characteristics of the near field. The coupling effect introduced by the densely spaced antennas of the holographic MIMO are characterized by spatially correlated Rayleigh fading. Based on the proposed framework, by considering both instantaneous channel state information (CSI) and statistical CSI, closed-form expressions are derived for sensing rates (SRs), communication rates (CRs), and outage probabilities under different ISAC designs. Further insights are gained by examining high signal-to-noise ratio slopes and diversity orders. Specifically, 1) for the downlink case, a sensing-centric (S-C) design and a communications-centric (C-C) design are investigated based on different beamforming strategies, and a Pareto optimal design is proposed to characterize the attainable SR-CR region; and 2) for the uplink case, the S-C design and the C-C design are distinguished by the interference cancellation order of the communication signal and the sensing signal, and the rate region is obtained through a time-sharing strategy. Numerical results reveal that the proposed ISAC system achieves more extensive rate regions than the conventional frequency-division sensing and communications system, highlighting its superior performance.
Recently, semidefinite programming (SDP) techniques have shown great promise in providing accurate Lipschitz bounds for neural networks. Specifically, the LipSDP approach (Fazlyab et al., 2019) has received much attention and provides the least conservative Lipschitz upper bounds that can be computed with polynomial time guarantees. However, one main restriction of LipSDP is that its formulation requires the activation functions to be slope-restricted on $[0,1]$, preventing its further use for more general activation functions such as GroupSort, MaxMin, and Householder. One can rewrite MaxMin activations for example as residual ReLU networks. However, a direct application of LipSDP to the resultant residual ReLU networks is conservative and even fails in recovering the well-known fact that the MaxMin activation is 1-Lipschitz. Our paper bridges this gap and extends LipSDP beyond slope-restricted activation functions. To this end, we provide novel quadratic constraints for GroupSort, MaxMin, and Householder activations via leveraging their underlying properties such as sum preservation. Our proposed analysis is general and provides a unified approach for estimating $\ell_2$ and $\ell_\infty$ Lipschitz bounds for a rich class of neural network architectures, including non-residual and residual neural networks and implicit models, with GroupSort, MaxMin, and Householder activations. Finally, we illustrate the utility of our approach with a variety of experiments and show that our proposed SDPs generate less conservative Lipschitz bounds in comparison to existing approaches.
Soft electrohydraulic actuators known as HASEL actuators have attracted widespread research interest due to their outstanding dynamic performance and high output power. However, the displacement of electrohydraulic actuators usually declines with time under constant DC voltage, which hampers its prospective application. A mathematical model is firstly established to not only explain the decrease in displacement under DC voltage but also predict the relatively stable displacement with oscillation under AC square wave voltage. The mathematical model is validated since the actual displacement confirms the trend observed by our model. To smooth the displacement oscillation introduced by AC voltage, a serial elastic component is incorporated to form a SE-HASEL actuator. A feedback control with a proportion-integration algorithm enables the SE-HASEL actuator to eliminate the obstinate displacement hysteresis. Our results revealed that, through our methodology, the SE-HASEL actuator can give stable and smooth displacement and is capable of absorbing external impact disturbance simultaneously. A rotary joint based on the SE-HASEL actuator is developed to reflect its possibility to generate a common rotary motion for wide robotic applications. More importantly, this paper also proposes a highly accurate needle biopsy robot that can be utilized in MRI-guide surgical procedures. Overall, we have achieved AC-driven series elastic electrohydraulic actuators that can exhibit stable and smooth displacement output.
Graph anomaly detection (GAD) is a challenging binary classification problem due to its different structural distribution between anomalies and normal nodes -- abnormal nodes are a minority, therefore holding high heterophily and low homophily compared to normal nodes. Furthermore, due to various time factors and the annotation preferences of human experts, the heterophily and homophily can change across training and testing data, which is called structural distribution shift (SDS) in this paper. The mainstream methods are built on graph neural networks (GNNs), benefiting the classification of normals from aggregating homophilous neighbors, yet ignoring the SDS issue for anomalies and suffering from poor generalization. This work solves the problem from a feature view. We observe that the degree of SDS varies between anomalies and normal nodes. Hence to address the issue, the key lies in resisting high heterophily for anomalies meanwhile benefiting the learning of normals from homophily. We tease out the anomaly features on which we constrain to mitigate the effect of heterophilous neighbors and make them invariant. We term our proposed framework as Graph Decomposition Network (GDN). Extensive experiments are conducted on two benchmark datasets, and the proposed framework achieves a remarkable performance boost in GAD, especially in an SDS environment where anomalies have largely different structural distribution across training and testing environments. Codes are open-sourced in https://github.com/blacksingular/wsdm_GDN.
Datasets encountered in scientific and engineering applications appear in complex formats (e.g., images, multivariate time series, molecules, video, text strings, networks). Graph theory provides a unifying framework to model such datasets and enables the use of powerful tools that can help analyze, visualize, and extract value from data. In this work, we present PlasmoData.jl, an open-source, Julia framework that uses concepts of graph theory to facilitate the modeling and analysis of complex datasets. The core of our framework is a general data modeling abstraction, which we call a DataGraph. We show how the abstraction and software implementation can be used to represent diverse data objects as graphs and to enable the use of tools from topology, graph theory, and machine learning (e.g., graph neural networks) to conduct a variety of tasks. We illustrate the versatility of the framework by using real datasets: i) an image classification problem using topological data analysis to extract features from the graph model to train machine learning models; ii) a disease outbreak problem where we model multivariate time series as graphs to detect abnormal events; and iii) a technology pathway analysis problem where we highlight how we can use graphs to navigate connectivity. Our discussion also highlights how PlasmoData.jl leverages native Julia capabilities to enable compact syntax, scalable computations, and interfaces with diverse packages.
Ambiguity performances of reference signal patterns for integrated communication and sensing are studied via time delay and Doppler shift detection. A reference signal pattern with a staggering offset of a linear slope relatively prime to the transmission comb is suggested for low-complexity, standard-resolution sensing algorithms. We also propose an extended guard interval design to extend the maximum time delay for post-FFT sensing algorithms.
Clinical patient notes are critical for documenting patient interactions, diagnoses, and treatment plans in medical practice. Ensuring accurate evaluation of these notes is essential for medical education and certification. However, manual evaluation is complex and time-consuming, often resulting in variability and resource-intensive assessments. To tackle these challenges, this research introduces an approach leveraging state-of-the-art Natural Language Processing (NLP) techniques, specifically Masked Language Modeling (MLM) pretraining, and pseudo labeling. Our methodology enhances efficiency and effectiveness, significantly reducing training time without compromising performance. Experimental results showcase improved model performance, indicating a potential transformation in clinical note assessment.
In contemporary design practices, the integration of computer vision and generative artificial intelligence (genAI) represents a transformative shift towards more interactive and inclusive processes. These technologies offer new dimensions of image analysis and generation, which are particularly relevant in the context of urban landscape reconstruction. This paper presents a novel workflow encapsulated within a prototype application, designed to leverage the synergies between advanced image segmentation and diffusion models for a comprehensive approach to urban design. Our methodology encompasses the OneFormer model for detailed image segmentation and the Stable Diffusion XL (SDXL) diffusion model, implemented through ControlNet, for generating images from textual descriptions. Validation results indicated a high degree of performance by the prototype application, showcasing significant accuracy in both object detection and text-to-image generation. This was evidenced by superior Intersection over Union (IoU) and CLIP scores across iterative evaluations for various categories of urban landscape features. Preliminary testing included utilising UrbanGenAI as an educational tool enhancing the learning experience in design pedagogy, and as a participatory instrument facilitating community-driven urban planning. Early results suggested that UrbanGenAI not only advances the technical frontiers of urban landscape reconstruction but also provides significant pedagogical and participatory planning benefits. The ongoing development of UrbanGenAI aims to further validate its effectiveness across broader contexts and integrate additional features such as real-time feedback mechanisms and 3D modelling capabilities. Keywords: generative AI; panoptic image segmentation; diffusion models; urban landscape design; design pedagogy; co-design