Object detection and localization are crucial tasks for biomedical image analysis, particularly in the field of hematology where the detection and recognition of blood cells are essential for diagnosis and treatment decisions. While attention-based methods have shown significant progress in object detection in various domains, their application in medical object detection has been limited due to the unique challenges posed by medical imaging datasets. To address this issue, we propose ADA-YOLO, a light-weight yet effective method for medical object detection that integrates attention-based mechanisms with the YOLOv8 architecture. Our proposed method leverages the dynamic feature localisation and parallel regression for computer vision tasks through \textit{adaptive head} module. Empirical experiments were conducted on the Blood Cell Count and Detection (BCCD) dataset to evaluate the effectiveness of ADA-YOLO. The results showed that ADA-YOLO outperforms the YOLOv8 model in mAP (mean average precision) on the BCCD dataset by using more than 3 times less space than YOLOv8. This indicates that our proposed method is effective. Moreover, the light-weight nature of our proposed method makes it suitable for deployment in resource-constrained environments such as mobile devices or edge computing systems. which could ultimately lead to improved diagnosis and treatment outcomes in the field of hematology.
This paper introduces an innovative adaptive hybrid model for stock market predictions, leveraging the capabilities of an enhanced Variational Mode Decomposition (VMD), Feature Engineering (FE), and stacked Informer integrated with an adaptive loss function. Through rigorous experimentation, the proposed model, termed Adam+GC+enhanced informer (We name it VMGCformer), demonstrates significant proficiency in addressing the intricate dynamics and volatile nature of stock market data. Experimental results, derived from multiple benchmark datasets, underscore the model's superiority in terms of prediction accuracy, responsiveness, and generalization capabilities over traditional and other hybrid models. The research further highlights potential avenues for optimization and introduces future directions to enhance predictive modeling, especially for small enterprises and feature engineering.
In this paper, the limitations of YOLOv5s model on small target detection task are deeply studied and improved. The performance of the model is successfully enhanced by introducing GhostNet-based convolutional module, RepGFPN-based Neck module optimization, CA and Transformer's attention mechanism, and loss function improvement using NWD. The experimental results validate the positive impact of these improvement strategies on model precision, recall and mAP. In particular, the improved model shows significant superiority in dealing with complex backgrounds and tiny targets in real-world application tests. This study provides an effective optimization strategy for the YOLOv5s model on small target detection, and lays a solid foundation for future related research and applications.
As global attention on renewable and clean energy grows, the research and implementation of microgrids become paramount. This paper delves into the methodology of exploring the relationship between the operational and environmental costs of microgrids through multi-objective optimization models. By integrating various optimization algorithms like Genetic Algorithm, Simulated Annealing, Ant Colony Optimization, and Particle Swarm Optimization, we propose an integrated approach for microgrid optimization. Simulation results depict that these algorithms provide different dispatch results under economic and environmental dispatch, revealing distinct roles of diesel generators and micro gas turbines in microgrids. Overall, this study offers in-depth insights and practical guidance for microgrid design and operation.
In the era of sixth-generation (6G) wireless communications, integrated sensing and communications (ISAC) is recognized as a promising solution to upgrading the physical system by endowing wireless communications with sensing capability. Existing ISAC is mainly oriented to static scenarios with radio-frequency sensors being the primary participants, thus lacking a comprehensive environment feature characterization and facing a severe performance bottleneck in dynamic environments. In light of this, we generalize the concept of ISAC by mimicking human synesthesia to support intelligent multi-modal sensing-communication integration. The so-termed Synesthesia of Machines (SoM) is not only oriented to generic scenarios, but also particularly suitable for solving challenges arising from dynamic scenarios. We commence by justifying the necessity and potentials of SoM. Subsequently, we offer the definition of SoM and zoom into the specific operating modes, followed by discussions of the state-of-the-art, corresponding objectives, and challenges. To facilitate SoM research, we overview the prerequisite of SoM research, that is, mixed multi-modal (MMM) datasets, and introduce our work. Built upon the MMM datasets, we introduce the mapping relationships between multi-modal sensing and communications, and discuss how channel modeling can be customized to support the exploration of such relationships. Afterwards, we delve into the current research state and implementing challenges of SoM-enhance-based and SoM-concert-based applications. We first overview the SoM-enhance-based communication system designs and present simulation results related to dual-function waveform and predictive beamforming design. Afterwards, we review the recent advances of SoM-concert for single-agent and multi-agent environment sensing. Finally, we propose some open issues and potential directions.
The sixth generation (6G) of mobile communication system is witnessing a new paradigm shift, i.e., integrated sensing-communication system. A comprehensive dataset is a prerequisite for 6G integrated sensing-communication research. This paper develops a novel simulation dataset, named M3SC, for mixed multi-modal (MMM) sensing-communication integration, and the generation framework of the M3SC dataset is further given. To obtain multi-modal sensory data in physical space and communication data in electromagnetic space, we utilize AirSim and WaveFarer to collect multi-modal sensory data and exploit Wireless InSite to collect communication data. Furthermore, the in-depth integration and precise alignment of AirSim, WaveFarer, and Wireless InSite are achieved. The M3SC dataset covers various weather conditions, various frequency bands, and different times of the day. Currently, the M3SC dataset contains 1500 snapshots, including 80 RGB images, 160 depth maps, 80 LiDAR point clouds, 256 sets of mmWave waveforms with 8 radar point clouds, and 72 channel impulse response (CIR) matrices per snapshot, thus totaling 120,000 RGB images, 240,000 depth maps, 120,000 LiDAR point clouds, 384,000 sets of mmWave waveforms with 12,000 radar point clouds, and 108,000 CIR matrices. The data processing result presents the multi-modal sensory information and communication channel statistical properties. Finally, the MMM sensing-communication application, which can be supported by the M3SC dataset, is discussed.
Singing voice transcription converts recorded singing audio to musical notation. Sound contamination (such as accompaniment) and lack of annotated data make singing voice transcription an extremely difficult task. We take two approaches to tackle the above challenges: 1) introducing multimodal learning for singing voice transcription together with a new multimodal singing dataset, N20EMv2, enhancing noise robustness by utilizing video information (lip movements to predict the onset/offset of notes), and 2) adapting self-supervised learning models from the speech domain to the singing voice transcription task, significantly reducing annotated data requirements while preserving pretrained features. We build a self-supervised learning based audio-only singing voice transcription system, which not only outperforms current state-of-the-art technologies as a strong baseline, but also generalizes well to out-of-domain singing data. We then develop a self-supervised learning based video-only singing voice transcription system that detects note onsets and offsets with an accuracy of about 80\%. Finally, based on the powerful acoustic and visual representations extracted by the above two systems as well as the feature fusion design, we create an audio-visual singing voice transcription system that improves the noise robustness significantly under different acoustic environments compared to the audio-only systems.
In this paper, a novel tactile sensing mechanism for soft robotic fingers is proposed. Inspired by the proprioception mechanism found in mammals, the proposed approach infers tactile information from a strain sensor attached on the finger's tendon. We perform experiments to test the tactile sensing capabilities of the proposed structures, and our results indicate this method is capable of palpating texture and stiffness in both abduction and flexion contact. Under systematic cross validation, the proposed system achieved 100% and 99.7% accuracy in texture and stiffness discrimination respectively, which validate the viability of this approach. Furthermore, we use statistics tools to determine the significance of various features extracted for classification.
After a grasp has been planned, if the object orientation changes, the initial grasp may but not always have to be modified to accommodate the orientation change. For example, rotation of a cylinder by any amount around its centerline does not change its geometric shape relative to the grasper. Objects that can be approximated to solids of revolution or contain other geometric symmetries are prevalent in everyday life, and this information can be employed to improve the efficiency of existing grasp planning models. This paper experimentally investigates change in human-planned grasps under varied object orientations. With 13,440 recorded human grasps, our results indicate that during pick-and-place task of ordinary objects, stable grasps can be achieved with a small subset of grasp types, and the wrist-related parameters follow normal distribution. Furthermore, we show this knowledge can allow faster convergence of grasp planning algorithm.
Identifying the location of a disturbance and its magnitude is an important component for stable operation of power systems. We study the problem of localizing and estimating a disturbance in the interconnected power system. We take a model-free approach to this problem by using frequency data from generators. Specifically, we develop a logistic regression based method for localization and a linear regression based method for estimation of the magnitude of disturbance. Our model-free approach does not require the knowledge of system parameters such as inertia constants and topology, and is shown to achieve highly accurate localization and estimation performance even in the presence of measurement noise and missing data.