Uncrewed Aerial Vehicles (UAVs) are increasingly used in civilian and industrial applications, making secure low-altitude operations crucial. In dense mmWave environments, accurately classifying low-altitude UAVs as either inside authorized or restricted airspaces remains challenging, requiring models that handle complex propagation and signal variability. This paper proposes a deep learning model, referred to as CoBA, which stands for integrated Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), and Attention which leverages Fifth Generation (5G) millimeter-wave (mmWave) radio measurements to classify UAV operations in authorized and restricted airspaces at low altitude. The proposed CoBA model integrates convolutional, bidirectional recurrent, and attention layers to capture both spatial and temporal patterns in UAV radio measurements. To validate the model, a dedicated dataset is collected using the 5G mmWave network at TalTech, with controlled low altitude UAV flights in authorized and restricted scenarios. The model is evaluated against conventional ML models and a fingerprinting-based benchmark. Experimental results show that CoBA achieves superior accuracy, significantly outperforming all baseline models and demonstrating its potential for reliable and regulated UAV airspace monitoring.
Fine-grained emotion recognition is a challenging multi-label NLP task due to label overlap and class imbalance. In this work, we benchmark three modeling families on the GoEmotions dataset: a TF-IDF-based logistic regression system trained with binary relevance, a BiLSTM with attention, and a BERT model fine-tuned for multi-label classification. Experiments follow the official train/validation/test split, and imbalance is mitigated using inverse-frequency class weights. Across several metrics, namely Micro-F1, Macro-F1, Hamming Loss, and Subset Accuracy, we observe that logistic regression attains the highest Micro-F1 of 0.51, while BERT achieves the best overall balance surpassing the official paper's reported results, reaching Macro-F1 0.49, Hamming Loss 0.036, and Subset Accuracy 0.36. This suggests that frequent emotions often rely on surface lexical cues, whereas contextual representations improve performance on rarer emotions and more ambiguous examples.
Accurate multi-label classification of electrocardiogram (ECG) signals remains challenging due to the coexistence of multiple cardiac conditions, pronounced class imbalance, and long-range temporal dependencies in multi-lead recordings. Although recent studies increasingly rely on deep and stacked recurrent architectures, the necessity and clinical justification of such architectural complexity have not been rigorously examined. In this work, we perform a systematic comparative evaluation of convolutional neural networks (CNNs) combined with multiple recurrent configurations, including LSTM, GRU, Bidirectional LSTM (BiLSTM), and their stacked variants, for multi-label ECG classification on the PTB-XL dataset comprising 23 diagnostic categories. The CNN component serves as a morphology-driven baseline, while recurrent layers are progressively integrated to assess their contribution to temporal modeling and generalization performance. Experimental results indicate that a CNN integrated with a single BiLSTM layer achieves the most favorable trade-off between predictive performance and model complexity. This configuration attains superior Hamming loss (0.0338), macro-AUPRC (0.4715), micro-F1 score (0.6979), and subset accuracy (0.5723) compared with deeper recurrent combinations. Although stacked recurrent models occasionally improve recall for specific rare classes, our results provide empirical evidence that increasing recurrent depth yields diminishing returns and may degrade generalization due to reduced precision and overfitting. These findings suggest that architectural alignment with the intrinsic temporal structure of ECG signals, rather than increased recurrent depth, is a key determinant of robust performance and clinically relevant deployment.
Robust machine learning for regulatory genomics is studied under biologically and technically induced distribution shifts. Deep convolutional and attention based models achieve strong in distribution performance on DNA regulatory sequence prediction tasks but are usually evaluated under i.i.d. assumptions, even though real applications involve cell type specific programs, evolutionary turnover, assay protocol changes, and sequencing artifacts. We introduce a robustness framework that combines a mechanistic simulation benchmark with real data analysis on a massively parallel reporter assay (MPRA) dataset to quantify performance degradation, calibration failures, and uncertainty based reliability. In simulation, motif driven regulatory outputs are generated with cell type specific programs, PWM perturbations, GC bias, depth variation, batch effects, and heteroscedastic noise, and CNN, BiLSTM, and transformer models are evaluated. Models remain accurate and reasonably calibrated under mild GC content shifts but show higher error, severe variance miscalibration, and coverage collapse under motif effect rewiring and noise dominated regimes, revealing robustness gaps invisible to standard i.i.d. evaluation. Adding simple biological structural priors motif derived features in simulation and global GC content in MPRA improves in distribution error and yields consistent robustness gains under biologically meaningful genomic shifts, while providing only limited protection against strong assay noise. Uncertainty-aware selective prediction offers an additional safety layer that risk coverage analyses on simulated and MPRA data show that filtering low confidence inputs recovers low risk subsets, including under GC-based out-of-distribution conditions, although reliability gains diminish when noise dominates.
Weakly Supervised Semantic Segmentation (WSSS), which relies only on image-level labels, has attracted significant attention for its cost-effectiveness and scalability. Existing methods mainly enhance inter-class distinctions and employ data augmentation to mitigate semantic ambiguity and reduce spurious activations. However, they often neglect the complex contextual dependencies among image patches, resulting in incomplete local representations and limited segmentation accuracy. To address these issues, we propose the Context Patch Fusion with Class Token Enhancement (CPF-CTE) framework, which exploits contextual relations among patches to enrich feature representations and improve segmentation. At its core, the Contextual-Fusion Bidirectional Long Short-Term Memory (CF-BiLSTM) module captures spatial dependencies between patches and enables bidirectional information flow, yielding a more comprehensive understanding of spatial correlations. This strengthens feature learning and segmentation robustness. Moreover, we introduce learnable class tokens that dynamically encode and refine class-specific semantics, enhancing discriminative capability. By effectively integrating spatial and semantic cues, CPF-CTE produces richer and more accurate representations of image content. Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 validate that CPF-CTE consistently surpasses prior WSSS methods.
The accurate identification of antiviral peptides (AVPs) is crucial for novel drug development. However, existing methods still have limitations in capturing complex sequence dependencies and distinguishing confusing samples with high similarity. To address these challenges, we propose AVP-Pro, a novel two-stage predictive framework that integrates adaptive feature fusion and contrastive learning. To comprehensively capture the physicochemical properties and deep-seated patterns of peptide sequences, we constructed a panoramic feature space encompassing 10 distinct descriptors and designed a hierarchical fusion architecture. This architecture integrates self-attention and adaptive gating mechanisms to dynamically modulate the weights of local motifs extracted by CNNs and global dependencies captured by BiLSTMs based on sequence context. Targeting the blurred decision boundary caused by the high similarity between positive and negative sample sequences, we adopted an Online Hard Example Mining (OHEM)-driven contrastive learning strategy enhanced by BLOSUM62. This approach significantly sharpened the model's discriminative power. Model evaluation results show that in the first stage of general AVP identification, the model achieved an accuracy of 0.9531 and an MCC of 0.9064, outperforming existing state-of-the-art (SOTA) methods. In the second stage of functional subtype prediction, combined with a transfer learning strategy, the model realized accurate classification of 6 viral families and 8 specific viruses under small-sample conditions. AVP-Pro provides a powerful and interpretable new tool for the high-throughput screening of antiviral drugs. To further enhance accessibility for users, we have developed a user-friendly web interface, which is available at https://wwwy1031-avp-pro.hf.space.
Fine-grained labor market analysis increasingly relies on mapping unstructured job advertisements to standardized skill taxonomies such as ESCO. This mapping is naturally formulated as an Extreme Multi-Label Classification (XMLC) problem, but supervised solutions are constrained by the scarcity and cost of large-scale, taxonomy-aligned annotations--especially in non-English settings where job-ad language diverges substantially from formal skill definitions. We propose a zero-shot skill extraction framework that eliminates the need for manually labeled job-ad training data. The framework uses a Large Language Model (LLM) to synthesize training instances from ESCO definitions, and introduces hierarchically constrained multi-skill generation based on ESCO Level-2 categories to improve semantic coherence in multi-label contexts. On top of the synthetic corpus, we train a contrastive bi-encoder that aligns job-ad sentences with ESCO skill descriptions in a shared embedding space; the encoder augments a BERT backbone with BiLSTM and attention pooling to better model long, information-dense requirement statements. An upstream RoBERTa-based binary filter removes non-skill sentences to improve end-to-end precision. Experiments show that (i) hierarchy-conditioned generation improves both fluency and discriminability relative to unconstrained pairing, and (ii) the resulting multi-label model transfers effectively to real-world Chinese job advertisements, achieving strong zero-shot retrieval performance (F1@5 = 0.72) and outperforming TF--IDF and standard BERT baselines. Overall, the proposed pipeline provides a scalable, data-efficient pathway for automated skill coding in labor economics and workforce analytics.
The rapid deployment of Internet of Things (IoT) devices has led to large-scale sensor networks that monitor environmental and urban phenomena in real time. Communities of Interest (CoIs) provide a promising paradigm for organising heterogeneous IoT sensor networks by grouping devices with similar operational and environmental characteristics. This work presents an anomaly detection framework based on the CoI paradigm by grouping sensors into communities using a fused similarity matrix that incorporates temporal correlations via Spearman coefficients, spatial proximity using Gaussian distance decay, and elevation similarities. For each community, representative stations based on the best silhouette are selected and three autoencoder architectures (BiLSTM, LSTM, and MLP) are trained using Bayesian hyperparameter optimization with expanding window cross-validation and tested on stations from the same cluster and the best representative stations of other clusters. The models are trained on normal temperature patterns of the data and anomalies are detected through reconstruction error analysis. Experimental results show a robust within-community performance across the evaluated configurations, while variations across communities are observed. Overall, the results support the applicability of community-based model sharing in reducing computational overhead and to analyse model generalisability across IoT sensor networks.
The rapid development of large language models has led to an increase in AI-generated text, with students increasingly using LLM-generated content as their own work, which violates academic integrity. This paper presents an evaluation of AI text detection methods, including both traditional machine learning models and transformer-based architectures. We utilize two datasets, HC3 and DAIGT v2, to build a unified benchmark and apply a topic-based data split to prevent information leakage. This approach ensures robust generalization across unseen domains. Our experiments show that TF-IDF logistic regression achieves a reasonable baseline accuracy of 82.87%. However, deep learning models outperform it. The BiLSTM classifier achieves an accuracy of 88.86%, while DistilBERT achieves a similar accuracy of 88.11% with the highest ROC-AUC score of 0.96, demonstrating the strongest overall performance. The results indicate that contextual semantic modeling is significantly superior to lexical features and highlight the importance of mitigating topic memorization through appropriate evaluation protocols. The limitations of this work are primarily related to dataset diversity and computational constraints. In future work, we plan to expand dataset diversity and utilize parameter-efficient fine-tuning methods such as LoRA. We also plan to explore smaller or distilled models and employ more efficient batching strategies and hardware-aware optimization.
Surface electromyography (sEMG) provides a direct neural interface for decoding muscle activity and offers a promising foundation for keyboard-free text input in wearable and mixed-reality systems. Previous sEMG-to-text studies mainly focused on recognizing letters directly from sEMG signals, forming an important first step toward translating muscle activity into text. Building on this foundation, we present MyoText, a hierarchical framework that decodes sEMG signals to text through physiologically grounded intermediate stages. MyoText first classifies finger activations from multichannel sEMG using a CNN-BiLSTM-Attention model, applies ergonomic typing priors to infer letters, and reconstructs full sentences with a fine-tuned T5 transformer. This modular design mirrors the natural hierarchy of typing, linking muscle intent to language output and reducing the search space for decoding. Evaluated on 30 users from the emg2qwerty dataset, MyoText outperforms baselines by achieving 85.4% finger-classification accuracy, 5.4% character error rate (CER), and 6.5% word error rate (WER). Beyond accuracy gains, this methodology establishes a principled pathway from neuromuscular signals to text, providing a blueprint for virtual and augmented-reality typing interfaces that operate entirely without physical keyboards. By integrating ergonomic structure with transformer-based linguistic reasoning, MyoText advances the feasibility of seamless, wearable neural input for future ubiquitous computing environments.