Writer identification has gained popularity over the years owing to its widespread application in various fields. When sufficient handwriting samples are available, whether in the form of a single line, a sentence, or an entire page, writer identification algorithms have demonstrated noteworthy levels of accuracy. However, when only a limited number of handwritten samples are available, particularly in the form of word images, there is significant scope for improvement. In this paper, we propose a writer identification system based on an attention-driven Convolutional Neural Network (CNN). The system is trained on image segments, known as fragments, that are extracted from word images using a pyramid-based strategy. This methodology enables the system to capture a comprehensive representation of the data, encompassing both fine-grained details and coarse features across various levels of abstraction. These fragments serve as the training data for the convolutional network, enabling it to learn a more robust representation than traditional convolution-based networks trained on word images. Additionally, the paper explores the integration of an attention mechanism to enhance the representational power of the learned features. The efficacy of the proposed algorithm is evaluated on three benchmark databases, demonstrating its proficiency in writer identification tasks, particularly in scenarios with limited access to handwriting data.
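As a rough illustration of pyramid-based fragment extraction, the sketch below tiles a word image at several levels. The tiling rule (level k splits the image into a 2^k by 2^k grid), the function name `pyramid_fragments`, and the list-of-rows image type are assumptions made for this example; the abstract does not specify the actual fragmentation scheme.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image stored as rows of pixel values


def pyramid_fragments(image: Image, levels: int = 3) -> List[Tuple[int, Image]]:
    """Return (level, fragment) pairs.

    Assumed rule: level k tiles the image into a 2^k x 2^k grid, so level 0
    is the whole word image (coarse) and higher levels give finer fragments.
    """
    height, width = len(image), len(image[0])
    fragments = []
    for level in range(levels):
        cells = 2 ** level
        # Integer boundaries so the tiles cover the image without gaps or overlap.
        row_edges = [height * i // cells for i in range(cells + 1)]
        col_edges = [width * j // cells for j in range(cells + 1)]
        for r0, r1 in zip(row_edges, row_edges[1:]):
            for c0, c1 in zip(col_edges, col_edges[1:]):
                if r1 > r0 and c1 > c0:  # skip empty tiles on very small images
                    tile = [row[c0:c1] for row in image[r0:r1]]
                    fragments.append((level, tile))
    return fragments
```

With three levels, an image yields 1 + 4 + 16 fragments, each of which could be resized and fed to the network as a separate training sample.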
Algorithms designed for typical supervised classification problems can only learn from a fixed set of samples and labels, making them unsuitable for the real world, where data arrives as a stream of samples often associated with multiple labels over time. This motivates the study of task-agnostic continual multi-label learning problems. While algorithms using deep learning approaches for continual multi-label learning have been proposed in the recent literature, they tend to be computationally heavy. Although spiking neural networks (SNNs) offer a computationally efficient alternative to artificial neural networks, existing literature has not used SNNs for continual multi-label learning. Moreover, accurately determining multiple labels with SNNs is still an open research problem. This work proposes a dual output spiking architecture (DOSA) to bridge these research gaps. A novel imbalance-aware loss function is also proposed, improving the multi-label classification performance of the model by making it more robust to data imbalance. A modified F1 score is presented to evaluate the effectiveness of the proposed loss function in handling imbalance. Experiments on several benchmark multi-label datasets show that DOSA trained with the proposed loss function is more robust to data imbalance and obtains better continual multi-label learning performance than CIFDM, a previous state-of-the-art algorithm.
Recent advancements in motion planning for Autonomous Vehicles (AVs) show great promise in using expert driver behaviors in non-stationary driving environments. However, learning only from expert drivers lacks the generalizability needed to recover from domain shifts and near-failure scenarios caused by the dynamic behavior of traffic participants and weather conditions. A deep Graph-based Prediction and Planning Policy Network (GP3Net) framework is proposed for non-stationary environments; it encodes the interactions between traffic participants together with contextual information and provides a decision for a safe maneuver for the AV. A spatio-temporal graph models the interactions between traffic participants to predict their future trajectories. The predicted trajectories are used to generate a future occupancy map around the AV, with uncertainties embedded to anticipate the evolving non-stationary driving environment. The contextual information and future occupancy maps are then input to the policy network of the GP3Net framework, which is trained using the Proximal Policy Optimization (PPO) algorithm. The performance of the proposed GP3Net is evaluated on standard CARLA benchmarking scenarios with domain shifts of traffic patterns (urban, highway, and mixed). The results show that GP3Net outperforms previous state-of-the-art imitation learning-based planning models for different towns. Further, in unseen weather conditions, GP3Net completes the desired route with fewer traffic infractions. Finally, the results emphasize the advantage of including the prediction module to enhance safety in non-stationary environments.
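To make the idea of an uncertainty-aware future occupancy map concrete, the following sketch rasterizes predicted waypoints into a grid centred on the AV. Everything here is an assumption made for illustration: the `Waypoint` format (x, y, probability), the grid size and cell width, and the rule for combining overlapping predictions as independent events. The abstract does not describe how GP3Net actually builds its maps.

```python
from typing import Dict, List, Tuple

# One predicted waypoint: (x, y, probability), in metres relative to the AV.
Waypoint = Tuple[float, float, float]


def occupancy_map(
    predictions: Dict[str, List[Waypoint]],
    size: int = 8,
    cell: float = 2.0,
) -> List[List[float]]:
    """Rasterize predicted waypoints into a size x size grid centred on the AV."""
    grid = [[0.0] * size for _ in range(size)]
    half = size * cell / 2.0  # half the side length of the mapped area
    for waypoints in predictions.values():
        for x, y, p in waypoints:
            col = int((x + half) // cell)
            row = int((y + half) // cell)
            if 0 <= row < size and 0 <= col < size:
                # Assumed rule: treat predictions as independent events,
                # so occupancy combines as 1 - (1 - a)(1 - b).
                grid[row][col] = 1.0 - (1.0 - grid[row][col]) * (1.0 - p)
    return grid
```

Waypoints outside the mapped area are simply ignored; a grid like this could be stacked over future time steps before being passed to a policy network.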
Improving the performance of semantic segmentation models using multispectral information is crucial, especially for environments with low-light and adverse conditions. Multi-modal fusion techniques either learn cross-modality features to generate a fused image or engage in knowledge distillation, but they address multi-modal and missing-modality scenarios as distinct issues, which is not an optimal approach for multi-sensor models. To address this, a novel multi-modal fusion approach called CSK-Net is proposed, which uses a contrastive learning-based spectral knowledge distillation technique along with an automatic mixed feature exchange mechanism for semantic segmentation in optical (EO) and infrared (IR) images. The distillation scheme extracts detailed textures from the optical images and distills them into the optical branch of CSK-Net. The model encoder consists of shared convolution weights with separate batch norm (BN) layers for both modalities, to capture the multi-spectral information from different modalities of the same objects. A novel Gated Spectral Unit (GSU) and a mixed feature exchange strategy are proposed to increase the correlation of modality-shared information and decrease the modality-specific information during the distillation process. Comprehensive experiments on three public benchmarking datasets show that CSK-Net surpasses state-of-the-art models both in multi-modal tasks and in missing-modality scenarios where only IR data is used for inference. For missing-modality scenarios, the performance increase is achieved without additional computational cost compared to the baseline segmentation models.
Deep neural networks have shown exemplary performance on semantic scene understanding tasks on source domains, but due to the absence of style diversity during training, enhancing performance on unseen target domains using only single source domain data remains a challenging task. Because collecting large style-diverse real-world datasets is a cumbersome and budget-intensive process, generating simulated data is a feasible alternative. However, the large domain-specific inconsistencies between simulated and real-world data pose a significant generalization challenge in semantic segmentation. In this work, to alleviate this problem, we propose a novel MultiResolution Feature Perturbation (MRFP) technique to randomize domain-specific fine-grained features and perturb the style of coarse features. Our experimental results on various urban-scene segmentation datasets clearly indicate that, along with the perturbation of style information, perturbation of fine-feature components is paramount to learning domain-invariant robust feature maps for semantic segmentation models. MRFP is a simple, computationally efficient, and transferable module with no additional learnable parameters or objective functions that helps state-of-the-art deep neural networks learn robust domain-invariant features for simulation-to-real semantic segmentation.
Drones and rovers help to overcome the limitations of traditional agriculture, which has been predominantly human-intensive for tasks such as removing weeds and spraying fertilizers and pesticides. They are helping to realize precision agriculture and provide farmers with improved monitoring and surveying at affordable costs. Major benefits have come for vertical farming and fields with irrigation canals. However, drones have limited flight time due to payload constraints, and rovers are limited in vertical farming and by obstacles such as canals in agricultural fields. To meet the different requirements of multiple terrains and vertical farming in agriculture, we propose an autonomous hybrid drone-rover vehicle that combines the advantages of both rovers and drones. The prototype is described along with experimental results regarding its ability to avoid obstacles, pluck weeds, and spray pesticides.
In this paper, a Priority-based Dynamic REsource Allocation with decentralized Multi-task assignment (P-DREAM) approach is presented to protect a territory from highly manoeuvring intruders. In the first part, static optimization problems are formulated to compute the following parameters of the perimeter defense problem: the number of reserve stations, their locations, the priority region, the monitoring region, and the minimum number of defenders required for monitoring. The concept of a prioritized intruder is proposed to identify critical intruders (determined from the velocity ratio and location) that must be tackled on a priority basis. The computed priority region helps to assign reserve defenders early enough that they can neutralize the prioritized intruders. The monitoring region defines the minimum region that must be monitored and is sufficient to handle the intruders. In the second part, the earlier developed DREAM approach is modified to incorporate the priority of an intruder. The proposed P-DREAM approach assigns the defenders to the prioritized intruders as the first task. A convex territory protection problem is simulated to illustrate the P-DREAM approach. It involves the computation of static parameters and solving the prioritized task assignments with dynamic resource allocation. Monte-Carlo simulations were conducted to verify the performance of P-DREAM, and the results clearly show that the P-DREAM approach can protect the territory with consistent performance against highly manoeuvring intruders.
This paper presents a Predictive Maneuver Planning with Deep Reinforcement Learning (PMP-DRL) model for maneuver planning. Traditional rule-based maneuver planning approaches often struggle to handle the variability of real-world driving scenarios. By learning from its experience, a Reinforcement Learning (RL)-based driving agent can adapt to changing driving conditions and improve its performance over time. Our proposed approach combines a predictive model and an RL agent to plan comfortable and safe maneuvers. The predictive model is trained using historical driving data to predict the future positions of surrounding vehicles. The surrounding vehicles' past and predicted future positions are embedded in context-aware grid maps, and the RL agent learns to make maneuvers based on this spatio-temporal context information. Performance evaluation of PMP-DRL has been carried out using simulated environments generated from the publicly available NGSIM US101 and I80 datasets. The training sequence shows continuous improvement in the driving experience and indicates that the proposed PMP-DRL can learn the trade-off between safety and comfort. The decisions generated by a recent imitation learning-based model are compared with those of the proposed PMP-DRL for unseen scenarios. The results clearly show that PMP-DRL can handle complex real-world scenarios and make more comfortable and safer maneuver decisions than rule-based and imitative models.
This paper proposes a novel Decentralized Spike-based Learning (DSL) framework for the discrete Perimeter Defense Problem (d-PDP). A team of defenders operates on the perimeter to protect a circular territory from radially incoming intruders. First, the d-PDP is formulated as a spatio-temporal multi-task assignment (STMTA) problem. The STMTA problem is then converted into a multi-label learning problem to obtain the labels of the segments that defenders have to visit in order to protect the perimeter. The DSL framework uses a Multi-Label Classifier using Synaptic Efficacy Function spiking neuRON (MLC-SEFRON) network for deterministic multi-label learning. Each defender contains a single MLC-SEFRON network, which is trained independently using input from its own perspective for decentralized operation. The input spikes to the MLC-SEFRON network can be obtained directly from the spatio-temporal information of defenders and intruders without any extra pre-processing step. The output of MLC-SEFRON contains the labels of the segments that a defender has to visit in order to protect the perimeter. Based on the multi-label output from MLC-SEFRON, a trajectory is generated for a defender using a Consensus-Based Bundle Algorithm (CBBA) in order to capture the intruders. The target multi-label output for training MLC-SEFRON is obtained from an expert policy. Moreover, the MLC-SEFRON trained for one defender can be used directly to obtain the labels of segments assigned to another defender without any retraining. The performance of MLC-SEFRON has been evaluated for full-observation and partial-observation scenarios of the defender. The overall performance of the DSL framework is then compared with the expert policy and with other existing learning algorithms. The scalability of DSL has been evaluated using an increasing number of defenders.
This paper presents a non-iterative approach for assigning heterogeneous robots to efficiently execute online Pickup and Just-In-Time Delivery (PJITD) tasks with optimal resource utilization. The PJITD assignment problem is formulated as a spatio-temporal multi-task assignment (STMTA) problem. The physical constraints on the map and vehicle dynamics are incorporated in the cost formulation. The heterogeneous STMTA problem is then posed as a linear sum assignment problem. The recently proposed Dynamic Resource Allocation with Multi-task assignments (DREAM) approach has been modified to solve the heterogeneous PJITD problem. First, it computes the minimum number of robots required (with their types) to execute the given heterogeneous PJITD tasks. These required robots are added to the team to guarantee the feasibility of all PJITD tasks. Then the robots in the updated team are assigned to execute the PJITD tasks while minimizing the total cost for the team to execute all of them. The performance of the proposed non-iterative approach has been validated using high-fidelity software-in-loop simulations and hardware experiments. The simulation and experimental results clearly indicate that the proposed approach is scalable and provides optimal resource utilization.
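For readers unfamiliar with the linear sum assignment problem mentioned above, the sketch below solves a tiny instance by exhaustive search: given a square cost matrix, it finds the one-to-one robot-to-task mapping with minimum total cost. The function name and the cost values are hypothetical, and brute force is used only to keep the example self-contained; it is not the modified DREAM approach, and practical solvers use polynomial-time methods such as the Hungarian algorithm.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def linear_sum_assignment_bruteforce(
    cost: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return (assignment, total), where assignment[i] is the task for robot i.

    Tries every permutation, so it is only suitable for very small teams.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Hypothetical costs: rows are robots, columns are tasks.
example_cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
assignment, total = linear_sum_assignment_bruteforce(example_cost)
# assignment == [1, 0, 2]: robot 0 takes task 1, robot 1 takes task 0,
# robot 2 takes task 2, for a total cost of 5.
```

In a spatio-temporal setting, each cost entry would encode both travel effort and whether the robot can reach the task in time, with infeasible pairs given a very large cost.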