Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems. However, this is challenging due to the difficulty of combining multi-granularity geometric and semantic features from two drastically different modalities. Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images (referred to as seeds) into 3D space for fusion, and they can be roughly divided into 1) early fusion of raw points that aims at augmenting the 3D point cloud at the early input stage, and 2) late fusion of BEV (bird-eye view) maps that merges LiDAR and camera BEV features before the detection head. While both have their merits in enhancing the representation power of the combined features, this single-level fusion strategy is a suboptimal solution to the aforementioned challenge. Their major drawbacks are the inability to interact the multi-granularity semantic features from two distinct modalities sufficiently. To this end, we propose a novel framework that focuses on the multi-scale progressive interaction of the multi-granularity LiDAR and camera features. Our proposed method, abbreviated as MDMSFusion, achieves state-of-the-art results in 3D object detection, with 69.1 mAP and 71.8 NDS on nuScenes validation set, and 70.8 mAP and 73.2 NDS on nuScenes test set, which rank 1st and 2nd respectively among single-model non-ensemble approaches by the time of submission.
We present an on-device real-time hand gesture recognition (HGR) system, which detects a set of predefined static gestures from a single RGB camera. The system consists of two parts: a hand skeleton tracker and a gesture classifier. We use MediaPipe Hands as the basis of the hand skeleton tracker, improve the keypoint accuracy, and add the estimation of 3D keypoints in a world metric space. We create two different gesture classifiers, one based on heuristics and the other using neural networks (NN).
Lower bounds on the mean square error (MSE) play an important role in evaluating the estimation performance of nonlinear parameters including direction-of-arrival (DOA). Among numerous known bounds, the well-accepted Cramer-Rao bound (CRB) lower bounds the MSE in the asymptotic region only, due to its locality. By contrast, the less-adopted Ziv-Zakai bound (ZZB) is restricted by the single source assumption, although it is global tight. In this paper, we first derive an explicit ZZB applicable for hybrid coherent/incoherent multiple sources DOA estimation. In detail, we incorporate Woodbury matrix identity and Sylvester's determinant theorem to generalize the ZZB from single source DOA estimation to multiple sources DOA estimation, which, unfortunately, becomes invalid when it is far away from the asymptotic region. We then introduce the order statistics to describe the effect of ordering process during MSE calculation on the change of a priori distribution of DOAs, such that the derived ZZB can keep a tight bound on the MSE outside the asymptotic region. The derived ZZB is for the first time formulated as the function of the coherent coefficients between the coherent sources, and reveals the relationship between the MSE convergency in the a priori performance region and the number of sources. Moreover, the derived ZZB also provides a unified tight bound for both overdetermined DOAs estimation and underdetermined DOAs estimation. Simulation results demonstrate the obvious advantages of the derived ZZB over the CRB on evaluating and predicting the estimation performance of multiple sources DOA.
Optical Coherence Tomography(OCT) is a non-invasive technique capturing cross-sectional area of the retina in micro-meter resolutions. It has been widely used as a auxiliary imaging reference to detect eye-related pathology and predict longitudinal progression of the disease characteristics. Retina layer segmentation is one of the crucial feature extraction techniques, where the variations of retinal layer thicknesses and the retinal layer deformation due to the presence of the fluid are highly correlated with multiple epidemic eye diseases like Diabetic Retinopathy(DR) and Age-related Macular Degeneration (AMD). However, these images are acquired from different devices, which have different intensity distribution, or in other words, belong to different imaging domains. This paper proposes a segmentation-guided domain-adaptation method to adapt images from multiple devices into single image domain, where the state-of-art pre-trained segmentation model is available. It avoids the time consumption of manual labelling for the upcoming new dataset and the re-training of the existing network. The semantic consistency and global feature consistency of the network will minimize the hallucination effect that many researchers reported regarding Cycle-Consistent Generative Adversarial Networks(CycleGAN) architecture.
Early and precise diagnosis of diseases in plants can help to develop an early treatment technique. Plant diseases degrade both the quantity and quality of crops, thus posing a threat to food security and resulting in huge economic losses. Traditionally identification is performed manually, which is inaccurate, time-consuming, and expensive. This paper presents a simple and efficient model to detect grapes leaf diseases using transfer learning. A pre-trained deep convolutional neural network is used as a feature extractor and random forest as a classifier. The performance of the model is interpreted in terms of accuracy, precision, recall, and f1 score. Total 1003 images of four different classes are used and 91.66% accuracy is obtained.
Providing security in the transmission of images and other multimedia data has become one of the most important scientific and practical issues. In this paper, a method for compressing and encryption images is proposed, which can safely transmit images in low-bandwidth data transmission channels. At first, using the autoencoding generative adversarial network (AEGAN) model, the images are mapped to a vector in the latent space with low dimensions. In the next step, the obtained vector is encrypted using public key encryption methods. In the proposed method, Henon chaotic map is used for permutation, which makes information transfer more secure. To evaluate the results of the proposed scheme, three criteria SSIM, PSNR, and execution time have been used.
Machine learning has opened up new tools for financial fraud detection. Using a sample of annotated transactions, a machine learning classification algorithm learns to detect frauds. With growing credit card transaction volumes and rising fraud percentages there is growing interest in finding appropriate machine learning classifiers for detection. However, fraud data sets are diverse and exhibit inconsistent characteristics. As a result, a model effective on a given data set is not guaranteed to perform on another. Further, the possibility of temporal drift in data patterns and characteristics over time is high. Additionally, fraud data has massive and varying imbalance. In this work, we evaluate sampling methods as a viable pre-processing mechanism to handle imbalance and propose a data-driven classifier selection strategy for characteristic highly imbalanced fraud detection data sets. The model derived based on our selection strategy surpasses peer models, whilst working in more realistic conditions, establishing the effectiveness of the strategy.
As a core part of autonomous driving systems, motion planning has received extensive attention from academia and industry. However, there is no efficient trajectory planning solution capable of spatial-temporal joint optimization due to nonholonomic dynamics, particularly in the presence of unstructured environments and dynamic obstacles. To bridge the gap, we propose a versatile and real-time trajectory optimization method that can generate a high-quality feasible trajectory using a full vehicle model under arbitrary constraints. By leveraging the differential flatness property of car-like robots, we use flat outputs to analytically formulate all feasibility constraints to simplify the trajectory planning problem. Moreover, obstacle avoidance is achieved with full dimensional polygons to generate less conservative trajectories with safety guarantees, especially in tightly constrained spaces. We present comprehensive benchmarks with cutting-edge methods, demonstrating the significance of the proposed method in terms of efficiency and trajectory quality. Real-world experiments verify the practicality of our algorithm. We will release our codes as open-source packages with the purpose for the reference of the research community.
Although previous graph-based multi-view clustering algorithms have gained significant progress, most of them are still faced with three limitations. First, they often suffer from high computational complexity, which restricts their applications in large-scale scenarios. Second, they usually perform graph learning either at the single-view level or at the view-consensus level, but often neglect the possibility of the joint learning of single-view and consensus graphs. Third, many of them rely on the $k$-means for discretization of the spectral embeddings, which lack the ability to directly learn the graph with discrete cluster structure. In light of this, this paper presents an efficient multi-view clustering approach via unified and discrete bipartite graph learning (UDBGL). Specifically, the anchor-based subspace learning is incorporated to learn the view-specific bipartite graphs from multiple views, upon which the bipartite graph fusion is leveraged to learn a view-consensus bipartite graph with adaptive weight learning. Further, the Laplacian rank constraint is imposed to ensure that the fused bipartite graph has discrete cluster structures (with a specific number of connected components). By simultaneously formulating the view-specific bipartite graph learning, the view-consensus bipartite graph learning, and the discrete cluster structure learning into a unified objective function, an efficient minimization algorithm is then designed to tackle this optimization problem and directly achieve a discrete clustering solution without requiring additional partitioning, which notably has linear time complexity in data size. Experiments on a variety of multi-view datasets demonstrate the robustness and efficiency of our UDBGL approach.
Fully-supervised salient object detection (SOD) methods have made great progress, but such methods often rely on a large number of pixel-level annotations, which are time-consuming and labour-intensive. In this paper, we focus on a new weakly-supervised SOD task under hybrid labels, where the supervision labels include a large number of coarse labels generated by the traditional unsupervised method and a small number of real labels. To address the issues of label noise and quantity imbalance in this task, we design a new pipeline framework with three sophisticated training strategies. In terms of model framework, we decouple the task into label refinement sub-task and salient object detection sub-task, which cooperate with each other and train alternately. Specifically, the R-Net is designed as a two-stream encoder-decoder model equipped with Blender with Guidance and Aggregation Mechanisms (BGA), aiming to rectify the coarse labels for more reliable pseudo-labels, while the S-Net is a replaceable SOD network supervised by the pseudo labels generated by the current R-Net. Note that, we only need to use the trained S-Net for testing. Moreover, in order to guarantee the effectiveness and efficiency of network training, we design three training strategies, including alternate iteration mechanism, group-wise incremental mechanism, and credibility verification mechanism. Experiments on five SOD benchmarks show that our method achieves competitive performance against weakly-supervised/unsupervised methods both qualitatively and quantitatively.