Abstract:Indoor navigation remains a critical challenge for people with visual impairments. The current solutions mainly rely on infrastructure-based systems, which limit their ability to navigate safely in dynamic environments. We propose a novel navigation approach that utilizes a foundation model to transform floor plans into navigable knowledge graphs and generate human-readable navigation instructions. Floorplan2Guide integrates a large language model (LLM) to extract spatial information from architectural layouts, reducing the manual preprocessing required by earlier floorplan parsing methods. Experimental results indicate that few-shot learning improves navigation accuracy in comparison to zero-shot learning on simulated and real-world evaluations. Claude 3.7 Sonnet achieves the highest accuracy among the evaluated models, with 92.31%, 76.92%, and 61.54% on the short, medium, and long routes, respectively, under 5-shot prompting of the MP-1 floor plan. The success rate of graph-based spatial structure is 15.4% higher than that of direct visual reasoning among all models, which confirms that graphical representation and in-context learning enhance navigation performance and make our solution more precise for indoor navigation of Blind and Low Vision (BLV) users.




Abstract:Graph autoencoders are very efficient at embedding graph-based complex data sets. However, most of the autoencoders have shallow depths and their efficiency tends to decrease with the increase of layer depth. In this paper, we study the effect of adding residual connections to shallow and deep graph variational and vanilla autoencoders. We show that residual connections improve the accuracy of the deep graph-based autoencoders. Furthermore, we propose Res-VGAE, a graph variational autoencoder with different residual connections. Our experiments show that our model achieves superior results when compared with other autoencoder-based models for the link prediction task.




Abstract:We propose a novel knowledge distillation methodology for compressing deep neural networks. One of the most efficient methods for knowledge distillation is hint distillation, where the student model is injected with information (hints) from several different layers of the teacher model. Although the selection of hint points can drastically alter the compression performance, there is no systematic approach for selecting them, other than brute-force hyper-parameter search. We propose a clustering based hint selection methodology, where the layers of teacher model are clustered with respect to several metrics and the cluster centers are used as the hint points. The proposed approach is validated in CIFAR-100 dataset, where ResNet-110 network was used as the teacher model. Our results show that hint points selected by our algorithm results in superior compression performance with respect to state-of-the-art knowledge distillation algorithms on the same student models and datasets.

Abstract:This paper presents a novel and uniform algorithm for edge detection based on SVM (support vector machine) with Three-dimensional Gaussian radial basis function with kernel. Because of disadvantages in traditional edge detection such as inaccurate edge location, rough edge and careless on detect soft edge. The experimental results indicate how the SVM can detect edge in efficient way. The performance of the proposed algorithm is compared with existing methods, including Sobel and canny detectors. The results show that this method is better than classical algorithm such as canny and Sobel detector.