Ajmal Mian

A Comprehensive Overview of Large Language Models

Jul 12, 2023

Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Nick Barnes, Ajmal Mian

Abstract: Large Language Models (LLMs) have shown excellent generalization capabilities that have led to the development of numerous models. These models propose new architectures, tweak existing architectures with refined training strategies, increase context length, use high-quality training data, and increase training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey paper comprehensively analyses LLM architectures and their categorization, training strategies, training datasets, and performance evaluations, and discusses future research directions. Moreover, the paper discusses the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to regularly update this paper by incorporating new sections and featuring the latest LLM models.

UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input

Jul 03, 2023

Muhammad Ibrahim, Naveed Akhtar, Saeed Anwar, Ajmal Mian

Abstract: Localization is a fundamental task in robotics for autonomous navigation. Existing localization methods rely on a single input data modality or train several computational models to process different modalities. This leads to stringent computational requirements and sub-optimal results that fail to capitalize on the complementary information in other data streams. This paper proposes UnLoc, a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions. Our multi-stream network can handle LiDAR, camera and radar inputs for localization on demand, i.e., it can work with one or more input sensors, making it robust to sensor failure. UnLoc uses 3D sparse convolutions and cylindrical partitioning of the space to process LiDAR frames, and implements ResNet blocks with a slot attention-based feature filtering module for the radar and image modalities. We introduce a unique learnable modality encoding scheme to distinguish between the input sensor data. Our method is extensively evaluated on the Oxford Radar RobotCar, ApolloSouthBay and Perth-WA datasets. The results confirm the efficacy of our technique.

* UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input has been accepted for publication in the Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023)
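
The abstract mentions cylindrical partitioning of the space for LiDAR frames. As a minimal sketch of what such a partition computes, each point is converted to cylindrical coordinates (radius, azimuth, height) and binned into a fixed grid. The grid parameters (`r_max`, `n_rho`, `n_phi`, `z_min`, `z_max`, `n_z`) and the helper names are hypothetical illustrative values, not UnLoc's settings. Only non-empty cells are stored, which is the kind of sparse input that 3D sparse convolutions operate on.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]

def cylindrical_cell(p: Point, r_max: float = 50.0, n_rho: int = 8, n_phi: int = 16,
                     z_min: float = -3.0, z_max: float = 1.0, n_z: int = 4) -> Cell:
    """Map a Cartesian LiDAR point to a (radius, azimuth, height) cell index.

    All grid parameters are hypothetical defaults chosen for illustration.
    """
    x, y, z = p
    rho = math.hypot(x, y)                              # horizontal distance from the sensor
    phi = math.atan2(y, x)                              # azimuth in [-pi, pi]
    i_rho = min(int(rho / r_max * n_rho), n_rho - 1)    # points beyond r_max go to the outer ring
    i_phi = min(int((phi + math.pi) / (2 * math.pi) * n_phi), n_phi - 1)
    z_c = min(max(z, z_min), z_max)                     # clamp height to the assumed range
    i_z = min(int((z_c - z_min) / (z_max - z_min) * n_z), n_z - 1)
    return i_rho, i_phi, i_z

def partition(points: List[Point], **grid) -> Dict[Cell, List[Point]]:
    """Group points by cylindrical cell; only non-empty cells are kept (sparse)."""
    cells: Dict[Cell, List[Point]] = defaultdict(list)
    for p in points:
        cells[cylindrical_cell(p, **grid)].append(p)
    return dict(cells)

scan = [(1.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, -60.0, -5.0)]
print(partition(scan))  # two nearby points share cell (0, 8, 3); the far point lands in (7, 4, 0)
```

A common motivation for cylindrical rather than Cartesian cells is that cells widen with radius, roughly tracking how LiDAR returns become sparser with distance.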

Human Gesture and Gait Analysis for Autism Detection

Apr 17, 2023

Sania Zahan, Zulqarnain Gilani, Ghulam Mubashar Hassan, Ajmal Mian

Abstract: Autism diagnosis presents a major challenge due to the vast heterogeneity of the condition and the elusive nature of early detection. Atypical gait and gesture patterns are dominant behavioral characteristics of autism and can provide crucial insights for diagnosis. Furthermore, these data can be collected efficiently and non-intrusively, facilitating early intervention to optimize positive outcomes. Existing research mainly focuses on associating facial and eye-gaze features with autism. However, very few studies have investigated movement and gesture patterns, which can reveal subtle variations and characteristics that are specific to autism. To address this gap, we present an analysis of gesture and gait activity in videos to identify children with autism and quantify the severity of their condition by regressing Autism Diagnostic Observation Schedule (ADOS) scores. Our proposed architecture addresses two key factors: (1) an effective feature representation to manifest irregular gesture patterns and (2) a two-stream co-learning framework that enables a comprehensive understanding of their relation to autism from diverse perspectives without explicitly using an additional data modality. Experimental results demonstrate the efficacy of utilizing gesture and gait-activity videos for autism analysis.

* Accepted for publication at FGAHI@CVPR2023

DANet: Density Adaptive Convolutional Network with Interactive Attention for 3D Point Clouds

Mar 08, 2023

Yong He, Hongshan Yu, Zhengeng Yang, Wei Sun, Mingtao Feng, Ajmal Mian

Abstract: Local features and contextual dependencies are crucial for 3D point cloud analysis. Many works have been devoted to designing better local convolutional kernels that exploit the contextual dependencies. However, current point convolutions lack robustness to varying point cloud density. Moreover, contextual modeling is dominated by non-local or self-attention models, which are computationally expensive. To solve these problems, we propose density adaptive convolution, coined DAConv. The key idea is to adaptively learn the convolutional weights from geometric connections obtained from the point density and position. To extract precise context dependencies with fewer computations, we propose an interactive attention module (IAM) that embeds spatial information into channel attention along different spatial directions. DAConv and IAM are integrated in a hierarchical network architecture to achieve local density and contextual direction-aware learning for point cloud analysis. Experiments show that DAConv is significantly more robust to point density than existing methods. Extensive comparisons on challenging 3D point cloud datasets show that our network achieves state-of-the-art classification results of 93.6% on ModelNet40, competitive semantic segmentation results of 68.71% mIoU on S3DIS and part segmentation results of 86.7% mIoU on ShapeNet.

* 9
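
The abstract states that DAConv learns convolutional weights from geometric connections derived from point density and position, but does not give the formulation. The sketch below only illustrates those two ingredients under stated assumptions: local density is approximated by the inverse mean k-nearest-neighbour distance (a hypothetical choice), and `weight_fn` is a fixed stand-in for the learned weight generator. The example `inverse_density` function borrows the inverse-density scaling idea used by PointConv; it is not DAConv.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]
WeightFn = Callable[[Tuple[float, ...], float], float]

def knn(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i], excluding i (brute force)."""
    ranked = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in ranked[:k]]

def local_density(points: Sequence[Point], i: int, k: int) -> float:
    """Hypothetical density proxy: inverse of the mean distance to the k nearest neighbours."""
    nbrs = knn(points, i, k)
    mean_d = sum(math.dist(points[i], points[j]) for j in nbrs) / len(nbrs)
    return 1.0 / mean_d if mean_d > 0 else float("inf")

def density_adaptive_aggregate(points: Sequence[Point], feats: Sequence[float],
                               i: int, k: int, weight_fn: WeightFn) -> float:
    """Sum neighbour features, each weighted by weight_fn(relative position, neighbour density)."""
    total = 0.0
    for j in knn(points, i, k):
        rel = tuple(a - b for a, b in zip(points[j], points[i]))
        total += weight_fn(rel, local_density(points, j, k)) * feats[j]
    return total

# Hypothetical stand-in for a learned weight generator: neighbours in dense
# regions contribute less, so densely sampled areas do not dominate the sum.
def inverse_density(rel: Tuple[float, ...], rho: float) -> float:
    return 1.0 / rho

pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(density_adaptive_aggregate(pts, [1.0] * 4, 0, 1, inverse_density))
```

Brute-force neighbour search is quadratic and only suitable for toy inputs; practical point cloud code uses a spatial index such as a k-d tree.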

Full Point Encoding for Local Feature Aggregation in 3D Point Clouds

Mar 08, 2023

Yong He, Hongshan Yu, Zhengeng Yang, Xiaoyan Liu, Wei Sun, Ajmal Mian

Abstract: Point cloud processing methods exploit local point features and global context through aggregation, which does not explicitly model the internal correlations between local and global features. To address this problem, we propose full point encoding, which is applicable to convolution and transformer architectures. Specifically, we propose Full Point Convolution (FPConv) and Full Point Transformer (FPTransformer) architectures. The key idea is to adaptively learn the weights from local and global geometric connections, where the connections are established through local and global correlation functions, respectively. FPConv and FPTransformer simultaneously model the local and global geometric relationships as well as their internal correlations, demonstrating strong generalization ability and high performance. FPConv is incorporated in classical hierarchical network architectures to achieve local and global shape-aware learning. In FPTransformer, we introduce full point position encoding in self-attention, which hierarchically encodes each point position in the global and local receptive field. We also propose a shape-aware downsampling block that takes into account the local shape and the global context. Experimental comparisons with existing methods on benchmark datasets show the efficacy of FPConv and FPTransformer for semantic segmentation, object detection, classification, and normal estimation tasks. In particular, we achieve state-of-the-art semantic segmentation results of 76% mIoU on S3DIS 6-fold and 72.2% on S3DIS Area5.

* 15

Q-Cogni: An Integrated Causal Reinforcement Learning Framework

Feb 26, 2023

Cris Cunha, Wei Liu, Tim French, Ajmal Mian

Abstract: We present Q-Cogni, an algorithmically integrated causal reinforcement learning framework that redesigns Q-Learning with an autonomous causal structure discovery method to improve the learning process with causal inference. Q-Cogni achieves optimal learning with a pre-learned structural causal model of the environment that can be queried during the learning process to infer cause-and-effect relationships embedded in a state-action space. We leverage the sample-efficient techniques of reinforcement learning, enable reasoning about a broader set of policies and bring higher degrees of interpretability to decisions made by the reinforcement learning agent. We apply Q-Cogni to the Vehicle Routing Problem (VRP) and compare it against state-of-the-art reinforcement learning algorithms. We report results that demonstrate better policies, improved learning efficiency and superior interpretability of the agent's decision making. We also compare this approach with traditional shortest-path search algorithms and demonstrate the benefits of our causal reinforcement learning framework for high-dimensional problems. Finally, we apply Q-Cogni to derive optimal routing decisions for taxis in New York City using the Taxi & Limousine Commission trip record data and compare it with shortest-path search, reporting that Q-Cogni derives an equal or better policy in 85% of the cases in this real-world domain.

* 9 pages, 10 figures, 2 algorithms
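
Q-Cogni builds on Q-Learning, whose tabular update is Q(s, a) ← Q(s, a) + α [r + γ max_a′ Q(s′, a′) - Q(s, a)]. For readers unfamiliar with that baseline, here is a minimal sketch of plain Q-learning on a hypothetical toy corridor task. It does not include Q-Cogni's causal structure discovery or structural causal model queries, and the corridor, α, γ and episode counts are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Sequence, Tuple

QTable = Dict[Tuple[int, int], float]

def q_update(Q: QTable, s: int, a: int, r: float, s_next: int,
             actions: Sequence[int], alpha: float, gamma: float, terminal: bool) -> None:
    """One tabular Q-learning step: move Q(s, a) toward the TD target."""
    best_next = 0.0 if terminal else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def train_corridor(n_states: int = 5, episodes: int = 500, alpha: float = 0.5,
                   gamma: float = 0.9, seed: int = 0) -> Tuple[QTable, Tuple[int, int]]:
    """Hypothetical toy task: walk a 1-D corridor from state 0 to the goal state n_states - 1."""
    rng = random.Random(seed)
    actions = (-1, 1)                        # step left or step right
    goal = n_states - 1
    Q: QTable = defaultdict(float)
    for _ in range(episodes):
        s = 0
        for _ in range(200):                 # cap on episode length
            a = rng.choice(actions)          # random behaviour policy; Q-learning is off-policy
            s_next = min(max(s + a, 0), goal)
            done = s_next == goal
            q_update(Q, s, a, 1.0 if done else 0.0, s_next, actions, alpha, gamma, done)
            s = s_next
            if done:
                break
    return Q, actions

Q, actions = train_corridor()
greedy = [max(actions, key=lambda a: Q[(s, a)]) for s in range(4)]
print(greedy)  # the learned greedy policy steps right in every non-goal state
```

Because Q-learning is off-policy, the agent can explore uniformly at random and still learn the values of the greedy policy.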

Slice Transformer and Self-supervised Learning for 6DoF Localization in 3D Point Cloud Maps

Jan 21, 2023

Muhammad Ibrahim, Naveed Akhtar, Saeed Anwar, Michael Wise, Ajmal Mian

Abstract: Precise localization is critical for autonomous vehicles. We present a self-supervised learning method that employs Transformers for the first time for the task of outdoor localization using LiDAR data. We propose a pretext task that reorganizes the slices of a 360° LiDAR scan to leverage its axial properties. Our model, called Slice Transformer, employs multi-head attention while systematically processing the slices. To the best of our knowledge, this is the first instance of leveraging multi-head attention for outdoor point clouds. We additionally introduce the Perth-WA dataset, which provides a large-scale LiDAR map of Perth city in Western Australia, covering an area of about 4 km². Localization annotations are provided for Perth-WA. The proposed localization method is thoroughly evaluated on the Perth-WA and Apollo-SouthBay datasets. We also establish the efficacy of our self-supervised learning approach for the common downstream task of object classification using the ModelNet40 and ScanNN datasets. The code and Perth-WA data will be publicly released.

* Accepted at the IEEE International Conference on Robotics and Automation (ICRA), 2023
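
The abstract says the pretext task reorganizes slices of a 360° scan but does not specify how. One plausible reading, assumed here purely for illustration, is a jigsaw-style task: cut the scan into equal azimuth sectors, shuffle them, and train the model to predict each slice's original position. The helper names and slice count are hypothetical, and the sketch only builds the training pair; it does not include the Transformer.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]

def azimuth_slices(points: Sequence[Point], n_slices: int) -> List[List[Point]]:
    """Split a 360-degree scan into n_slices equal azimuth sectors."""
    slices: List[List[Point]] = [[] for _ in range(n_slices)]
    for p in points:
        phi = math.atan2(p[1], p[0]) % (2 * math.pi)                   # azimuth in [0, 2*pi)
        idx = min(int(phi / (2 * math.pi) * n_slices), n_slices - 1)   # guard against rounding up to 2*pi
        slices[idx].append(p)
    return slices

def make_pretext_sample(points: Sequence[Point], n_slices: int,
                        rng: random.Random) -> Tuple[List[List[Point]], List[int]]:
    """Hypothetical pretext pair: shuffled slices as input, original slice indices as target."""
    slices = azimuth_slices(points, n_slices)
    order = list(range(n_slices))
    rng.shuffle(order)
    return [slices[i] for i in order], order

def restore(shuffled: List[List[Point]], order: List[int]) -> List[Optional[List[Point]]]:
    """Invert the shuffle by putting each slice back at its original azimuth position."""
    original: List[Optional[List[Point]]] = [None] * len(order)
    for pos, src in enumerate(order):
        original[src] = shuffled[pos]
    return original

scan = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]
shuffled, order = make_pretext_sample(scan, 4, random.Random(0))
print(order)  # the permutation a model would be trained to predict
```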

Learning Sparse Temporal Video Mapping for Action Quality Assessment in Floor Gymnastics

Jan 15, 2023

Sania Zahan, Ghulam Mubashar Hassan, Ajmal Mian

Abstract: Athlete performance measurement in sports videos requires modeling long sequences, because the entire spatio-temporal progression contributes to the performance. It is crucial to comprehend local discriminative spatial dependencies and global semantics for accurate evaluation. However, existing benchmark datasets mainly incorporate sports where the performance lasts only a few seconds. Consequently, state-of-the-art sports quality assessment methods specifically focus on spatial structure. Although they achieve high performance in short-term sports, they are unable to model prolonged video sequences and fail to achieve similar performance in long-term sports. To facilitate such analysis, we introduce a new dataset, coined AGF-Olympics, that incorporates artistic gymnastic floor routines. AGF-Olympics provides highly challenging scenarios with extensive background, viewpoint, and scale variations over an extended sample duration of up to 2 minutes. In addition, we propose a discriminative attention module to map the dense feature space into a sparse representation by disentangling complex associations. Extensive experiments indicate that our proposed module provides an effective way to embed long-range spatial and temporal correlation semantics.

Fast Parallel Bayesian Network Structure Learning

Dec 08, 2022

Jiantong Jiang, Zeyi Wen, Ajmal Mian

Abstract: Bayesian networks (BNs) are a widely used graphical model in machine learning for representing knowledge with uncertainty. The mainstream BN structure learning methods require performing a large number of conditional independence (CI) tests. The learning process is very time-consuming, especially for high-dimensional problems, which hinders the adoption of BNs in more applications. Existing works attempt to accelerate the learning process with parallelism, but face issues including load imbalance, costly atomic operations and dominant parallel overhead. In this paper, we propose a fast solution named Fast-BNS on multi-core CPUs to enhance the efficiency of BN structure learning. Fast-BNS is powered by a series of efficiency optimizations, including (i) designing a dynamic work pool to monitor the processing of edges and to better schedule the workloads among threads, (ii) grouping the CI tests of edges with the same endpoints to reduce the number of unnecessary CI tests, (iii) using cache-friendly data storage to improve memory efficiency, and (iv) generating the conditioning sets on-the-fly to avoid extra memory consumption. A comprehensive experimental study shows that the sequential version of Fast-BNS is up to 50 times faster than its counterpart, and the parallel version of Fast-BNS achieves a 4.8 to 24.5 times speedup over the state-of-the-art multi-threaded solution. Moreover, Fast-BNS scales well with both network size and sample size. Fast-BNS source code is freely available at https://github.com/jjiantong/FastBN.
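
Optimizations (ii) and (iv) can be illustrated with a minimal sketch, assuming a PC-style (constraint-based) skeleton search in which an edge x - y is tested against conditioning sets drawn from the neighbours of either endpoint. The functions below are hypothetical and not taken from the FastBN repository: `conditioning_sets` produces each set lazily with `itertools.combinations` and skips sets already produced from the other endpoint, so both orientations of one edge form a single group of tests without storing a list of sets. `is_independent` stands in for a real CI test (for example a G² or chi-squared test) and is an assumption.

```python
from itertools import combinations
from typing import Callable, Dict, Iterator, Optional, Set, Tuple

Graph = Dict[str, Set[str]]
CITest = Callable[[str, str, Tuple[str, ...]], bool]

def conditioning_sets(adj: Graph, x: str, y: str, depth: int) -> Iterator[Tuple[str, ...]]:
    """Lazily yield each distinct conditioning set of size `depth` for edge x - y."""
    ax = adj[x] - {y}
    ay = adj[y] - {x}
    yield from combinations(sorted(ax), depth)       # sets drawn from x's neighbours
    for subset in combinations(sorted(ay), depth):   # sets drawn from y's neighbours
        if not set(subset) <= ax:                    # skip sets already produced from x's side
            yield subset

def try_remove_edge(adj: Graph, x: str, y: str, depth: int,
                    is_independent: CITest) -> Optional[Tuple[str, ...]]:
    """Delete edge x - y at the first conditioning set that separates x and y."""
    for s in conditioning_sets(adj, x, y, depth):
        if is_independent(x, y, s):
            adj[x].discard(y)
            adj[y].discard(x)
            return s                                 # the separating set
    return None

adj = {"A": {"B", "C", "D"}, "B": {"A", "C", "E"}, "C": {"A", "B"}, "D": {"A"}, "E": {"B"}}
print(list(conditioning_sets(adj, "A", "B", 1)))  # [('C',), ('D',), ('E',)]
print(list(conditioning_sets(adj, "A", "B", 2)))  # [('C', 'D'), ('C', 'E')]
fake_ci = lambda x, y, s: "E" in s                # hypothetical stand-in for a real CI test
print(try_remove_edge(adj, "A", "B", 1, fake_ci))  # ('E',)
```

Deduplicating by a subset check instead of a `seen` set keeps memory constant per edge, in the spirit of generating conditioning sets on the fly.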

Fast Parallel Exact Inference on Bayesian Networks: Poster

Dec 08, 2022

Jiantong Jiang, Zeyi Wen, Atif Mansoor, Ajmal Mian

Abstract: Bayesian networks (BNs) are attractive because they are graphical and interpretable machine learning models. However, exact inference on BNs is time-consuming, especially for complex problems. To improve efficiency, we propose a fast BN exact inference solution named Fast-BNI on multi-core CPUs. Fast-BNI enhances the efficiency of exact inference through hybrid parallelism that tightly integrates coarse- and fine-grained parallelism. We also propose techniques to further simplify the bottleneck operations of BN exact inference. Fast-BNI source code is freely available at https://github.com/jjiantong/FastBN.
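
To make "exact inference" concrete: it computes posteriors such as P(Rain = 1 | Wet = 1) exactly rather than by sampling. The sketch below uses brute-force enumeration over all assignments of a hypothetical three-node network (Rain → Wet ← Sprinkler) with made-up probabilities. This is not Fast-BNI's algorithm; enumeration is exponential in the number of variables, which is why practical exact inference relies on more efficient formulations and, as in Fast-BNI, on parallelism.

```python
from itertools import product
from typing import Dict

VARS = ("Rain", "Sprinkler", "Wet")
P_RAIN = 0.2          # hypothetical prior P(Rain = 1)
P_SPRINKLER = 0.1     # hypothetical prior P(Sprinkler = 1)
# Hypothetical conditional table P(Wet = 1 | Rain, Sprinkler)
P_WET = {(1, 1): 0.99, (1, 0): 0.9, (0, 1): 0.8, (0, 0): 0.05}

def joint(a: Dict[str, int]) -> float:
    """P(Rain, Sprinkler, Wet) via the factorisation implied by the graph."""
    p = P_RAIN if a["Rain"] else 1 - P_RAIN
    p *= P_SPRINKLER if a["Sprinkler"] else 1 - P_SPRINKLER
    p_wet = P_WET[(a["Rain"], a["Sprinkler"])]
    p *= p_wet if a["Wet"] else 1 - p_wet
    return p

def posterior(query: str, evidence: Dict[str, int]) -> Dict[int, float]:
    """Exact P(query | evidence) by summing the joint over all consistent assignments."""
    mass = {0: 0.0, 1: 0.0}
    for values in product((0, 1), repeat=len(VARS)):
        a = dict(zip(VARS, values))
        if all(a[v] == val for v, val in evidence.items()):
            mass[a[query]] += joint(a)
    z = mass[0] + mass[1]             # P(evidence)
    return {k: m / z for k, m in mass.items()}

print(posterior("Rain", {"Wet": 1}))  # P(Rain = 1 | Wet = 1) = 0.1818 / 0.2818, about 0.645
```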
