
Muhammad Saqib


A Comprehensive Overview of Large Language Models

Jul 12, 2023
Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Nick Barnes, Ajmal Mian


Large Language Models (LLMs) have shown excellent generalization capabilities, leading to the development of numerous models. These models improve over baselines by proposing new architectures, refining existing architectures with better training strategies, increasing context length, using higher-quality training data, and extending training time. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey comprehensively analyzes LLM architectures and their categorization, training strategies, training datasets, and performance evaluations, and discusses future research directions. It also covers the basic building blocks and concepts behind LLMs, followed by a complete overview of individual LLMs, including their important features and functions. Finally, it summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to update this paper regularly by incorporating new sections and featuring the latest models.
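The survey's treatment of the basic building blocks behind LLMs can be made concrete with a short sketch. Below is a minimal pre-norm decoder block of the kind most LLMs stack dozens of times; the dimensions, GELU activation, and layer choices are illustrative assumptions, not taken from any particular model in the paper.

```python
# A minimal sketch of a pre-norm decoder block: masked self-attention and a
# feed-forward network, each wrapped in a residual connection. Sizes are
# illustrative, not from any specific LLM.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        # Causal mask keeps each token from attending to future positions.
        T = x.size(1)
        mask = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        x = x + self.ff(self.ln2(x))
        return x
```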


Detecting Severity of Diabetic Retinopathy from Fundus Images using Ensembled Transformers

Jan 03, 2023
Chandranath Adak, Tejas Karkera, Soumi Chattopadhyay, Muhammad Saqib


Diabetic Retinopathy (DR) is a leading cause of vision loss among people with diabetes worldwide. The severity of DR is typically assessed manually by ophthalmologists from fundus photography-based retina images. This paper deals with automatically grading the severity stages of DR. In the literature, researchers have approached this automation with traditional machine learning algorithms and convolutional architectures, but past work has rarely focused on the most informative parts of the retinal image to improve model performance. In this paper, we adopt transformer-based learning models to capture the crucial features of retinal images and better assess DR severity. We ensemble four image transformers, namely ViT (Vision Transformer), BEiT (Bidirectional Encoder representation for image Transformers), CaiT (Class-Attention in Image Transformers), and DeiT (Data-efficient image Transformers), to infer the degree of DR severity from fundus photographs. For experiments, we used the publicly available APTOS-2019 blindness detection dataset, where the performance of the transformer-based models was quite encouraging.

* 9 pages 
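The ensembling idea above can be sketched in a few lines with the `timm` library. The specific model variants and the probability-averaging scheme are assumptions for illustration; the paper may combine the four transformers differently.

```python
# A hedged sketch of the four-transformer ensemble for DR grading.
# Model variants and the averaging scheme are assumptions. Requires `timm`.
import timm
import torch

NUM_CLASSES = 5  # APTOS-2019 grades: 0 (no DR) to 4 (proliferative DR)
names = ["vit_base_patch16_224", "beit_base_patch16_224",
         "cait_s24_224", "deit_base_patch16_224"]
models = [timm.create_model(n, pretrained=True, num_classes=NUM_CLASSES).eval()
          for n in names]

@torch.no_grad()
def predict_severity(images):
    # images: (B, 3, 224, 224), normalized per each model's expected stats.
    probs = torch.stack([m(images).softmax(dim=-1) for m in models])
    return probs.mean(dim=0).argmax(dim=-1)  # average probabilities, argmax
```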

Deep Analysis of Visual Product Reviews

Jul 19, 2022
Chandranath Adak, Soumi Chattopadhyay, Muhammad Saqib


With the proliferation of the e-commerce industry, analyzing customer feedback has become indispensable for service providers. Increasingly, customers upload images of purchased products along with their review scores. In this paper, we undertake the task of analyzing such visual reviews, which has received little attention so far. Past research has focused on analyzing linguistic feedback, but here we take no assistance from linguistic reviews, which may be absent: a recent trend shows customers preferring to quickly upload visual feedback instead of typing out a review. We propose a hierarchical architecture in which a higher-level model performs product categorization and a lower-level model predicts the review score from a customer-provided product image. We built a database by procuring real visual product reviews, which was quite challenging. Extensive experiments on this database yielded promising results: the proposed hierarchical architecture attained a 57.48% performance improvement over the best comparable single-level architecture.

* 7 pages 
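A minimal sketch of the two-level design described above, assuming a shared backbone whose features feed a category head and per-category score heads; the backbone choice, head sizes, and hard routing by predicted category are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of a hierarchical model: a higher-level category head and
# lower-level per-category score heads over shared image features.
import torch
import torch.nn as nn
import torchvision.models as tvm

class HierarchicalReviewNet(nn.Module):
    def __init__(self, n_categories=10, n_scores=5):
        super().__init__()
        backbone = tvm.resnet18(weights=tvm.ResNet18_Weights.DEFAULT)
        backbone.fc = nn.Identity()  # reuse 512-d features for both levels
        self.backbone = backbone
        self.category_head = nn.Linear(512, n_categories)
        self.score_heads = nn.ModuleList(
            [nn.Linear(512, n_scores) for _ in range(n_categories)]
        )

    def forward(self, x):
        feats = self.backbone(x)                # (B, 512)
        cat_logits = self.category_head(feats)  # higher level: product category
        cats = cat_logits.argmax(dim=-1)
        # Lower level: route each image to its predicted category's score head.
        score_logits = torch.stack(
            [self.score_heads[int(c)](f) for c, f in zip(cats, feats)]
        )
        return cat_logits, score_logits
```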

VisDrone-CC2020: The Vision Meets Drone Crowd Counting Challenge Results

Jul 19, 2021
Dawei Du, Longyin Wen, Pengfei Zhu, Heng Fan, Qinghua Hu, Haibin Ling, Mubarak Shah, Junwen Pan, Ali Al-Ali, Amr Mohamed, Bakour Imene, Bin Dong, Binyu Zhang, Bouchali Hadia Nesma, Chenfeng Xu, Chenzhen Duan, Ciro Castiello, Corrado Mencar, Dingkang Liang, Florian Krüger, Gennaro Vessio, Giovanna Castellano, Jieru Wang, Junyu Gao, Khalid Abualsaud, Laihui Ding, Lei Zhao, Marco Cianciotta, Muhammad Saqib, Noor Almaadeed, Omar Elharrouss, Pei Lyu, Qi Wang, Shidong Liu, Shuang Qiu, Siyang Pan, Somaya Al-Maadeed, Sultan Daud Khan, Tamer Khattab, Tao Han, Thomas Golda, Wei Xu, Xiang Bai, Xiaoqing Xu, Xuelong Li, Yanyun Zhao, Ye Tian, Yingnan Lin, Yongchao Xu, Yuehan Yao, Zhenyu Xu, Zhijian Zhao, Zhipeng Luo, Zhiwei Wei, Zhiyuan Zhao


Crowd counting on the drone platform is an interesting topic in computer vision, bringing new challenges such as small-object inference, background clutter, and wide viewpoints. However, few algorithms focus on crowd counting on drone-captured data due to the lack of comprehensive datasets. To this end, we collected a large-scale dataset and organized the Vision Meets Drone Crowd Counting Challenge (VisDrone-CC2020) in conjunction with the 16th European Conference on Computer Vision (ECCV 2020) to promote developments in the related fields. The dataset consists of 3,360 images, with 2,460 for training and 900 for testing; persons are manually annotated with points in each video frame. A total of 14 algorithms from 15 institutes were submitted to the VisDrone-CC2020 Challenge. We provide a detailed analysis of the evaluation results and conclude the challenge. More information can be found at the website: http://www.aiskyeye.com/.

* European Conference on Computer Vision. Springer, Cham, 2020: 675-691  
* The method description of A7 Multi-Scale Aware based SFANet (M-SFANet) is updated and missing references are added 
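Since the challenge annotates persons with points, a common way such annotations are consumed by counting models is to convert them into density maps whose integral equals the person count. The sketch below shows this standard preprocessing step; the fixed Gaussian width is an illustrative assumption, and individual VisDrone-CC2020 entries may use adaptive kernels or other targets.

```python
# Standard density-map target from point annotations: each annotated head
# becomes a small Gaussian, so the map integrates to the person count.
import numpy as np
from scipy.ndimage import gaussian_filter

def points_to_density_map(points, height, width, sigma=4.0):
    """points: iterable of (x, y) pixel coordinates of annotated persons."""
    density = np.zeros((height, width), dtype=np.float32)
    for x, y in points:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width:
            density[yi, xi] += 1.0
    # Smoothing preserves the total mass: density.sum() ~= number of persons.
    return gaussian_filter(density, sigma=sigma)

# A counting model regresses this map; predicted count = predicted_map.sum().
```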

Recognizing Facial Expressions in the Wild using Multi-Architectural Representations based Ensemble Learning with Distillation

Jul 04, 2021
Rauf Momin, Ali Shan Momin, Khalid Rasheed, Muhammad Saqib


Facial expressions are the most common universal form of body language. Over the past few years, automatic facial expression recognition (FER) has been an active field of research, yet it remains a challenging task due to various uncertainties and complications, and efficiency and performance are still essential for building robust systems. In this work, we propose two models, EmoXNet and EmoXNetLite. EmoXNet is an ensemble learning technique for learning convolutional facial representations, whereas EmoXNetLite is a distillation technique that transfers knowledge from the ensemble to an efficient deep neural network using label-smoothed soft labels, enabling effective expression detection in real time. Both models attain higher accuracy than previously reported models: the ensemble model (EmoXNet) reached 85.07% test accuracy on FER-2013 with FER+ annotations and 86.25% on the Real-world Affective Faces Database (RAF-DB), while the distilled model (EmoXNetLite) reached 82.07% on FER-2013 with FER+ annotations and 81.78% on RAF-DB. The results suggest that our models generalize well to new data and learn to focus on facial representations relevant to expression recognition.

* 5 pages, 3 figures, 4 tables 
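The distillation step that produces an EmoXNetLite-style student can be sketched as a standard soft-target loss. The temperature, loss weighting, and smoothing value below are illustrative assumptions, not the paper's reported settings; `teacher_logits` would come from the EmoXNet ensemble.

```python
# Hedged sketch of ensemble-to-student distillation: the student matches
# temperature-softened teacher outputs alongside label-smoothed hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T=4.0, alpha=0.7, smoothing=0.1):
    # Soft-target term: KL between softened student and teacher distributions,
    # scaled by T*T to keep gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term with label smoothing, as the abstract describes.
    hard_loss = F.cross_entropy(student_logits, labels,
                                label_smoothing=smoothing)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```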