Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

"Image": models, code, and papers

Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving

May 25, 2023
Wenhao Cheng, Junbo Yin, Wei Li, Ruigang Yang, Jianbing Shen

Figure 1 for Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving

Figure 2 for Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving

Figure 3 for Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving

Figure 4 for Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving

This paper addresses the problem of 3D referring expression comprehension (REC) in autonomous driving scenario, which aims to ground a natural language to the targeted region in LiDAR point clouds. Previous approaches for REC usually focus on the 2D or 3D-indoor domain, which is not suitable for accurately predicting the location of the queried 3D region in an autonomous driving scene. In addition, the upper-bound limitation and the heavy computation cost motivate us to explore a better solution. In this work, we propose a new multi-modal visual grounding task, termed LiDAR Grounding. Then we devise a Multi-modal Single Shot Grounding (MSSG) approach with an effective token fusion strategy. It jointly learns the LiDAR-based object detector with the language features and predicts the targeted region directly from the detector without any post-processing. Moreover, the image feature can be flexibly integrated into our approach to provide rich texture and color information. The cross-modal learning enforces the detector to concentrate on important regions in the point cloud by considering the informative language expressions, thus leading to much better accuracy and efficiency. Extensive experiments on the Talk2Car dataset demonstrate the effectiveness of the proposed methods. Our work offers a deeper insight into the LiDAR-based grounding task and we expect it presents a promising direction for the autonomous driving community.

Via

Access Paper or Ask Questions

Few Shot Learning for Medical Imaging: A Comparative Analysis of Methodologies and Formal Mathematical Framework

May 08, 2023
Jannatul Nayem, Sayed Sahriar Hasan, Noshin Amina, Bristy Das, Md Shahin Ali, Md Manjurul Ahsan, Shivakumar Raman

Figure 1 for Few Shot Learning for Medical Imaging: A Comparative Analysis of Methodologies and Formal Mathematical Framework

Figure 2 for Few Shot Learning for Medical Imaging: A Comparative Analysis of Methodologies and Formal Mathematical Framework

Figure 3 for Few Shot Learning for Medical Imaging: A Comparative Analysis of Methodologies and Formal Mathematical Framework

Figure 4 for Few Shot Learning for Medical Imaging: A Comparative Analysis of Methodologies and Formal Mathematical Framework

Deep learning becomes an elevated context regarding disposing of many machine learning tasks and has shown a breakthrough upliftment to extract features from unstructured data. Though this flourishing context is developing in the medical image processing sector, scarcity of problem-dependent training data has become a larger issue in the way of easy application of deep learning in the medical sector. To unravel the confined data source, researchers have developed a model that can solve machine learning problems with fewer data called ``Few shot learning". Few hot learning algorithms determine to solve the data limitation problems by extracting the characteristics from a small dataset through classification and segmentation methods. In the medical sector, there is frequently a shortage of available datasets in respect of some confidential diseases. Therefore, Few shot learning gets the limelight in this data scarcity sector. In this chapter, the background and basic overview of a few shots of learning is represented. Henceforth, the classification of few-shot learning is described also. Even the paper shows a comparison of methodological approaches that are applied in medical image analysis over time. The current advancement in the implementation of few-shot learning concerning medical imaging is illustrated. The future scope of this domain in the medical imaging sector is further described.

Via

Access Paper or Ask Questions

A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues

May 08, 2023
Yunxin Li, Baotian Hu, Xinyu Chen, Yuxin Ding, Lin Ma, Min Zhang

Figure 1 for A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues

Figure 2 for A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues

Figure 3 for A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues

Figure 4 for A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues

Conditional inference on joint textual and visual clues is a multi-modal reasoning task that textual clues provide prior permutation or external knowledge, which are complementary with visual content and pivotal to deducing the correct option. Previous methods utilizing pretrained vision-language models (VLMs) have achieved impressive performances, yet they show a lack of multimodal context reasoning capability, especially for text-modal information. To address this issue, we propose a Multi-modal Context Reasoning approach, named ModCR. Compared to VLMs performing reasoning via cross modal semantic alignment, it regards the given textual abstract semantic and objective image information as the pre-context information and embeds them into the language model to perform context reasoning. Different from recent vision-aided language models used in natural language processing, ModCR incorporates the multi-view semantic alignment information between language and vision by introducing the learnable alignment prefix between image and text in the pretrained language model. This makes the language model well-suitable for such multi-modal reasoning scenario on joint textual and visual clues. We conduct extensive experiments on two corresponding data sets and experimental results show significantly improved performance (exact gain by 4.8% on PMR test set) compared to previous strong baselines. Code Link: \url{https://github.com/YunxinLi/Multimodal-Context-Reasoning}.

* Accepted to ACL 2023 Main Conference

Via

Access Paper or Ask Questions

Optimized Table Tokenization for Table Structure Recognition

May 05, 2023
Maksym Lysak, Ahmed Nassar, Nikolaos Livathinos, Christoph Auer, Peter Staar

Figure 1 for Optimized Table Tokenization for Table Structure Recognition

Figure 2 for Optimized Table Tokenization for Table Structure Recognition

Figure 3 for Optimized Table Tokenization for Table Structure Recognition

Figure 4 for Optimized Table Tokenization for Table Structure Recognition

Extracting tables from documents is a crucial task in any document conversion pipeline. Recently, transformer-based models have demonstrated that table-structure can be recognized with impressive accuracy using Image-to-Markup-Sequence (Im2Seq) approaches. Taking only the image of a table, such models predict a sequence of tokens (e.g. in HTML, LaTeX) which represent the structure of the table. Since the token representation of the table structure has a significant impact on the accuracy and run-time performance of any Im2Seq model, we investigate in this paper how table-structure representation can be optimised. We propose a new, optimised table-structure language (OTSL) with a minimized vocabulary and specific rules. The benefits of OTSL are that it reduces the number of tokens to 5 (HTML needs 28+) and shortens the sequence length to half of HTML on average. Consequently, model accuracy improves significantly, inference time is halved compared to HTML-based models, and the predicted table structures are always syntactically correct. This in turn eliminates most post-processing needs.

* Accepted to ICDAR 2023, 12 pages, 6 figures

Via

Access Paper or Ask Questions

Evaluating LeNet Algorithms in Classification Lung Cancer from Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases

May 19, 2023
Jafar Abdollahi

Figure 1 for Evaluating LeNet Algorithms in Classification Lung Cancer from Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases

Figure 2 for Evaluating LeNet Algorithms in Classification Lung Cancer from Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases

Figure 3 for Evaluating LeNet Algorithms in Classification Lung Cancer from Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases

Figure 4 for Evaluating LeNet Algorithms in Classification Lung Cancer from Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases

The advancement of computer-aided detection systems had a significant impact on clinical analysis and decision-making on human disease. Lung cancer requires more attention among the numerous diseases being examined because it affects both men and women, increasing the mortality rate. LeNet, a deep learning model, is used in this study to detect lung tumors. The studies were run on a publicly available dataset made up of CT image data (IQ-OTH/NCCD). Convolutional neural networks (CNNs) were employed in the experiment for feature extraction and classification. The proposed system was evaluated on Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases datasets the success percentage was calculated as 99.51%, sensitivity (93%) and specificity (95%), and better results were obtained compared to the existing methods. Development and validation of algorithms such as ours are important initial steps in the development of software suites that could be adopted in routine pathological practices and potentially help reduce the burden on pathologists.

* arXiv admin note: text overlap with arXiv:2106.11342 by other authors

Via

Access Paper or Ask Questions

PyTorch Hyperparameter Tuning -- A Tutorial for spotPython

May 19, 2023
Thomas Bartz-Beielstein

Figure 1 for PyTorch Hyperparameter Tuning -- A Tutorial for spotPython

Figure 2 for PyTorch Hyperparameter Tuning -- A Tutorial for spotPython

Figure 3 for PyTorch Hyperparameter Tuning -- A Tutorial for spotPython

Figure 4 for PyTorch Hyperparameter Tuning -- A Tutorial for spotPython

The goal of hyperparameter tuning (or hyperparameter optimization) is to optimize the hyperparameters to improve the performance of the machine or deep learning model. spotPython (``Sequential Parameter Optimization Toolbox in Python'') is the Python version of the well-known hyperparameter tuner SPOT, which has been developed in the R programming environment for statistical analysis for over a decade. PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. This document shows how to integrate the spotPython hyperparameter tuner into the PyTorch training workflow. As an example, the results of the CIFAR10 image classifier are used. In addition to an introduction to spotPython, this tutorial also includes a brief comparison with Ray Tune, a Python library for running experiments and tuning hyperparameters. This comparison is based on the PyTorch hyperparameter tuning tutorial. The advantages and disadvantages of both approaches are discussed. We show that spotPython achieves similar or even better results while being more flexible and transparent than Ray Tune.

* Refers to spotPython version 0.1.3

Via

Access Paper or Ask Questions

SuSana Distancia is all you need: Enforcing class separability in metric learning via two novel distance-based loss functions for few-shot image classification

May 18, 2023
Mauricio Mendez-Ruiz, Jorge Gonzalez-Zapata, Ivan Reyes-Amezcua, Daniel Flores-Araiza, Francisco Lopez-Tiro, Andres Mendez-Vazquez, Gilberto Ochoa-Ruiz

Figure 1 for SuSana Distancia is all you need: Enforcing class separability in metric learning via two novel distance-based loss functions for few-shot image classification

Figure 2 for SuSana Distancia is all you need: Enforcing class separability in metric learning via two novel distance-based loss functions for few-shot image classification

Figure 3 for SuSana Distancia is all you need: Enforcing class separability in metric learning via two novel distance-based loss functions for few-shot image classification

Figure 4 for SuSana Distancia is all you need: Enforcing class separability in metric learning via two novel distance-based loss functions for few-shot image classification

Few-shot learning is a challenging area of research that aims to learn new concepts with only a few labeled samples of data. Recent works based on metric-learning approaches leverage the meta-learning approach, which is encompassed by episodic tasks that make use a support (training) and query set (test) with the objective of learning a similarity comparison metric between those sets. Due to the lack of data, the learning process of the embedding network becomes an important part of the few-shot task. Previous works have addressed this problem using metric learning approaches, but the properties of the underlying latent space and the separability of the difference classes on it was not entirely enforced. In this work, we propose two different loss functions which consider the importance of the embedding vectors by looking at the intra-class and inter-class distance between the few data. The first loss function is the Proto-Triplet Loss, which is based on the original triplet loss with the modifications needed to better work on few-shot scenarios. The second loss function, which we dub ICNN loss is based on an inter and intra class nearest neighbors score, which help us to assess the quality of embeddings obtained from the trained network. Our results, obtained from a extensive experimental setup show a significant improvement in accuracy in the miniImagenNet benchmark compared to other metric-based few-shot learning methods by a margin of 2%, demonstrating the capability of these loss functions to allow the network to generalize better to previously unseen classes. In our experiments, we demonstrate competitive generalization capabilities to other domains, such as the Caltech CUB, Dogs and Cars datasets compared with the state of the art.

* Paper submitted to a journal for publication

Via

Access Paper or Ask Questions

Helping Visually Impaired People Take Better Quality Pictures

May 14, 2023
Maniratnam Mandal, Deepti Ghadiyaram, Danna Gurari, Alan C. Bovik

Figure 1 for Helping Visually Impaired People Take Better Quality Pictures

Figure 2 for Helping Visually Impaired People Take Better Quality Pictures

Figure 3 for Helping Visually Impaired People Take Better Quality Pictures

Figure 4 for Helping Visually Impaired People Take Better Quality Pictures

Perception-based image analysis technologies can be used to help visually impaired people take better quality pictures by providing automated guidance, thereby empowering them to interact more confidently on social media. The photographs taken by visually impaired users often suffer from one or both of two kinds of quality issues: technical quality (distortions), and semantic quality, such as framing and aesthetic composition. Here we develop tools to help them minimize occurrences of common technical distortions, such as blur, poor exposure, and noise. We do not address the complementary problems of semantic quality, leaving that aspect for future work. The problem of assessing and providing actionable feedback on the technical quality of pictures captured by visually impaired users is hard enough, owing to the severe, commingled distortions that often occur. To advance progress on the problem of analyzing and measuring the technical quality of visually impaired user-generated content (VI-UGC), we built a very large and unique subjective image quality and distortion dataset. This new perceptual resource, which we call the LIVE-Meta VI-UGC Database, contains $40$K real-world distorted VI-UGC images and $40$K patches, on which we recorded $2.7$M human perceptual quality judgments and $2.7$M distortion labels. Using this psychometric resource we also created an automatic blind picture quality and distortion predictor that learns local-to-global spatial quality relationships, achieving state-of-the-art prediction performance on VI-UGC pictures, significantly outperforming existing picture quality models on this unique class of distorted picture data. We also created a prototype feedback system that helps to guide users to mitigate quality issues and take better quality pictures, by creating a multi-task learning framework.

Via

Access Paper or Ask Questions

DeepSeaNet: Improving Underwater Object Detection using EfficientDet

May 26, 2023
Sanyam Jain

Figure 1 for DeepSeaNet: Improving Underwater Object Detection using EfficientDet

Figure 2 for DeepSeaNet: Improving Underwater Object Detection using EfficientDet

Figure 3 for DeepSeaNet: Improving Underwater Object Detection using EfficientDet

Figure 4 for DeepSeaNet: Improving Underwater Object Detection using EfficientDet

Marine animals and deep underwater objects are difficult to recognize and monitor for safety of aquatic life. There is an increasing challenge when the water is saline with granular particles and impurities. In such natural adversarial environment, traditional approaches like CNN start to fail and are expensive to compute. This project involves implementing and evaluating various object detection models, including EfficientDet, YOLOv5, YOLOv8, and Detectron2, on an existing annotated underwater dataset, called the Brackish-Dataset. The dataset comprises annotated image sequences of fish, crabs, starfish, and other aquatic animals captured in Limfjorden water with limited visibility. The aim of this research project is to study the efficiency of newer models on the same dataset and contrast them with the previous results based on accuracy and inference time. Firstly, I compare the results of YOLOv3 (31.10% mean Average Precision (mAP)), YOLOv4 (83.72% mAP), YOLOv5 (97.6%), YOLOv8 (98.20%), EfficientDet (98.56% mAP) and Detectron2 (95.20% mAP) on the same dataset. Secondly, I provide a modified BiSkFPN mechanism (BiFPN neck with skip connections) to perform complex feature fusion in adversarial noise which makes modified EfficientDet robust to perturbations. Third, analyzed the effect on accuracy of EfficientDet (98.63% mAP) and YOLOv5 by adversarial learning (98.04% mAP). Last, I provide class activation map based explanations (CAM) for the two models to promote Explainability in black box models. Overall, the results indicate that modified EfficientDet achieved higher accuracy with five-fold cross validation than the other models with 88.54% IoU of feature maps.

Via

Access Paper or Ask Questions

Unleashing the Potential of Unsupervised Deep Outlier Detection through Automated Training Stopping

May 26, 2023
Yihong Huang, Yuang Zhang, Liping Wang, Xuemin Lin

Figure 1 for Unleashing the Potential of Unsupervised Deep Outlier Detection through Automated Training Stopping

Figure 2 for Unleashing the Potential of Unsupervised Deep Outlier Detection through Automated Training Stopping

Figure 3 for Unleashing the Potential of Unsupervised Deep Outlier Detection through Automated Training Stopping

Figure 4 for Unleashing the Potential of Unsupervised Deep Outlier Detection through Automated Training Stopping

Outlier detection (OD) has received continuous research interests due to its wide applications. With the development of deep learning, increasingly deep OD algorithms are proposed. Despite the availability of numerous deep OD models, existing research has reported that the performance of deep models is extremely sensitive to the configuration of hyperparameters (HPs). However, the selection of HPs for deep OD models remains a notoriously difficult task due to the lack of any labels and long list of HPs. In our study. we shed light on an essential factor, training time, that can introduce significant variation in the performance of deep model. Even the performance is stable across other HPs, training time itself can cause a serious HP sensitivity issue. Motivated by this finding, we are dedicated to formulating a strategy to terminate model training at the optimal iteration. Specifically, we propose a novel metric called loss entropy to internally evaluate the model performance during training while an automated training stopping algorithm is devised. To our knowledge, our approach is the first to enable reliable identification of the optimal training iteration during training without requiring any labels. Our experiments on tabular, image datasets show that our approach can be applied to diverse deep models and datasets. It not only enhances the robustness of deep models to their HPs, but also improves the performance and reduces plenty of training time compared to naive training.

Via

Access Paper or Ask Questions