Kazuhiro Hotta

Meijo University

Domain Generalization through Attenuation of Domain-Specific Information

Apr 09, 2025

Reiji Saito, Kazuhiro Hotta

Abstract: In this paper, we propose a new evaluation metric called Domain Independence (DI) and a method called Attenuation of Domain-Specific Information (ADSI), which is specifically designed for domain-generalized semantic segmentation in automotive images. DI measures the presence of domain-specific information: a lower DI value indicates strong domain dependence, while a higher DI value suggests greater domain independence. This makes it possible to identify roughly where domain-specific information exists and up to which frequency range it is present. As a result, it becomes possible to effectively suppress only the regions in the image that contain domain-specific information, enabling feature extraction independent of the domain. ADSI uses a Butterworth filter to remove the low-frequency components of images that contain inherent domain-specific information such as sensor characteristics and lighting conditions. However, since low-frequency components also contain important information such as color, we should not remove them completely. Thus, the low-frequency components are multiplied by a scalar value (ranging from 0 to 1) to retain essential information. This helps the model learn more domain-independent features. In experiments using GTA5 (a synthetic dataset) for training and a real-world dataset for evaluation, the proposed method outperformed conventional approaches. Similarly, in experiments where Cityscapes (a real-world dataset) was used for training and datasets covering various environments such as rain and nighttime were used for evaluation, the proposed method demonstrated its robustness under nighttime conditions.

* Accepted by CVPR 2025 Workshops
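
The low-frequency attenuation described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the cutoff, the filter order, and the way the scalar `alpha` is blended with the filter are all hypothetical choices, and the sketch works on a single-channel image.

```python
import numpy as np

def butterworth_highpass(shape, cutoff, order=2):
    """Butterworth high-pass response on the FFT frequency grid (assumed form)."""
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies in cycles/pixel
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies in cycles/pixel
    dist = np.sqrt(fx ** 2 + fy ** 2)
    lowpass = 1.0 / (1.0 + (dist / cutoff) ** (2 * order))
    return 1.0 - lowpass

def attenuate_low_freq(image, cutoff=0.05, order=2, alpha=0.5):
    """Scale low frequencies by alpha (0..1) instead of removing them entirely."""
    highpass = butterworth_highpass(image.shape, cutoff, order)
    # Gain equals alpha at the DC component and approaches 1 at high frequencies.
    gain = alpha + (1.0 - alpha) * highpass
    spectrum = np.fft.fft2(image)
    return np.real(np.fft.ifft2(spectrum * gain))
```

With `alpha=1.0` the image is returned unchanged, and with `alpha=0.0` the filter acts as a plain Butterworth high-pass; intermediate values keep part of the low-frequency content such as overall brightness.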

Genetic Information Analysis of Age-Related Macular Degeneration Fellow Eye Using Multi-Modal Selective ViT

Sep 17, 2024

Yoichi Furukawa, Satoshi Kamiya, Yoichi Sakurada, Kenji Kashiwagi, Kazuhiro Hotta

Abstract: In recent years, there has been significant development in the analysis of medical data using machine learning. It is believed that the onset of Age-related Macular Degeneration (AMD) is associated with genetic polymorphisms. However, genetic analysis is costly, and artificial intelligence may offer assistance. This paper presents a method that predicts the presence of multiple susceptibility genes for AMD using fundus and Optical Coherence Tomography (OCT) images, as well as medical records. Experimental results demonstrate that integrating information from multiple modalities can effectively predict the presence of susceptibility genes with over 80% accuracy.

Reducing Catastrophic Forgetting in Online Class Incremental Learning Using Self-Distillation

Sep 17, 2024

Kotaro Nagata, Hiromu Ono, Kazuhiro Hotta

Abstract: In continual learning, there is a serious problem of catastrophic forgetting, in which previous knowledge is forgotten when a model learns new tasks. Various methods have been proposed to solve this problem. Replay methods, which replay data from previous tasks in later training, have shown good accuracy. However, replay methods have a generalizability problem caused by a limited memory buffer. In this paper, we tried to solve this problem by acquiring transferable knowledge through self-distillation, using the highly generalizable output of a shallow layer as a teacher. Furthermore, when we deal with a large number of classes or challenging data, there is a risk that learning does not converge and the model does not even reach overfitting. Therefore, we attempted to achieve more efficient and thorough learning by prioritizing the storage of easily misclassified samples through a new memory update method. We confirmed in experiments on the CIFAR10, CIFAR100, and MiniImageNet datasets that our proposed method outperformed conventional methods.

* 10 pages, 2 figures
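
The two ideas named in this abstract, distilling from a shallow-layer output and prioritizing easily misclassified samples for the memory buffer, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the temperature-scaled KL loss is the standard distillation loss rather than necessarily the one used in the paper, and `select_for_memory` is a hypothetical selection rule based on the predicted probability of the true class.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax with temperature, computed in a numerically stable way."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student); here the teacher would be a shallow-layer head."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    eps = 1e-12  # avoids log(0)
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=1)
    return float(kl.mean())

def select_for_memory(probs, labels, k):
    """Indices of the k samples with the lowest true-class probability (assumed rule)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    true_prob = probs[np.arange(len(labels)), labels]
    return np.argsort(true_prob)[:k]
```

In a replay setting, the indices returned by `select_for_memory` would decide which incoming samples are written to the buffer, while `distillation_loss` would be added to the usual classification loss.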

Gr-IoU: Ground-Intersection over Union for Robust Multi-Object Tracking with 3D Geometric Constraints

Sep 05, 2024

Keisuke Toida, Naoki Kato, Osamu Segawa, Takeshi Nakamura, Kazuhiro Hotta

Abstract: We propose Ground IoU (Gr-IoU) to address the data association problem in multi-object tracking. When tracking objects detected by a camera, the same object is often assigned different IDs in consecutive frames, especially when objects are close to each other or overlapping. To address this issue, we introduce Gr-IoU, which takes into account the 3D structure of the scene. Gr-IoU transforms traditional bounding boxes from the image space to the ground plane using vanishing point geometry. The IoU calculated with these transformed bounding boxes is more sensitive to the front-to-back relationships of objects, thereby improving data association accuracy and reducing ID switches. We evaluated our Gr-IoU method on the MOT17 and MOT20 datasets, which contain diverse tracking scenarios including crowded scenes and sequences with frequent occlusions. Experimental results demonstrated that Gr-IoU outperforms conventional real-time methods without appearance features.

* Accepted for the ECCV 2024 Workshop on Affective Behavior Analysis in-the-wild (ABAW)
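
For background, the conventional image-space IoU that Gr-IoU builds on can be written as below. This sketch covers only the standard axis-aligned measure; the ground-plane transformation based on vanishing point geometry is not reproduced, since the abstract does not specify it. The `(x1, y1, x2, y2)` box layout and the function name are assumptions.

```python
def box_iou(box_a, box_b):
    """Standard IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

Two people standing one behind the other can overlap heavily in image space even though they occupy different ground positions, which is the situation the abstract says a ground-plane IoU handles better.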

Accuracy Improvement of Cell Image Segmentation Using Feedback Former

Aug 23, 2024

Hinako Mitsuoka, Kazuhiro Hotta

Abstract: Semantic segmentation of microscopy cell images by deep learning is a significant technique. We considered that Transformers, which have recently outperformed CNNs in image recognition, could also be improved and developed for cell image segmentation. Transformers tend to focus more on contextual information than on detailed information. This tendency leads to a lack of detailed information for segmentation. Therefore, to supplement or reinforce the missing detailed information, we hypothesized that feedback processing, as found in the human visual cortex, should be effective. Our proposed Feedback Former is a novel architecture for semantic segmentation that uses a Transformer as the encoder and has a feedback processing mechanism. Feature maps with detailed information are fed back to the lower layers from near the output of the model to compensate for the lack of detailed information, which is the weakness of Transformers, and to improve the segmentation accuracy. In experiments on three cell image datasets, we confirmed that our method surpasses methods without feedback, demonstrating its superior accuracy in cell image segmentation. Our method achieved higher segmentation accuracy at a lower computational cost than conventional feedback approaches. Moreover, our method offered superior precision without simply increasing the model size of the Transformer encoder, demonstrating higher accuracy with lower computational cost.

* Accepted by ECCV2024 Workshop "Human-inspired Computer Vision (HCV)"

Growing Deep Neural Network Considering with Similarity between Neurons

Aug 23, 2024

Taigo Sakai, Kazuhiro Hotta

Abstract: Deep learning has excelled in image recognition tasks through neural networks inspired by the human brain. However, the necessity for large models to improve prediction accuracy introduces significant computational demands and extended training times. Conventional methods such as fine-tuning, knowledge distillation, and pruning have limitations such as potential accuracy drops. Drawing inspiration from human neurogenesis, where neuron formation continues into adulthood, we explore a novel approach of progressively increasing the number of neurons in compact models during training, thereby managing computational costs effectively. We propose a method that reduces feature extraction biases and neuronal redundancy by introducing constraints based on neuron similarity distributions. This approach not only fosters efficient learning in new neurons but also enhances the relevance of the extracted features for the given task. Results on the CIFAR-10 and CIFAR-100 datasets demonstrated improved accuracy, and Grad-CAM visualizations showed that our method pays more attention to the whole object to be classified than the conventional method does. These results suggest the potential of our method for decision-making processes.

Generalized SAM: Efficient Fine-Tuning of SAM for Variable Input Image Sizes

Aug 22, 2024

Sota Kato, Hinako Mitsuoka, Kazuhiro Hotta

Abstract: There has been a lot of recent research on improving the efficiency of fine-tuning foundation models. In this paper, we propose a novel efficient fine-tuning method that allows the input image size of Segment Anything Model (SAM) to be variable. SAM is a powerful foundation model for image segmentation trained on huge datasets, but it requires fine-tuning to recognize arbitrary classes. The input image size of SAM is fixed at 1024 x 1024, resulting in substantial computational demands during training. Furthermore, the fixed input image size may result in the loss of image information, e.g. due to fixed aspect ratios. To address this problem, we propose Generalized SAM (GSAM). Unlike previous methods, GSAM is the first to apply random cropping during training with SAM, thereby significantly reducing the computational cost of training. Experiments on datasets of various types and various pixel counts have shown that GSAM can train more efficiently than SAM and other fine-tuning methods for SAM, achieving comparable or higher accuracy.

* Accepted by ECCV2024 Workshop "Computational Aspects of Deep Learning (CADL)"
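
Random cropping as a training-time augmentation, which this abstract credits for the reduced training cost, can be sketched generically as follows. This shows only the cropping step on an image and its segmentation mask; how GSAM adapts SAM to accept the smaller input is not described in the abstract and is not reproduced here. The function name and argument layout are assumptions.

```python
import numpy as np

def random_crop(image, mask, crop_size, rng):
    """Cut the same random window out of an image (H, W, C) and its mask (H, W)."""
    height, width = image.shape[:2]
    crop_h, crop_w = crop_size
    if crop_h > height or crop_w > width:
        raise ValueError("crop_size must not exceed the image size")
    # Sample the top-left corner so that the window stays inside the image.
    top = int(rng.integers(0, height - crop_h + 1))
    left = int(rng.integers(0, width - crop_w + 1))
    window = (slice(top, top + crop_h), slice(left, left + crop_w))
    return image[window], mask[window]
```

For example, calling `random_crop(image, mask, (256, 256), np.random.default_rng(0))` on a 1024 x 1024 training image yields a patch with one sixteenth of the pixels, with the mask cropped identically so the labels stay aligned.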

Lite-HRNet Plus: Fast and Accurate Facial Landmark Detection

Aug 23, 2023

Sota Kato, Kazuhiro Hotta, Yuhki Hatakeyama, Yoshinori Konishi

Abstract: Facial landmark detection is an essential technology for driver status tracking and has been in demand for real-time estimation. For landmark coordinate prediction, heatmap-based methods are known to achieve high accuracy, and Lite-HRNet can achieve fast estimation. However, with Lite-HRNet, the problem of the heavy computational cost of the fusion block, which connects feature maps with different resolutions, has yet to be solved. In addition, the strong output module used in HRNetV2 is not applied to Lite-HRNet. Given these problems, we propose a novel architecture called Lite-HRNet Plus. Lite-HRNet Plus achieves two improvements: a novel fusion block based on channel attention and a novel output module with lower computational intensity using multi-resolution feature maps. Through experiments conducted on two facial landmark datasets, we confirmed that Lite-HRNet Plus further improved the accuracy in comparison with conventional methods, and achieved state-of-the-art accuracy with a computational complexity in the range of 10M FLOPs.

* Accepted at ICIP2023

Enlarged Large Margin Loss for Imbalanced Classification

Jun 15, 2023

Sota Kato, Kazuhiro Hotta

Abstract: We propose a novel loss function for imbalanced classification. LDAM loss, which minimizes a margin-based generalization bound, is widely utilized for class-imbalanced image classification. Although LDAM loss makes it possible to obtain large margins for the minority classes and small margins for the majority classes, its relationship to the large margin included in the original softmax cross entropy loss has not yet been clarified. In this study, we reformulate LDAM loss using the concept of the large margin softmax cross entropy loss based on the softplus function and confirm that LDAM loss includes a wider large margin than softmax cross entropy loss. Furthermore, we propose a novel Enlarged Large Margin (ELM) loss, which can further widen the large margin of LDAM loss. ELM loss utilizes the large margin for the maximum logit of the incorrect class in addition to the basic margin used in LDAM loss. Through experiments conducted on imbalanced CIFAR datasets and large-scale datasets with long-tailed distribution, we confirmed that classification accuracy was much improved compared with LDAM loss and conventional losses for imbalanced classification.

* Accepted at SMC2023
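
As a reference point for the baseline named in this abstract, LDAM loss subtracts a class-dependent margin proportional to n_j^(-1/4) from the true-class logit, where n_j is the number of training samples in class j. The sketch below follows that baseline only; the additional margin that ELM loss applies to the maximum incorrect logit is not reproduced, because the abstract does not give its form. The defaults `max_margin=0.5` and `scale=30.0` are commonly used LDAM settings and are assumptions here.

```python
import numpy as np

def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j^(-1/4), rescaled so the largest is max_margin."""
    counts = np.asarray(class_counts, dtype=float)
    raw = counts ** -0.25
    return raw * (max_margin / raw.max())

def ldam_loss(logits, labels, class_counts, max_margin=0.5, scale=30.0):
    """Cross entropy after subtracting the class margin from each true-class logit."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    margins = ldam_margins(class_counts, max_margin)
    rows = np.arange(len(labels))
    adjusted = logits.copy()
    adjusted[rows, labels] -= margins[labels]   # rarer classes get larger margins
    z = scale * adjusted
    z = z - z.max(axis=1, keepdims=True)        # stabilise the log-sum-exp
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[rows, labels].mean())
```

Setting `max_margin=0.0` reduces this to scaled softmax cross entropy, which makes it easy to compare the effect of the margin on a given batch.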

DeformableFormer: Classification of Endoscopic Ultrasound Guided Fine Needle Biopsy in Pancreatic Diseases

Apr 21, 2023

Taiji Kurami, Takuya Ishikawa, Kazuhiro Hotta

Abstract: Endoscopic Ultrasound-Fine Needle Aspiration (EUS-FNA) is used to examine pancreatic cancer. EUS-FNA is an examination that uses EUS to insert a thin needle into the tumor and collect pancreatic tissue fragments. The collected pancreatic tissue fragments are then stained to classify whether they are pancreatic cancer. However, staining and visual inspection are time consuming. In addition, if the pancreatic tissue fragment cannot be examined after staining, the collection must be done again on another day. Therefore, our purpose is to classify from an unstained image whether the fragment is available for examination or not, and to exceed the accuracy of visual classification by specialist physicians. Image classification before staining can reduce the time required for staining and the burden on patients. However, the images of pancreatic tissue fragments used in this study cannot be successfully classified by processing the entire image because the pancreatic tissue fragments occupy only a part of the image. Therefore, we propose DeformableFormer, which uses Deformable Convolution in the MetaFormer framework. The architecture consists of a generalized model of the Vision Transformer, and we use Deformable Convolution in the TokenMixer part. In contrast to existing approaches, our proposed DeformableFormer can perform feature extraction more locally and dynamically by means of Deformable Convolution. Therefore, it can perform feature extraction suited to the classification target. To evaluate our method, we classify pancreatic tissue fragments into two categories: available and unavailable for examination. We demonstrated that our method outperformed the accuracy of specialist physicians and conventional methods.
