Dimitris Samaras

SUNY

Learning from Pseudo-labeled Segmentation for Multi-Class Object Counting

Jul 15, 2023

Jingyi Xu, Hieu Le, Dimitris Samaras

Abstract: Class-agnostic counting (CAC) has numerous potential applications across various domains. The goal is to count objects of an arbitrary category during testing, based on only a few annotated exemplars. In this paper, we point out that the task of counting objects of interest when there are multiple object classes in the image (namely, multi-class object counting) is particularly challenging for current object counting models. They often greedily count every object regardless of the exemplars. To address this issue, we propose localizing the area containing the objects of interest via an exemplar-based segmentation model before counting them. The key challenge here is the lack of segmentation supervision to train this model. To this end, we propose a method to obtain pseudo segmentation masks using only box exemplars and dot annotations. We show that the segmentation model trained on these pseudo-labeled masks can effectively localize objects of interest for an arbitrary multi-class image based on the exemplars. To evaluate the performance of different methods on multi-class counting, we introduce two new benchmarks, a synthetic multi-class dataset and a new test set of real images in which objects from multiple classes are present. Our proposed method shows a significant advantage over the previous CAC methods on these two benchmarks.
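
The idea of localizing before counting can be sketched roughly as follows. This is an illustration only, not the paper's implementation: it assumes the counter produces a density map, the segmentation model produces a binary mask of the same size, and the final count is the density summed inside the mask. The name `masked_count` is hypothetical.

```python
def masked_count(density, mask):
    """Sum a density map only where the mask marks objects of interest.

    density: 2D list of non-negative floats (assumed counter output).
    mask:    2D list of 0/1 values (assumed segmentation output).
    Both inputs are assumptions for illustration, not the paper's API.
    """
    if len(density) != len(mask):
        raise ValueError("density and mask must have the same number of rows")
    total = 0.0
    for d_row, m_row in zip(density, mask):
        if len(d_row) != len(m_row):
            raise ValueError("density and mask rows must have the same length")
        # Density outside the mask belongs to other classes and is ignored.
        total += sum(d * m for d, m in zip(d_row, m_row))
    return total


# Top row: class of interest. Bottom row: a distractor class.
density = [[0.5, 0.5], [1.0, 0.0]]
mask = [[1, 1], [0, 0]]
count = masked_count(density, mask)
```

Without the mask, the density map above would sum to 2.0, which corresponds to the greedy "count everything" behaviour the abstract describes.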

SAM-Path: A Segment Anything Model for Semantic Segmentation in Digital Pathology

Jul 12, 2023

Jingwei Zhang, Ke Ma, Saarthak Kapse, Joel Saltz, Maria Vakalopoulou, Prateek Prasanna, Dimitris Samaras

Abstract: Semantic segmentations of pathological entities have crucial clinical value in computational pathology workflows. Foundation models, such as the Segment Anything Model (SAM), have been recently proposed for universal use in segmentation tasks. SAM shows remarkable promise in instance segmentation on natural images. However, the applicability of SAM to computational pathology tasks is limited by two factors: (1) the lack of comprehensive pathology datasets in SAM training and (2) a design that is not inherently optimized for semantic segmentation tasks. In this work, we adapt SAM for semantic segmentation by introducing trainable class prompts, followed by further enhancements through the incorporation of a pathology encoder, specifically a pathology foundation model. Our framework, SAM-Path, enhances SAM's ability to conduct semantic segmentation in digital pathology without human input prompts. Through experiments on two public pathology datasets, the BCSS and the CRAG datasets, we demonstrate that fine-tuning with trainable class prompts outperforms vanilla SAM with manual prompts and post-processing by 27.52% in Dice score and 71.63% in IOU. On these two datasets, the proposed additional pathology foundation model further achieves a relative improvement of 5.07% to 5.12% in Dice score and 4.50% to 8.48% in IOU.

* Submitted to MedAGI 2023
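
For readers unfamiliar with the two metrics quoted in the abstract, here is a minimal sketch of how Dice score and IoU are commonly computed for a pair of binary masks. It uses the standard definitions; the function name `dice_and_iou`, the flat-list input format, and the convention of returning 1.0 for two empty masks are assumptions for illustration and say nothing about the evaluation code used in the paper.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
```

For a single mask pair the two metrics are linked by Dice = 2·IoU / (1 + IoU), so the same change in prediction quality generally shows up as different percentage gains in each.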

Conditional Generation from Unconditional Diffusion Models using Denoiser Representations

Jun 02, 2023

Alexandros Graikos, Srikar Yellapragada, Dimitris Samaras

Abstract: Denoising diffusion models have gained popularity as a generative modeling technique for producing high-quality and diverse images. Applying these models to downstream tasks requires conditioning, which can take the form of text, class labels, or other forms of guidance. However, providing conditioning information to these models can be challenging, particularly when annotations are scarce or imprecise. In this paper, we propose adapting pre-trained unconditional diffusion models to new conditions using the learned internal representations of the denoiser network. We demonstrate the effectiveness of our approach on various conditional generation tasks, including attribute-conditioned generation and mask-conditioned generation. Additionally, we show that augmenting the Tiny ImageNet training set with synthetic images generated by our approach improves the classification accuracy of ResNet baselines by up to 8%. Our approach provides a powerful and flexible way to adapt diffusion models to new conditions and generate high-quality augmented data for various conditional generation tasks.

AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction

May 11, 2023

Aggelina Chatziagapi, Dimitris Samaras

Abstract: In this work, we present a multimodal solution to the problem of 4D face reconstruction from monocular videos. 3D face reconstruction from 2D images is an under-constrained problem due to the ambiguity of depth. State-of-the-art methods try to solve this problem by leveraging visual information from a single image or video, whereas 3D mesh animation approaches rely more on audio. However, in most cases (e.g. AR/VR applications), videos include both visual and speech information. We propose AVFace, which incorporates both modalities and accurately reconstructs the 4D facial and lip motion of any speaker, without requiring any 3D ground truth for training. A coarse stage estimates the per-frame parameters of a 3D morphable model, followed by a lip refinement, and then a fine stage recovers facial geometric details. Due to the temporal audio and video information captured by transformer-based modules, our method is robust in cases when either modality is insufficient (e.g. face occlusions). Extensive qualitative and quantitative evaluation demonstrates the superiority of our method over the current state-of-the-art.

* Accepted by CVPR 2023. Project page: https://aggelinacha.github.io/AVFace/

Computational Pathology: A Survey Review and The Way Forward

Apr 11, 2023

Mahdi S. Hosseini, Babak Ehteshami Bejnordi, Vincent Quoc-Huy Trinh, Danial Hasan, Xingwen Li, Taehyo Kim, Haochen Zhang, Theodore Wu, Kajanan Chinniah, Sina Maghsoudlou (+12 more)

Abstract: Computational Pathology (CoPath) is an interdisciplinary science that augments developments of computational approaches to analyze and model medical histopathology images. The main objective for CoPath is to develop infrastructure and workflows of digital diagnostics as an assistive CAD system for clinical pathology, facilitating transformational changes in the diagnosis and treatment of cancer diseases. With ever-growing developments in deep learning and computer vision algorithms, and the ease of the data flow from digital pathology, CoPath is currently witnessing a paradigm shift. Despite the sheer volume of engineering and scientific works being introduced for cancer image analysis, there is still a considerable gap in adopting and integrating these algorithms in clinical practice. This raises a significant question regarding the direction and trends that are undertaken in CoPath. In this article, we provide a comprehensive review of more than 700 papers to address the challenges faced, from problem design all the way to the application and implementation viewpoints. We have catalogued each paper into a model-card by examining the key works and challenges faced to lay out the current landscape in CoPath. We hope this helps the community to locate relevant works and facilitate understanding of the field's future directions. In a nutshell, we oversee the CoPath developments in a cycle of stages which are required to be cohesively linked together to address the challenges associated with such multidisciplinary science. We overview this cycle from the perspectives of data-centric, model-centric, and application-centric problems. We finally sketch remaining challenges and provide directions for future technical developments and clinical integration of CoPath.

Generating Features with Increased Crop-related Diversity for Few-Shot Object Detection

Apr 11, 2023

Jingyi Xu, Hieu Le, Dimitris Samaras

Abstract: Two-stage object detectors generate object proposals and classify them to detect objects in images. These proposals often do not contain the objects perfectly but overlap with them in many possible ways, exhibiting great variability in the difficulty levels of the proposals. Training a robust classifier against this crop-related variability requires abundant training data, which is not available in few-shot settings. To mitigate this issue, we propose a novel variational autoencoder (VAE) based data generation model, which is capable of generating data with increased crop-related diversity. The main idea is to transform the latent space such that latent codes with different norms represent different crop-related variations. This allows us to generate features with increased crop-related diversity in difficulty levels by simply varying the latent norm. In particular, each latent code is rescaled such that its norm linearly correlates with the IoU score of the input crop w.r.t. the ground-truth box. Here the IoU score is a proxy that represents the difficulty level of the crop. We train this VAE model on base classes conditioned on the semantic code of each class and then use the trained model to generate features for novel classes. In our experiments, our generated features consistently improve state-of-the-art few-shot object detection methods on the PASCAL VOC and MS COCO datasets.

* Accepted to CVPR 23
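
The norm rescaling described in the abstract can be sketched as below. The abstract only states that the latent norm correlates linearly with the IoU score; the direction of the mapping (higher IoU gives a larger norm), the range `[min_norm, max_norm]`, and the name `rescale_latent` are all assumptions chosen for illustration.

```python
import math


def rescale_latent(z, iou, min_norm=1.0, max_norm=5.0):
    """Rescale latent code z so that its L2 norm is a linear function of iou.

    The linear map and its endpoints are hypothetical; the paper's actual
    mapping is not given in the abstract.
    """
    if not 0.0 <= iou <= 1.0:
        raise ValueError("iou must lie in [0, 1]")
    norm = math.sqrt(sum(v * v for v in z))
    if norm == 0.0:
        raise ValueError("cannot rescale a zero vector")
    # Assumed linear map: iou = 0 gives min_norm, iou = 1 gives max_norm.
    target_norm = min_norm + (max_norm - min_norm) * iou
    return [v * target_norm / norm for v in z]


z_scaled = rescale_latent([3.0, 4.0], iou=0.5)
```

Under this sketch, sampling codes at several norms at generation time would correspond to requesting features at several assumed difficulty levels.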

Topology-Guided Multi-Class Cell Context Generation for Digital Pathology

Apr 05, 2023

Shahira Abousamra, Rajarsi Gupta, Tahsin Kurc, Dimitris Samaras, Joel Saltz, Chao Chen

Abstract: In digital pathology, the spatial context of cells is important for cell classification, cancer diagnosis and prognosis. To model such complex cell context, however, is challenging. Cells form different mixtures, lineages, clusters and holes. To model such structural patterns in a learnable fashion, we introduce several mathematical tools from spatial statistics and topological data analysis. We incorporate such structural descriptors into a deep generative model as both conditional inputs and a differentiable loss. This way, we are able to generate high quality multi-class cell layouts for the first time. We show that the topology-rich cell layouts can be used for data augmentation and improve the performance of downstream tasks such as cell classification.

* To be published in proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023

Predicting Human Attention using Computational Attention

Apr 04, 2023

Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Gregory Zelinsky, Minh Hoai, Dimitris Samaras

Abstract: Most models of visual attention are aimed at predicting either top-down or bottom-up control, as studied using different visual search and free-viewing tasks. We propose Human Attention Transformer (HAT), a single model predicting both forms of attention control. HAT is the new state-of-the-art (SOTA) in predicting the scanpath of fixations made during target-present and target-absent search, and matches or exceeds SOTA in the prediction of taskless free-viewing fixation scanpaths. HAT achieves this new SOTA by using a novel transformer-based architecture and a simplified foveated retina that collectively create a spatio-temporal awareness akin to the dynamic visual working memory of humans. Unlike previous methods that rely on a coarse grid of fixation cells and experience information loss due to fixation discretization, HAT features a dense-prediction architecture and outputs a dense heatmap for each fixation, thus avoiding discretizing fixations. HAT sets a new standard in computational attention, which emphasizes both effectiveness and generality. HAT's demonstrated scope and applicability will likely inspire the development of new attention models that can better predict human behavior in various attention-demanding scenarios.

S-VolSDF: Sparse Multi-View Stereo Regularization of Neural Implicit Surfaces

Mar 30, 2023

Haoyu Wu, Alexandros Graikos, Dimitris Samaras

Abstract: Neural rendering of implicit surfaces performs well in 3D vision applications. However, it requires dense input views as supervision. When only sparse input images are available, output quality drops significantly due to the shape-radiance ambiguity problem. We note that this ambiguity can be constrained when a 3D point is visible in multiple views, as is the case in multi-view stereo (MVS). We thus propose to regularize neural rendering optimization with an MVS solution. The use of an MVS probability volume and a generalized cross entropy loss leads to a noise-tolerant optimization process. In addition, neural rendering provides global consistency constraints that guide the MVS depth hypothesis sampling and thus improves MVS performance. Given only three sparse input views, experiments show that our method not only outperforms generic neural rendering models by a large margin but also significantly increases the reconstruction quality of MVS models. Project webpage: https://hao-yu-wu.github.io/s-volsdf/.
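
To give a sense of why a generalized cross entropy loss is considered noise-tolerant, the sketch below uses one widely used formulation, L_q(p) = (1 - p^q) / q with 0 < q <= 1, where p is the probability assigned to the target. Whether the paper uses exactly this form, and how it applies it to the MVS probability volume, is not stated in the abstract; the function name and the default `q` are assumptions.

```python
import math


def generalized_cross_entropy(p, q=0.7):
    """Generalized cross entropy for target probability p, with 0 < q <= 1.

    As q approaches 0 the loss approaches -log(p) (standard cross entropy);
    at q = 1 it equals 1 - p, which is bounded and less sensitive to
    confidently wrong (possibly noisy) targets.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return (1.0 - p ** q) / q


loss_bounded = generalized_cross_entropy(0.25, q=1.0)
loss_near_ce = generalized_cross_entropy(0.5, q=1e-6)
```

The loss is bounded above by 1/q, so a single badly wrong target contributes a limited penalty, unlike standard cross entropy, which grows without bound as p approaches 0.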

Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention

Mar 27, 2023

Sounak Mondal, Zhibo Yang, Seoyoung Ahn, Dimitris Samaras, Gregory Zelinsky, Minh Hoai

Abstract: Predicting human gaze is important in Human-Computer Interaction (HCI). However, to practically serve HCI applications, gaze prediction models must be scalable, fast, and accurate in their spatial and temporal gaze predictions. Recent scanpath prediction models focus on goal-directed attention (search). Such models are limited in their application due to a common approach relying on trained target detectors for all possible objects, and the availability of human gaze data for their training (both not scalable). In response, we pose a new task called ZeroGaze, a new variant of zero-shot learning where gaze is predicted for never-before-searched objects, and we develop a novel model, Gazeformer, to solve the ZeroGaze problem. In contrast to existing methods using object detector modules, Gazeformer encodes the target using a natural language model, thus leveraging semantic similarities in scanpath prediction. We use a transformer-based encoder-decoder architecture because transformers are particularly useful for generating contextual representations. Gazeformer surpasses other models by a large margin on the ZeroGaze setting. It also outperforms existing target-detection models on standard gaze prediction for both target-present and target-absent search tasks. In addition to its improved performance, Gazeformer is more than five times faster than the state-of-the-art target-present visual search model.

* IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
