School of Electrical Engineering, Tel Aviv University, Tel Aviv, Israel
Abstract: Vision and Language (VL) models offer an effective method for aligning the representation spaces of images and text, enabling numerous applications such as cross-modal retrieval, visual question answering, captioning, and more. However, the aligned image-text spaces learned by all popular VL models still suffer from the so-called 'object bias': their representations behave like 'bags of nouns', mostly ignoring or downplaying the attributes, relations, and states of objects described in the text or appearing in the image. Although some notable attempts at fixing these 'compositional reasoning' issues have been proposed in the recent literature, the problem is still far from solved. In this paper, we uncover two factors that limit the compositional reasoning performance of VL models. Both are properties of the paired VL dataset used for fine-tuning and pre-training the VL model: (i) the caption quality, or in other words the 'image alignment', of the texts; and (ii) the 'density' of the captions, in the sense of mentioning all the details appearing in the image. We propose a fine-tuning approach that automatically addresses these factors, leveraging a standard VL dataset (CC3M). Applied to CLIP, it yields a significant increase in compositional reasoning performance: up to ~27% over the base model, up to ~20% over the strongest baseline, and 6.7% on average.
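As a rough illustration of the fine-tuning setup, the sketch below shows the symmetric contrastive (InfoNCE) objective with which CLIP-style models are trained; the random tensors stand in for encoder outputs on images and their (quality-filtered, densified) captions, and the temperature value is an assumption, not the paper's setting.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # Normalize embeddings so the dot product is cosine similarity.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(len(img_emb))           # matched pairs lie on the diagonal
    loss_i = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_i + loss_t) / 2

# Toy fine-tuning step on a hypothetical batch of dense-caption pairs:
B, D = 8, 512
img_emb = torch.randn(B, D, requires_grad=True)  # stand-in for image encoder output
txt_emb = torch.randn(B, D, requires_grad=True)  # stand-in for text encoder output
loss = clip_contrastive_loss(img_emb, txt_emb)
loss.backward()
```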
Abstract: The lottery ticket hypothesis (LTH) has drawn increased attention to pruning neural networks at initialization. We study this problem in the linear setting. We show that finding a sparse mask at initialization is equivalent to the sketching problem introduced for efficient matrix multiplication. This gives us tools to analyze the LTH problem and gain insights into it. Specifically, using the mask found at initialization, we bound the approximation error of the pruned linear model at the end of training. We theoretically justify previous empirical evidence that the search for sparse networks may be data-independent. Using the sketching perspective, we suggest a generic improvement to existing algorithms for pruning at initialization, which we show to be beneficial in the data-independent case.
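To make the sketching analogy concrete, here is a minimal sketch in which pruning at initialization picks a mask M so that the sparse product (M ⊙ W)X approximates WX over data; the magnitude-based, data-independent criterion is a hypothetical stand-in for the algorithms the paper analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))    # dense linear layer at initialization
X = rng.standard_normal((128, 1000))  # data matrix (columns are samples)

def topk_mask(W, keep_frac):
    # Data-independent sparse mask: keep the largest-magnitude entries,
    # analogous to a sketch that preserves the matrix product W @ X.
    k = int(keep_frac * W.size)
    thresh = np.sort(np.abs(W), axis=None)[-k]
    return (np.abs(W) >= thresh).astype(W.dtype)

M = topk_mask(W, keep_frac=0.2)
err = np.linalg.norm(W @ X - (M * W) @ X) / np.linalg.norm(W @ X)
print(f"relative product error at 20% density: {err:.3f}")
```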
Abstract: In recent years, Denoising Diffusion Probabilistic Models (DDPMs) have attracted significant attention. By composing a Markovian process that starts in the data domain and gradually adds noise until reaching pure white noise, they achieve superior performance in learning data distributions. Yet, these models require a large number of diffusion steps to produce aesthetically pleasing samples, which is inefficient. In addition, unlike common generative adversarial networks, the latent space of diffusion models is not interpretable. In this work, we propose to generalize the denoising diffusion process into an Upsampling Diffusion Probabilistic Model (UDPM), in which we reduce the latent variable dimension in addition to the traditional increase of the noise level. As a result, we are able to sample images of size 256×256 with only 7 diffusion steps, about two orders of magnitude fewer than standard DDPMs require. We formally develop the Markovian diffusion processes of UDPM and demonstrate its generation capabilities on the popular FFHQ, LSUN horses, ImageNet, and AFHQv2 datasets. Another favorable property of UDPM is that its latent space is easy to interpolate, which is not the case for standard diffusion models. Our code is available at https://github.com/shadyabh/UDPM
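A minimal sketch of the idea behind the forward process, assuming a 2× average-pooling downsampler and an illustrative (alpha, sigma) schedule, neither of which is taken from the paper: each step both shrinks the latent and adds noise, which is why so few steps can suffice.

```python
import torch
import torch.nn.functional as F

def udpm_forward_step(x, alpha, sigma):
    # One hypothetical UDPM forward step: downsample the latent by 2x
    # *and* add Gaussian noise, so the spatial dimension shrinks with t.
    x_down = F.avg_pool2d(x, kernel_size=2)
    return alpha * x_down + sigma * torch.randn_like(x_down)

x = torch.randn(1, 3, 256, 256)  # clean image x_0
for t, (alpha, sigma) in enumerate([(0.9, 0.1), (0.8, 0.3), (0.7, 0.5)], start=1):
    x = udpm_forward_step(x, alpha, sigma)
    print(f"t={t}: latent shape {tuple(x.shape)}")  # 128 -> 64 -> 32
```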
Abstract: This work bridges two important concepts: the Neural Tangent Kernel (NTK), which captures the evolution of deep neural networks (DNNs) during training, and the Neural Collapse (NC) phenomenon, which refers to the emergence of symmetry and structure in the last-layer features of well-trained classification DNNs. We adopt the natural assumption that the empirical NTK develops a block structure aligned with the class labels, i.e., samples within the same class correlate more strongly than samples from different classes. Under this assumption, we derive the dynamics of DNNs trained with the mean squared error (MSE) loss and break them into interpretable phases. Moreover, we identify an invariant that captures the essence of the dynamics and use it to prove the emergence of NC in DNNs with a block-structured NTK. We provide large-scale numerical experiments on three common DNN architectures and three benchmark datasets to support our theory.
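The block-structure assumption is easy to state in code. The sketch below builds a toy kernel matrix with the assumed structure; the within-class, between-class, and diagonal values are hypothetical, chosen only to satisfy the stated ordering (same-class correlations stronger than cross-class ones).

```python
import numpy as np

def block_ntk(labels, within=1.0, between=0.1, diag=2.0):
    # Empirical-NTK surrogate with the assumed block structure:
    # stronger kernel correlation for same-class sample pairs.
    y = np.asarray(labels)
    K = np.where(y[:, None] == y[None, :], within, between)
    np.fill_diagonal(K, diag)
    return K

labels = [0, 0, 0, 1, 1, 1]  # two classes, three samples each
print(block_ntk(labels))     # two strong 3x3 blocks on the diagonal
```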
Abstract: Label-efficient and reliable semantic segmentation is essential for many real-life applications, especially in industrial settings with high visual diversity, such as waste sorting. In industrial waste sorting, one of the biggest challenges is the extreme diversity of the input stream, which depends on factors like the location of the sorting facility, the equipment available in the facility, and the time of year, all of which significantly impact the composition and visual appearance of the waste stream. These changes in the data are called "visual domains", and label-efficient adaptation of models to such domains is needed for successful semantic segmentation of industrial waste. To test the abilities of computer vision models on this task, we present the VisDA 2022 Challenge on Domain Adaptation for Industrial Waste Sorting. Our challenge incorporates a fully-annotated waste sorting dataset, ZeroWaste, collected from two real material recovery facilities in different locations and seasons, as well as a novel procedurally generated synthetic waste sorting dataset, SynthWaste. In this competition, we aim to answer two questions: 1) can we leverage domain adaptation techniques to minimize the domain gap? and 2) can synthetic data augmentation improve performance on this task and help adapt to changing data distributions? The results of the competition show that industrial waste detection poses a real domain adaptation problem, that domain generalization techniques such as augmentations and ensembling improve the overall performance on the unlabeled target-domain examples, and that leveraging synthetic data effectively remains an open problem. See https://ai.bu.edu/visda-2022/
Abstract: Recent breakthroughs in text-guided image generation have led to remarkable progress in the field of 3D synthesis from text. By optimizing neural radiance fields (NeRF) directly from text, recent methods are able to produce remarkable results. Yet, these methods are limited in their control of each object's placement or appearance, as they represent the scene as a whole. This can be a major issue in scenarios that require refining or manipulating objects in the scene. To remedy this deficit, we propose Set-the-Scene, a novel Global-Local training framework for synthesizing a 3D scene using object proxies. A proxy represents the object's placement in the generated scene and optionally defines its coarse geometry. The key to our approach is to represent each object as an independent NeRF. We alternate between optimizing each NeRF on its own and as part of the full scene. Thus, a complete representation of each object can be learned, while also creating a harmonious scene with matching style and lighting. We show that using proxies allows a wide variety of editing options, such as adjusting the placement of each independent object, removing objects from a scene, or refining an object. Our results show that Set-the-Scene offers a powerful solution for scene synthesis and manipulation, filling a crucial gap in controllable text-to-3D synthesis.
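A minimal sketch of the alternating Global-Local scheme, with tiny linear models standing in for the per-object NeRFs and placeholder losses standing in for the text-guided (score-distillation-style) objectives; none of the specifics below are taken from the paper.

```python
import torch
import torch.nn as nn

# Each object is an independent model (a stand-in for its NeRF), trained both
# on its own (local phase) and composed into the full scene (global phase).
object_nerfs = nn.ModuleList([nn.Linear(3, 4) for _ in range(3)])  # toy "NeRFs"
opt = torch.optim.Adam(object_nerfs.parameters(), lr=1e-3)
points = torch.randn(128, 3)  # stand-in for sampled ray points

for step in range(100):
    opt.zero_grad()
    if step % 2 == 0:
        # Local phase: optimize one object NeRF in isolation.
        out = object_nerfs[(step // 2) % len(object_nerfs)](points)
        loss = out.pow(2).mean()  # placeholder for a per-object guidance loss
    else:
        # Global phase: composite all objects and optimize the full scene,
        # encouraging matching style and lighting across objects.
        scene = sum(nerf(points) for nerf in object_nerfs)
        loss = scene.pow(2).mean()  # placeholder for a full-scene guidance loss
    loss.backward()
    opt.step()
```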
Abstract: Self-supervised models allow (pre-)training on unlabeled data and therefore have the potential to overcome the need for large annotated cohorts. One leading self-supervised model is the masked autoencoder (MAE), which was developed on natural imaging data. The MAE masks out a high fraction of the vision transformer (ViT) input patches and then recovers the uncorrupted image as a pretraining task. In this work, we extend the MAE to perform anomaly detection on breast magnetic resonance imaging (MRI). This new model, coined the masked autoencoder for medical imaging (MAEMI), is trained on two non-contrast-enhanced MRI sequences, aiming at lesion detection without the need for intravenous injection of contrast media or temporal image acquisition. During training, only non-cancerous images are presented to the model, with the purpose of localizing anomalous tumor regions at test time. We use a public dataset for model development. The performance of the architecture is evaluated against subtraction images created from dynamic contrast-enhanced (DCE) MRI.
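A sketch of how such reconstruction-based anomaly scoring could look; the model interface and the averaging over repeated random maskings are assumptions for illustration, not the paper's exact procedure.

```python
import torch

def anomaly_map(mae, image, mask_ratio=0.75, n_repeats=8):
    # Repeatedly mask the image, reconstruct with an MAE trained only on
    # non-cancerous scans, and average the per-pixel reconstruction error;
    # lesions, never seen during training, should reconstruct poorly.
    errs = []
    for _ in range(n_repeats):
        recon = mae(image, mask_ratio=mask_ratio)  # assumed model interface
        errs.append((recon - image).abs())
    return torch.stack(errs).mean(dim=0)  # high values = anomalous regions

# Usage with a dummy stand-in for a trained MAE:
dummy_mae = lambda img, mask_ratio=0.75: img + 0.05 * torch.randn_like(img)
mri = torch.rand(1, 1, 224, 224)  # stand-in for a non-contrast MRI slice
amap = anomaly_map(dummy_mae, mri)
print(amap.shape)  # torch.Size([1, 1, 224, 224])
```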
Abstract: In this paper, we present TEXTure, a novel method for text-guided generation, editing, and transfer of textures for 3D shapes. Leveraging a pretrained depth-to-image diffusion model, TEXTure applies an iterative scheme that paints a 3D model from different viewpoints. Yet, while depth-to-image models can create plausible textures from a single viewpoint, the stochastic nature of the generation process can cause many inconsistencies when texturing an entire 3D object. To tackle these problems, we dynamically define a trimap partitioning of the rendered image into three progression states, and present a novel elaborated diffusion sampling process that uses this trimap representation to generate seamless textures from different views. We then show that one can transfer the generated texture maps to new 3D geometries without requiring explicit surface-to-surface mapping, as well as extract semantic textures from a set of images without requiring any explicit reconstruction. Finally, we show that TEXTure can be used not only to generate new textures but also to edit and refine existing textures using either a text prompt or user-provided scribbles. Through extensive evaluation, we demonstrate that our TEXTuring method excels at generating, transferring, and editing textures, further closing the gap between 2D image generation and 3D texturing.
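To illustrate the trimap idea, the sketch below assigns each rendered pixel one of three hypothetical progression states based on whether it was already painted and how frontally the current view sees it; the criteria and threshold here are assumptions, not the paper's definitions.

```python
import numpy as np

def trimap(painted_mask, view_cos, refine_thresh=0.3):
    # Hypothetical trimap: pixels already textured and seen at a good angle
    # are kept, textured-but-oblique pixels are refined, and newly visible
    # pixels are generated from scratch by the diffusion model.
    states = np.full(painted_mask.shape, "generate", dtype=object)
    states[painted_mask & (view_cos >= refine_thresh)] = "keep"
    states[painted_mask & (view_cos < refine_thresh)] = "refine"
    return states

painted = np.array([[True, True], [False, False]])  # toy 2x2 render
cosines = np.array([[0.9, 0.1], [0.8, 0.2]])        # view-normal alignment
print(trimap(painted, cosines))  # [[keep refine] [generate generate]]
```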
Abstract: In recent years, denoising diffusion models have demonstrated outstanding image generation performance. The information on natural images captured by these models is useful for many image reconstruction applications, where the task is to restore a clean image from its degraded observations. In this work, we propose a conditional sampling scheme that exploits the prior learned by diffusion models while retaining agreement with the observations. We then combine it with a novel approach for adapting pretrained diffusion denoising networks to their input. We examine two adaptation strategies: the first uses only the degraded image, while the second, which we advocate, uses images that are "nearest neighbors" of the degraded image, retrieved from a diverse dataset using an off-the-shelf vision-language model. To evaluate our method, we test it on two state-of-the-art publicly available diffusion models, Stable Diffusion and Guided Diffusion. We show that our proposed 'adaptive diffusion for image reconstruction' (ADIR) approach achieves a significant improvement in super-resolution, deblurring, and text-based editing tasks.
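A minimal sketch of the advocated retrieval step, assuming embeddings from an off-the-shelf vision-language model (e.g., CLIP) have been precomputed; the dimensions and k are illustrative.

```python
import torch
import torch.nn.functional as F

def retrieve_neighbors(query_emb, dataset_embs, k=5):
    # K nearest neighbors in a vision-language embedding space, used (per the
    # second, advocated strategy) to pick images for adapting the denoiser.
    sims = F.normalize(query_emb, dim=-1) @ F.normalize(dataset_embs, dim=-1).t()
    return sims.topk(k, dim=-1).indices

query = torch.randn(1, 512)        # embedding of the degraded input image
corpus = torch.randn(10_000, 512)  # embeddings of a diverse external dataset
idx = retrieve_neighbors(query, corpus)
print(idx.shape)  # (1, 5): indices of images used to fine-tune the denoiser
```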
Abstract: The goal of Anomaly Detection (AD) is to identify outliers, or outlying regions, from some unknown distribution given only a set of positive (good) examples. Few-Shot AD (FSAD) aims to solve the same task with a minimal number of normal examples. Recent embedding-based methods, which compare the embedding vectors of queries to a set of reference embeddings, have demonstrated impressive results for FSAD, where as few as one good example is provided. A different, image-reconstruction-based approach has historically been used for AD: train a model to recover normal images from corrupted observations, assuming that the model will fail to recover regions when it encounters an out-of-distribution image. However, image-reconstruction-based methods have not yet been used in the low-shot regime, as they need to be trained on a diverse set of normal images to perform properly. We suggest using the Masked Auto-Encoder (MAE), a self-supervised transformer model trained to recover missing image regions based on their surroundings, for FSAD. We show that the MAE performs well when pre-trained on an arbitrary set of natural images (ImageNet) and fine-tuned only on a small set of normal images. We name this method MAEDAY. We further find that MAEDAY provides a signal orthogonal to the embedding-based methods, and an ensemble of the two approaches achieves very strong SOTA results. We also present a novel task of Zero-Shot AD (ZSAD), where no normal samples are available at training time, and show that MAEDAY performs surprisingly well at it. Finally, we provide a new dataset for detecting foreign objects on the ground and demonstrate superior results for this task as well. Code is available at https://github.com/EliSchwartz/MAEDAY
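A sketch of how the two orthogonal signals might be fused per pixel; the min-max normalization and equal weighting are assumptions for illustration, not MAEDAY's exact ensembling.

```python
import torch

def ensemble_score(mae_err_map, knn_dist_map, w=0.5):
    # Fuse the two signals the abstract describes: per-pixel MAE
    # reconstruction error and embedding-based kNN distance, each
    # min-max normalized before a weighted average.
    def norm(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-8)
    return w * norm(mae_err_map) + (1 - w) * norm(knn_dist_map)

mae_err = torch.rand(224, 224)   # stand-in for MAE reconstruction error
knn_dist = torch.rand(224, 224)  # stand-in for embedding-based distances
score = ensemble_score(mae_err, knn_dist)
print(score.shape, float(score.max()))  # higher score = more anomalous
```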