Lior Wolf

Learning Personal Representations from fMRI by Predicting Neurofeedback Performance

Dec 06, 2021
Jhonathan Osin, Lior Wolf, Guy Gurevitch, Jackob Nimrod Keynan, Tom Fruchtman-Steinbok, Ayelet Or-Borichev, Shira Reznik Balter, Talma Hendler

We present a deep neural network method for learning a personal representation for individuals who are performing a self-neuromodulation task guided by functional MRI (fMRI). This neurofeedback task (watch vs. regulate) provides the subjects with continuous feedback contingent on down-regulation of their amygdala signal, and the learning algorithm focuses on this region's time course of activity. The representation is learned by a self-supervised recurrent neural network that predicts the amygdala activity in the next fMRI frame given recent fMRI frames and is conditioned on the learned individual representation. It is shown that the individuals' representation improves the next-frame prediction considerably. Moreover, this personal representation, learned solely from fMRI images, yields good performance in linear prediction of psychiatric traits, which is better than performing such a prediction based on clinical data and personality tests. Our code is attached as supplementary material, and the data will be shared subject to ethical approvals.

* MICCAI 2020, https://link.springer.com/chapter/10.1007/978-3-030-59728-3_46

Learning Query Expansion over the Nearest Neighbor Graph

Dec 05, 2021
Benjamin Klein, Lior Wolf

Query Expansion (QE) is a well-established method for improving retrieval metrics in image search applications. When using QE, the search is conducted on a new query vector, constructed using an aggregation function over the query and images from the database. Recent work has given rise to QE techniques in which the aggregation function is learned, whereas previous techniques were based on hand-crafted aggregation functions, e.g., taking the mean of the query's nearest neighbors. However, most QE methods have focused on aggregation functions that work directly over the query and its immediate nearest neighbors. In this work, a hierarchical model, Graph Query Expansion (GQE), is presented, which is learned in a supervised manner and performs aggregation over an extended neighborhood of the query, thus increasing the information used from the database when computing the query expansion and exploiting the structure of the nearest-neighbor graph. The technique achieves state-of-the-art results on known benchmarks.

* BMVC 2021
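
For orientation, the hand-crafted baseline mentioned in the abstract (taking the mean of the query's nearest neighbors) can be sketched as follows. This is a minimal illustration and not the learned GQE model: the function name `average_query_expansion`, the choice to include the query itself in the mean, and the assumption that all vectors are L2-normalized are ours, not the paper's.

```python
import numpy as np


def average_query_expansion(query, database, k=3):
    """Return the L2-normalized mean of the query and its k nearest database vectors."""
    # Dot product equals cosine similarity, assuming all vectors are L2-normalized.
    sims = database @ query
    # Indices of the k most similar database items.
    top_k = np.argsort(-sims)[:k]
    # Hand-crafted aggregation: plain mean over the query and its neighbors.
    expanded = (query + database[top_k].sum(axis=0)) / (k + 1)
    return expanded / np.linalg.norm(expanded)


# Toy example with three 2-D database vectors.
database = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
query = np.array([1.0, 0.0])
expanded = average_query_expansion(query, database, k=2)
```

The expanded vector would then replace the original query in a second retrieval pass. A learned method would substitute a trained aggregation function for the plain mean.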

End-to-End Segmentation via Patch-wise Polygons Prediction

Dec 05, 2021
Tal Shaharabany, Lior Wolf

The leading segmentation methods represent the output map as a pixel grid. We study an alternative representation in which the object edges are modeled, per image patch, as a polygon with $k$ vertices that is coupled with per-patch label probabilities. The vertices are optimized by employing a differentiable neural renderer to create a raster image. The delineated region is then compared with the ground truth segmentation. Our method obtains multiple state-of-the-art results: 76.26% mIoU on the Cityscapes validation, 90.92% IoU on the Vaihingen building segmentation benchmark, 66.82% IoU for the MoNU microscopy dataset, and 90.91% for the bird benchmark CUB. Our code for training and reproducing these results is attached as supplementary.
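
The IoU figures above compare a rasterized prediction with the ground-truth mask. A minimal sketch of the standard intersection-over-union measure for a single binary mask is given below. The function name `iou` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration. The paper's own evaluation code may differ, and mIoU additionally averages this value over classes.

```python
import numpy as np


def iou(pred, target):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return np.logical_and(pred, target).sum() / union


# Toy example: the masks share one pixel out of three covered in total.
pred = [[1, 1], [0, 0]]
target = [[1, 0], [1, 0]]
score = iou(pred, target)
```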

SegDiff: Image Segmentation with Diffusion Probabilistic Models

Dec 01, 2021
Tomer Amit, Eliya Nachmani, Tal Shaharabany, Lior Wolf

Diffusion Probabilistic Methods are employed for state-of-the-art image generation. In this work, we present a method for extending such models for performing image segmentation. The method learns end-to-end, without relying on a pre-trained backbone. The information in the input image and in the current estimation of the segmentation map is merged by summing the output of two encoders. Additional encoding layers and a decoder are then used to iteratively refine the segmentation map using a diffusion model. Since the diffusion model is probabilistic, it is applied multiple times and the results are merged into a final segmentation map. The new method obtains state-of-the-art results on the Cityscapes validation set, the Vaihingen building segmentation benchmark, and the MoNuSeg dataset.
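
The final step described above, running a stochastic model several times and merging the results, can be sketched as below. This is an illustration under stated assumptions: the merge rule (averaging followed by a 0.5 threshold), the function name `merge_runs`, and the `toy_sampler` stand-in for a diffusion run are hypothetical and not taken from the paper.

```python
import numpy as np


def merge_runs(sample_fn, image, n_runs=5, threshold=0.5, seed=0):
    """Average n_runs stochastic probability maps and threshold the mean."""
    rng = np.random.default_rng(seed)
    maps = [sample_fn(image, rng) for _ in range(n_runs)]
    # Assumed merge rule: per-pixel mean over the independent runs.
    mean_map = np.mean(maps, axis=0)
    return mean_map > threshold


def toy_sampler(image, rng):
    """Hypothetical stand-in for one stochastic run: the input plus small noise."""
    noise = rng.normal(scale=0.05, size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)


# Toy "image" whose values already act as foreground probabilities.
image = np.array([[0.9, 0.1], [0.8, 0.2]])
mask = merge_runs(toy_sampler, image)
```

Averaging reduces the variance of the individual samples, which is one plausible reason to merge several runs of a probabilistic model.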

Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

Nov 29, 2021
Yoad Tewel, Yoav Shalev, Idan Schwartz, Lior Wolf

Recent text-to-image matching models apply contrastive learning to large corpora of uncurated pairs of images and sentences. While such models can provide a powerful score for matching and subsequent zero-shot tasks, they are not capable of generating a caption given an image. In this work, we repurpose such models to generate a descriptive text given an image at inference time, without any further training or tuning step. This is done by combining the visual-semantic model with a large language model, benefiting from the knowledge in both web-scale models. The resulting captions are much less restrictive than those obtained by supervised captioning methods. Moreover, as a zero-shot learning method, it is extremely flexible, and we demonstrate its ability to perform image arithmetic in which the inputs can be either images or text and the output is a sentence. This enables novel high-level vision capabilities such as comparing two images or solving visual analogy tests.

Learning a Weight Map for Weakly-Supervised Localization

Nov 28, 2021
Tal Shaharabany, Lior Wolf

In the weakly supervised localization setting, supervision is given as an image-level label. We propose to employ an image classifier $f$ and to train a generative network $g$ that outputs, given the input image, a per-pixel weight map that indicates the location of the object within the image. Network $g$ is trained by minimizing the discrepancy between the output of the classifier $f$ on the original image and its output given the same image weighted by the output of $g$. The scheme requires a regularization term that ensures that $g$ does not provide a uniform weight, and an early stopping criterion in order to prevent $g$ from over-segmenting the image. Our results indicate that the method outperforms existing localization methods by a sizable margin on the challenging fine-grained classification datasets, as well as a generic image recognition dataset. Additionally, the obtained weight map is also state-of-the-art in weakly supervised segmentation in fine-grained categorization datasets.
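
The training objective for $g$ described above can be sketched as a discrepancy term plus a regularizer. The abstract does not specify either term, so the squared-error discrepancy, the mean-weight regularizer, the `reg_weight` parameter, and the `toy_classifier` stand-in for $f$ are all assumptions made for illustration.

```python
import numpy as np


def localization_loss(f, image, weight_map, reg_weight=0.1):
    """Discrepancy between f(image) and f(image * weight_map), plus a mask-size penalty."""
    original = f(image)
    weighted = f(image * weight_map)
    # Assumed discrepancy: squared error between the two classifier outputs.
    discrepancy = np.sum((original - weighted) ** 2)
    # Assumed regularizer: the mean weight, which penalizes the trivial all-ones map.
    regularizer = weight_map.mean()
    return discrepancy + reg_weight * regularizer


def toy_classifier(image):
    """Hypothetical stand-in for f: the scores depend only on the top-left pixel."""
    return np.array([image[0, 0], 1.0 - image[0, 0]])


image = np.ones((2, 2))
full_map = np.ones((2, 2))                       # uniform weight, keeps everything
tight_map = np.array([[1.0, 0.0], [0.0, 0.0]])   # keeps only the informative pixel
empty_map = np.zeros((2, 2))                     # discards the object entirely

loss_full = localization_loss(toy_classifier, image, full_map)
loss_tight = localization_loss(toy_classifier, image, tight_map)
loss_empty = localization_loss(toy_classifier, image, empty_map)
```

In this toy setting the tight map scores best: it preserves the classifier output while covering the fewest pixels, whereas the uniform map pays the regularization cost and the empty map changes the classifier output.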

A-Muze-Net: Music Generation by Composing the Harmony based on the Generated Melody

Nov 25, 2021
Or Goren, Eliya Nachmani, Lior Wolf

We present a method for the generation of MIDI files of piano music. The method models the right and left hands using two networks, where the left hand is conditioned on the right hand. This way, the melody is generated before the harmony. The MIDI is represented in a way that is invariant to the musical scale, and the melody is represented, for the purpose of conditioning the harmony, by the content of each bar, viewed as a chord. Finally, notes are added randomly, based on this chord representation, in order to enrich the generated audio. Our experiments show a significant improvement over the state of the art for training on such datasets, and demonstrate the contribution of each of the novel components.

* Accepted for publication at MMM 2022

Geometric Transformer for End-to-End Molecule Properties Prediction

Oct 26, 2021
Yoni Choukroun, Lior Wolf

Transformers have become the method of choice in many applications thanks to their ability to represent complex interactions between elements. However, extending the Transformer architecture to non-sequential data such as molecules and enabling its training on small datasets remains a challenge. In this work, we introduce a Transformer-based architecture for molecule property prediction, which is able to capture the geometry of the molecule. We modify the classical positional encoder with an initial encoding of the molecule geometry, as well as a learned gated self-attention mechanism. We further suggest an augmentation scheme for molecular data capable of avoiding the overfitting induced by the overparameterized architecture. The proposed framework outperforms state-of-the-art methods while being based solely on machine learning, i.e., the method does not incorporate domain knowledge from quantum chemistry and does not use extended geometric inputs besides the pairwise atomic distances.

Image-Based CLIP-Guided Essence Transfer

Oct 26, 2021
Hila Chefer, Sagie Benaim, Roni Paiss, Lior Wolf

The conceptual blending of two signals is a semantic task that may underlie both creativity and intelligence. We propose to perform such blending in a way that incorporates two latent spaces: that of the generator network and that of the semantic network. For the first network, we employ the powerful StyleGAN generator, and for the second, the powerful image-language matching network of CLIP. The new method creates a blending operator that is optimized to be simultaneously additive in both latent spaces. Our results demonstrate that this leads to blending that is much more natural than what can be obtained in each space separately.

Video and Text Matching with Conditioned Embeddings

Oct 21, 2021
Ameen Ali, Idan Schwartz, Tamir Hazan, Lior Wolf

We present a method for matching a text sentence from a given corpus to a given video clip and vice versa. Traditionally, video and text matching is done by learning a shared embedding space, and the encoding of one modality is independent of the other. In this work, we encode the data in the dataset in a way that takes into account the query's relevant information. The power of the method is demonstrated to arise from pooling the interaction data between words and frames. Since the encoding of the video clip depends on the sentence compared to it, the representation needs to be recomputed for each potential match. To this end, we propose an efficient shallow neural network. Its training employs a hierarchical triplet loss that is extendable to paragraph/video matching. The method is simple, provides explainability, and achieves state-of-the-art results for both sentence-clip and video-text matching by a sizable margin across five different datasets: ActivityNet, DiDeMo, YouCook2, MSR-VTT, and LSMDC. We also show that our conditioned representation can be transferred to video-guided machine translation, where we improve the current results on VATEX. Source code is available at https://github.com/AmeenAli/VideoMatch.

* WACV 2022
