Joseph Shtok

CHARTER: heatmap-based multi-type chart data extraction

Nov 28, 2021
Joseph Shtok, Sivan Harary, Ophir Azulai, Adi Raz Goldfarb, Assaf Arbelle, Leonid Karlinsky

The digital conversion of information stored in documents is a great source of knowledge. In contrast to document text, the conversion of embedded document graphics, such as charts and plots, has been much less explored. We present a method and a system for end-to-end conversion of document charts into a machine-readable tabular data format, which can be easily stored and analyzed in the digital domain. Our approach extracts and analyzes charts along with their graphical elements and supporting structures such as legends, axes, titles, and captions. Our detection system is based on neural networks trained solely on synthetic data, eliminating the limiting factor of data collection. As opposed to previous methods, which detect graphical elements using bounding boxes, our networks feature auxiliary domain-specific heatmap predictions, enabling the precise detection of pie charts, line plots, and scatter plots, which do not fit the rectangular bounding-box assumption. Qualitative and quantitative results show high robustness and precision, improving upon previous works on popular benchmarks.

* Document Intelligence workshop at KDD 2021 conference  
* Joseph Shtok, Sivan Harary, and Leonid Karlinsky contributed equally
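The abstract describes predicting domain-specific heatmaps rather than bounding boxes; as a rough, hypothetical illustration of how such a heatmap might be turned into discrete chart elements, the sketch below extracts local maxima above a threshold (e.g., candidate scatter-point centers). The window size and threshold are placeholders, not values from the paper.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def heatmap_peaks(heatmap, threshold=0.5, window=5):
    """Return (row, col) coordinates of local maxima above `threshold`.

    `heatmap` is an HxW array of per-pixel scores, e.g. the predicted
    likelihood that a pixel is the center of a scatter point or a line vertex.
    """
    local_max = maximum_filter(heatmap, size=window) == heatmap
    peaks = np.argwhere(local_max & (heatmap > threshold))
    return [tuple(p) for p in peaks]

# Toy usage: two synthetic Gaussian blobs stand in for network output.
yy, xx = np.mgrid[0:64, 0:64]
hm = np.exp(-((yy - 20) ** 2 + (xx - 15) ** 2) / 20.0)
hm += np.exp(-((yy - 45) ** 2 + (xx - 50) ** 2) / 20.0)
print(heatmap_peaks(hm))  # approximately [(20, 15), (45, 50)]
```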

Detector-Free Weakly Supervised Grounding by Separation

Apr 20, 2021
Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky

Nowadays, there is an abundance of data involving images and surrounding free-form text weakly corresponding to those images. Weakly Supervised phrase-Grounding (WSG) deals with the task of using this data to learn to localize (or to ground) arbitrary text phrases in images without any additional annotations. However, most recent SotA methods for WSG assume the existence of a pre-trained object detector, relying on it to produce the ROIs for localization. In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector. We directly learn everything from the images and associated free-form text pairs, thus potentially gaining an advantage on the categories unsupported by the detector. The key idea behind our proposed Grounding by Separation (GbS) method is synthesizing 'text to image-regions' associations by random alpha-blending of arbitrary image pairs and using the corresponding texts of the pair as conditions to recover the alpha map from the blended image via a segmentation network. At test time, this allows using the query phrase as a condition for a non-blended query image, thus interpreting the test image as a composition of a region corresponding to the phrase and the complement region. Using this approach we demonstrate a significant accuracy improvement, of up to 8.5% over previous DF-WSG SotA, for a range of benchmarks including Flickr30K, Visual Genome, and ReferIt, as well as a significant complementary improvement (above 7%) over the detector-based approaches for WSG.
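A minimal sketch, under my own assumptions about image format and alpha-map smoothness, of the data-synthesis step the abstract describes: two images are alpha-blended with a random map, and that map becomes the ground-truth target a text-conditioned segmentation network would be trained to recover. The text conditioning and the network itself are omitted.

```python
import torch
import torch.nn.functional as F

def blend_pair(img_a, img_b, grid=8):
    """Alpha-blend two images with a random, spatially smooth alpha map.

    img_a, img_b: (C, H, W) tensors in [0, 1].
    Returns the blended image and the alpha map, which serves as the
    ground-truth mask for the regions that came from img_a.
    """
    _, h, w = img_a.shape
    # Low-resolution random alpha, upsampled to obtain smooth regions.
    alpha = torch.rand(1, 1, grid, grid)
    alpha = F.interpolate(alpha, size=(h, w), mode="bilinear", align_corners=False)
    alpha = alpha.squeeze(0)                      # (1, H, W)
    blended = alpha * img_a + (1 - alpha) * img_b
    return blended, alpha

# Toy usage with random "images".
a, b = torch.rand(3, 128, 128), torch.rand(3, 128, 128)
x, target_alpha = blend_pair(a, b)
# A segmentation network conditioned on the captions of `a` and `b`
# would be trained to predict `target_alpha` from `x`.
```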

StarNet: towards weakly supervised few-shot detection and explainable few-shot classification

Mar 15, 2020
Leonid Karlinsky, Joseph Shtok, Amit Alfassy, Moshe Lichtenstein, Sivan Harary, Eli Schwartz, Sivan Doveh, Prasanna Sattigeri, Rogerio Feris, Alexander Bronstein, Raja Giryes

In this paper, we propose a new few-shot learning method called StarNet, which is an end-to-end trainable non-parametric star-model few-shot classifier. While being meta-trained using only image-level class labels, StarNet learns not only to predict the class labels for each query image of a few-shot task, but also to localize (via a heatmap) what it believes to be the key image regions supporting its prediction, thus effectively detecting the instances of the novel categories. The localization is enabled by StarNet's ability to find large, arbitrarily shaped, semantically matching regions between all pairs of support and query images of a few-shot task. We evaluate StarNet on multiple few-shot classification benchmarks, attaining significant state-of-the-art improvements on CUB and ImageNetLOC-FS, and smaller improvements on other benchmarks. At the same time, in many cases, StarNet provides plausible explanations for its class label predictions by highlighting the correctly paired novel category instances on the query and on its best matching support (for the predicted class). In addition, we test the proposed approach on the previously unexplored and challenging task of Weakly Supervised Few-Shot Object Detection (WS-FSOD), obtaining significant improvements over the baselines.
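StarNet's localization builds on dense semantic matching between support and query feature maps; the sketch below is a heavily simplified, assumed version of that ingredient (each query location is scored by its best cosine match to any support location) and omits the non-parametric star-model voting that the method actually uses to accumulate evidence.

```python
import torch
import torch.nn.functional as F

def match_heatmap(query_feat, support_feat):
    """Crude localization heatmap from dense feature matching.

    query_feat, support_feat: (C, H, W) feature maps (e.g. backbone outputs).
    Each query location is scored by its best cosine similarity to any
    support location; high values indicate regions matching the support image.
    """
    c, h, w = query_feat.shape
    q = F.normalize(query_feat.reshape(c, -1), dim=0)    # (C, H*W)
    s = F.normalize(support_feat.reshape(c, -1), dim=0)  # (C, H*W)
    sim = q.t() @ s                                      # (H*W, H*W)
    heat = sim.max(dim=1).values.reshape(h, w)
    return heat

# Toy usage with random feature maps.
heat = match_heatmap(torch.randn(64, 10, 10), torch.randn(64, 10, 10))
print(heat.shape)  # torch.Size([10, 10])
```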

LaSO: Label-Set Operations networks for multi-label few-shot learning

Feb 26, 2019
Amit Alfassy, Leonid Karlinsky, Amit Aides, Joseph Shtok, Sivan Harary, Rogerio Feris, Raja Giryes, Alex M. Bronstein

Example synthesis is one of the leading methods to tackle the problem of few-shot learning, where only a small number of samples per class are available. However, current synthesis approaches only address the scenario of a single category label per image. In this work, we propose a novel technique for synthesizing samples with multiple labels for the (yet unhandled) multi-label few-shot classification scenario. We propose to combine pairs of given examples in feature space, so that the resulting synthesized feature vectors will correspond to examples whose label sets are obtained through certain set operations on the label sets of the corresponding input pairs. Thus, our method is capable of producing a sample containing the intersection, union or set-difference of labels present in two input samples. As we show, these set operations generalize to labels unseen during training. This enables performing augmentation on examples of novel categories, thus, facilitating multi-label few-shot classifier learning. We conduct numerous experiments showing promising results for the label-set manipulation capabilities of the proposed approach, both directly (using the classification and retrieval metrics), and in the context of performing data augmentation for multi-label few-shot learning. We propose a benchmark for this new and challenging task and show that our method compares favorably to all the common baselines.
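A schematic sketch of what the abstract describes, with the layer sizes and the 80-class label space as my own placeholders: one small network per set operation maps a concatenated pair of image features to a synthetic feature whose multi-label target is the union, intersection, or difference of the input label sets.

```python
import torch
import torch.nn as nn

class LabelSetOp(nn.Module):
    """One label-set operation network: (feat_x, feat_y) -> synthetic feature."""

    def __init__(self, feat_dim=2048, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, fx, fy):
        return self.net(torch.cat([fx, fy], dim=-1))

# Three networks, one per set operation on the label sets.
ops = {"union": LabelSetOp(), "intersection": LabelSetOp(), "difference": LabelSetOp()}

fx, fy = torch.randn(4, 2048), torch.randn(4, 2048)
labels_x = torch.randint(0, 2, (4, 80)).float()   # multi-hot label vectors
labels_y = torch.randint(0, 2, (4, 80)).float()

f_union = ops["union"](fx, fy)
target_union = torch.clamp(labels_x + labels_y, max=1)  # label union as multi-hot
# Training would push a multi-label classifier applied to `f_union`
# toward `target_union` (and analogously for intersection and difference).
```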

RepMet: Representative-based metric learning for classification and one-shot object detection

Jun 15, 2018
Eli Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Sharathchandra Pankanti, Rogerio Feris, Abhishek Kumar, Raja Giryes, Alex M. Bronstein

Distance metric learning (DML) has been successfully applied to object classification, both in the standard regime of rich training data and in the few-shot scenario, where each category is represented by only a few examples. In this work, we propose a new method for DML that jointly learns the embedding space and the data distribution of the training categories in a single training process. Our method improves upon leading algorithms for DML-based object classification. Furthermore, it opens the door to a new task in computer vision, few-shot object detection, since the proposed DML architecture can be naturally embedded as the classification head of any standard object detector. In numerous experiments, we achieve state-of-the-art classification results on a variety of fine-grained datasets, and offer the community a benchmark on the few-shot detection task, performed on the Imagenet-LOC dataset. The code will be made available upon acceptance.
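The classification head the abstract outlines can be sketched as follows (the embedding dimension, number of representatives per class, and Gaussian width are assumptions, not the paper's values): class scores are computed from distances between an embedding and learned per-class representatives, with the closest representative dominating.

```python
import torch
import torch.nn as nn

class RepMetHead(nn.Module):
    """Distance-based classifier with K learned representatives per class."""

    def __init__(self, num_classes, k=5, emb_dim=256, sigma=0.5):
        super().__init__()
        self.reps = nn.Parameter(torch.randn(num_classes, k, emb_dim))
        self.sigma = sigma

    def forward(self, emb):
        # emb: (B, emb_dim); squared distances to every representative.
        d2 = ((emb[:, None, None, :] - self.reps[None]) ** 2).sum(-1)  # (B, N, K)
        # Per-class score: the closest representative dominates.
        scores = torch.exp(-d2 / (2 * self.sigma ** 2)).max(dim=-1).values  # (B, N)
        return scores

head = RepMetHead(num_classes=10)
scores = head(torch.randn(8, 256))
print(scores.shape)  # torch.Size([8, 10])
```

For few-shot detection, such a head could replace the classification layer of a standard detector, with novel-class representatives filled in from the few available examples.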

Delta-encoder: an effective sample synthesis method for few-shot object recognition

Jun 12, 2018
Eli Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Rogerio Feris, Abhishek Kumar, Raja Giryes, Alex M. Bronstein

Learning to classify new categories based on just one or a few examples is a long-standing challenge in modern computer vision. In this work, we propose a simple yet effective method for few-shot (and one-shot) object recognition. Our approach is based on a modified auto-encoder, denoted Delta-encoder, that learns to synthesize new samples for an unseen category from just a few examples of it. The synthesized samples are then used to train a classifier. The proposed approach learns both to extract transferable intra-class deformations, or "deltas", between same-class pairs of training examples, and to apply those deltas to the few provided examples of a novel class (unseen during training) in order to efficiently synthesize samples from that new class. The proposed method improves over the state of the art in one-shot object recognition and compares favorably in the few-shot case. Code will be made available upon acceptance.
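An illustrative sketch, not the released implementation, of the Delta-encoder idea: an encoder compresses the "delta" between a same-class pair into a small code, a decoder applies that code to an anchor feature, and at synthesis time deltas harvested from seen classes are transplanted onto a single example of a novel class. All layer sizes are placeholders.

```python
import torch
import torch.nn as nn

class DeltaEncoder(nn.Module):
    """Encode an intra-class deformation from a pair, apply it to a new anchor."""

    def __init__(self, feat_dim=2048, z_dim=16, hidden=512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, z_dim))
        self.decoder = nn.Sequential(
            nn.Linear(z_dim + feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, feat_dim))

    def forward(self, x, anchor):
        # Training: reconstruct x from (delta between x and anchor, anchor).
        z = self.encoder(torch.cat([x, anchor], dim=-1))
        return self.decoder(torch.cat([z, anchor], dim=-1))

    def synthesize(self, seen_x, seen_anchor, novel_anchor):
        # Inference: transplant a seen-class delta onto a novel-class example.
        z = self.encoder(torch.cat([seen_x, seen_anchor], dim=-1))
        return self.decoder(torch.cat([z, novel_anchor], dim=-1))

model = DeltaEncoder()
new_feat = model.synthesize(torch.randn(1, 2048), torch.randn(1, 2048), torch.randn(1, 2048))
print(new_feat.shape)  # torch.Size([1, 2048]) -- a synthetic novel-class feature
```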

Spatially-Adaptive Reconstruction in Computed Tomography using Neural Networks

Nov 28, 2013
Joseph Shtok, Michael Zibulevsky, Michael Elad

We propose a supervised machine learning approach for boosting existing signal and image recovery methods and demonstrate its efficacy on the example of image reconstruction in computed tomography. Our technique is based on a local nonlinear fusion of several image estimates, all obtained by applying a chosen reconstruction algorithm with different values of its control parameters. Such output images typically exhibit different bias/variance trade-offs. The fusion of the images is performed by a feed-forward neural network trained on a set of known examples. Numerical experiments show an improvement in reconstruction quality relative to existing direct and iterative reconstruction methods.
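The fusion step lends itself to a short sketch: several reconstructions of the same slice, obtained with different control parameters, are stacked, and a small feed-forward network maps the local patch values from all estimates to one fused pixel value. The architecture and sizes below are my own placeholders, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionNet(nn.Module):
    """Per-pixel nonlinear fusion of K image estimates via local patches."""

    def __init__(self, num_estimates=3, patch=3, hidden=64):
        super().__init__()
        self.patch = patch
        in_dim = num_estimates * patch * patch
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, estimates):
        # estimates: (B, K, H, W) -- K reconstructions of the same slice.
        b, k, h, w = estimates.shape
        pad = self.patch // 2
        patches = F.unfold(estimates, self.patch, padding=pad)   # (B, K*p*p, H*W)
        fused = self.mlp(patches.transpose(1, 2))                # (B, H*W, 1)
        return fused.transpose(1, 2).reshape(b, 1, h, w)

net = FusionNet(num_estimates=3)
out = net(torch.rand(2, 3, 64, 64))
print(out.shape)  # torch.Size([2, 1, 64, 64])
```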

Spatially-Adaptive Reconstruction in Computed Tomography Based on Statistical Learning

Apr 25, 2010
Joseph Shtok, Michael Zibulevsky, Michael Elad

We propose a direct reconstruction algorithm for computed tomography, based on a local fusion of a few preliminary image estimates by means of a non-linear fusion rule. One such rule is based on a signal denoising technique which is spatially adaptive to the unknown local smoothness. Another, more powerful, fusion rule is based on a neural network trained off-line on a high-quality training set of images. Two types of linear reconstruction algorithms for the preliminary images are employed for two different reconstruction tasks. For reconstruction of an entire image from full projection data, the proposed scheme uses a sequence of Filtered Back-Projection algorithms with a gradually growing cut-off frequency. To recover a region of interest from local projections only, statistically trained linear reconstruction algorithms are employed. Numerical experiments demonstrate an improvement in reconstruction quality compared to linear reconstruction algorithms.

* Submitted to IEEE Transactions on Image Processing 
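As a loose, entirely hypothetical analogue of a fusion rule that adapts to unknown local smoothness, the sketch below picks, per pixel, the most aggressively smoothed of several estimates that still stays within a tolerance of the sharpest one, so flat regions get smoothed while edges are kept. It stands in for the denoising-based rule only in spirit; the sigmas and tolerance are placeholders.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def adaptive_fusion(sharp, sigmas=(0.5, 1.0, 2.0), tol=0.05):
    """Per-pixel choice among progressively smoothed estimates.

    `sharp` is the highest-resolution reconstruction; each smoother estimate
    is accepted at a pixel only while it deviates from `sharp` by less than
    `tol`, mimicking a rule that smooths more where the image is locally flat.
    """
    fused = sharp.copy()
    allowed = np.ones_like(sharp, dtype=bool)
    for s in sigmas:
        smooth = gaussian_filter(sharp, sigma=s)
        allowed &= np.abs(smooth - sharp) < tol
        fused = np.where(allowed, smooth, fused)
    return fused

# Toy usage: a noisy piecewise-constant image.
img = np.zeros((64, 64)); img[:, 32:] = 1.0
noisy = img + 0.05 * np.random.randn(64, 64)
fused = adaptive_fusion(noisy)
print(fused.shape)  # (64, 64); flat regions are smoothed, the edge is preserved
```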