Alexander C. Berg

Segment Anything

Apr 05, 2023
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick

We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy-respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at https://segment-anything.com to foster research into foundation models for computer vision.
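
For readers who want to try the released model, a minimal point-prompt call with the segment-anything Python package from the project repository looks roughly like the sketch below; the model type, checkpoint filename, example image, and prompt coordinates are placeholders.

    # Minimal point-prompt sketch for the released SAM package
    # (pip install segment-anything); paths and coordinates are placeholders.
    import cv2
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
    predictor = SamPredictor(sam)

    image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
    predictor.set_image(image)  # computes the image embedding once

    # A single foreground point prompt (x, y); label 1 = foreground.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
        multimask_output=True,  # several candidate masks for an ambiguous prompt
    )
    best_mask = masks[np.argmax(scores)]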

* Project web-page: https://segment-anything.com 

Point-Level Region Contrast for Object Detection Pre-Training

Feb 09, 2022
Yutong Bai, Xinlei Chen, Alexander Kirillov, Alan Yuille, Alexander C. Berg

In this work we present point-level region contrast, a self-supervised pre-training approach for the task of object detection. This approach is motivated by the two key factors in detection: localization and recognition. While accurate localization favors models that operate at the pixel- or point-level, correct recognition typically relies on a more holistic, region-level view of objects. Incorporating this perspective in pre-training, our approach performs contrastive learning by directly sampling individual point pairs from different regions. Compared to an aggregated representation per region, our approach is more robust to changes in input region quality, and further enables us to implicitly improve initial region assignments via online knowledge distillation during training. Both advantages are important when dealing with imperfect regions encountered in the unsupervised setting. Experiments show point-level region contrast improves on state-of-the-art pre-training methods for object detection and segmentation across multiple tasks and datasets, and we provide extensive ablation studies and visualizations to aid understanding. Code will be made available.
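
The core idea (contrasting individual point pairs across regions rather than one pooled feature per region) can be illustrated with a small PyTorch-style sketch. This is not the authors' code; the tensor shapes, temperature, and source of the region assignments are assumptions.

    # Illustrative point-level contrastive loss: point features from two
    # augmented views are pulled together when their points fall in the same
    # (unsupervised) region and pushed apart otherwise.
    import torch
    import torch.nn.functional as F

    def point_region_contrast(feats_a, feats_b, regions_a, regions_b, tau=0.2):
        """feats_*: (N, D) point features from two views.
        regions_*: (N,) integer region id of each sampled point."""
        feats_a = F.normalize(feats_a, dim=1)
        feats_b = F.normalize(feats_b, dim=1)
        logits = feats_a @ feats_b.t() / tau              # (N, N) point-pair similarities
        positives = (regions_a[:, None] == regions_b[None, :]).float()
        log_prob = F.log_softmax(logits, dim=1)
        # Average the log-likelihood over all positive point pairs of each anchor.
        loss = -(log_prob * positives).sum(1) / positives.sum(1).clamp(min=1)
        return loss.mean()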

Neural Pseudo-Label Optimism for the Bank Loan Problem

Dec 03, 2021
Aldo Pacchiano, Shaun Singh, Edward Chou, Alexander C. Berg, Jakob Foerster

We study a class of classification problems best exemplified by the "bank loan" problem, where a lender decides whether or not to issue a loan. The lender only observes whether a customer will repay a loan if the loan is issued to begin with, and thus modeled decisions affect what data is available to the lender for future decisions. As a result, it is possible for the lender's algorithm to "get stuck" with a self-fulfilling model. This model never corrects its false negatives, since it never sees the true label for rejected data, thus accumulating infinite regret. In the case of linear models, this issue can be addressed by adding optimism directly into the model predictions. However, there are few methods that extend to the function approximation case using Deep Neural Networks. We present Pseudo-Label Optimism (PLOT), a conceptually and computationally simple method for this setting applicable to DNNs. PLOT adds an optimistic label to the subset of decision points the current model is deciding on, trains the model on all data so far (including these points along with their optimistic labels), and finally uses the resulting optimistic model for decision making. PLOT achieves competitive performance on a set of three challenging benchmark problems, requiring minimal hyperparameter tuning. We also show that PLOT satisfies a logarithmic regret guarantee, under a Lipschitz and logistic mean label model, and under a separability condition on the data.
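
A schematic rendering of one PLOT round, following the three steps in the abstract; the model, training routine, and data containers are placeholders rather than the authors' implementation.

    # Schematic sketch of one PLOT round (not the authors' code).
    def plot_round(model, seen_x, seen_y, decision_points, train_fn, threshold=0.5):
        # 1) Attach an optimistic (positive) pseudo-label to the points the
        #    lender currently has to decide on.
        optimistic_x = seen_x + decision_points
        optimistic_y = seen_y + [1] * len(decision_points)
        # 2) Train on all data observed so far plus the optimistically labeled points.
        optimistic_model = train_fn(model, optimistic_x, optimistic_y)
        # 3) Decide with the optimistic model; only accepted points ever reveal
        #    their true label and enter (seen_x, seen_y) for future rounds.
        return [optimistic_model.predict(x) >= threshold for x in decision_points]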

* 10 pages main, 14 pages appendix 

Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

Mar 30, 2021
Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov

We present Boundary IoU (Intersection-over-Union), a new segmentation evaluation measure focused on boundary quality. We perform an extensive analysis across different error types and object sizes and show that Boundary IoU is significantly more sensitive than the standard Mask IoU measure to boundary errors for large objects and does not over-penalize errors on smaller objects. The new quality measure displays several desirable characteristics like symmetry w.r.t. prediction/ground truth pairs and balanced responsiveness across scales, which makes it more suitable for segmentation evaluation than other boundary-focused measures like Trimap IoU and F-measure. Based on Boundary IoU, we update the standard evaluation protocols for instance and panoptic segmentation tasks by proposing the Boundary AP (Average Precision) and Boundary PQ (Panoptic Quality) metrics, respectively. Our experiments show that the new evaluation metrics track boundary quality improvements that are generally overlooked by current Mask IoU-based evaluation metrics. We hope that the adoption of the new boundary-sensitive evaluation metrics will lead to rapid progress in segmentation methods that improve boundary quality.
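
A rough sketch of the Boundary IoU computation as described above: restrict each mask to a thin band of pixels within distance d of its own contour, then take the IoU of the two bands. The OpenCV-based band extraction and the default band width below are simplifications of the released implementation, not a drop-in replacement for it.

    # Simplified Boundary IoU sketch; band width is a fraction of the image diagonal.
    import cv2
    import numpy as np

    def boundary_band(mask, dilation_ratio=0.02):
        h, w = mask.shape
        d = max(1, int(round(dilation_ratio * np.sqrt(h * h + w * w))))
        kernel = np.ones((3, 3), np.uint8)
        eroded = cv2.erode(mask.astype(np.uint8), kernel, iterations=d)
        return mask.astype(bool) & ~eroded.astype(bool)   # pixels near the contour

    def boundary_iou(gt, pred, dilation_ratio=0.02):
        gt_b = boundary_band(gt, dilation_ratio)
        pred_b = boundary_band(pred, dilation_ratio)
        inter = (gt_b & pred_b).sum()
        union = (gt_b | pred_b).sum()
        return inter / union if union > 0 else 1.0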

* CVPR 2021, project page: https://bowenc0221.github.io/boundary-iou 

Similarity Search for Efficient Active Learning and Search of Rare Concepts

Jun 30, 2020
Cody Coleman, Edward Chou, Sean Culatana, Peter Bailis, Alexander C. Berg, Roshan Sumbaly, Matei Zaharia, I. Zeki Yalniz

Many active learning and search approaches are intractable for industrial settings with billions of unlabeled examples. Existing approaches, such as uncertainty sampling or information density, search globally for the optimal examples to label, scaling linearly or even quadratically with the unlabeled data. However, in practice, data is often heavily skewed; only a small fraction of collected data will be relevant for a given learning task. For example, when identifying rare classes, detecting malicious content, or debugging model performance, the ratio of positive to negative examples can be 1 to 1,000 or more. In this work, we exploit this skew in large training datasets to reduce the number of unlabeled examples considered in each selection round by only looking at the nearest neighbors to the labeled examples. Empirically, we observe that learned representations effectively cluster unseen concepts, making active learning very effective and substantially reducing the number of viable unlabeled examples. We evaluate several active learning and search techniques in this setting on three large-scale datasets: ImageNet, Goodreads spoiler detection, and OpenImages. For rare classes, active learning methods need as little as 0.31% of the labeled data to match the average precision of full supervision. By limiting active learning methods to only consider the immediate neighbors of the labeled data as candidates for labeling, we need only process as little as 1% of the unlabeled data while achieving similar reductions in labeling costs as the traditional global approach. This process of expanding the candidate pool with the nearest neighbors of the labeled set can be done efficiently and reduces the computational complexity of selection by orders of magnitude.
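
The candidate-pool restriction described above can be sketched with any nearest-neighbor index; the scikit-learn index and embedding inputs below are illustrative assumptions, not the paper's implementation.

    # Restrict the active-learning candidate pool to neighbors of the labeled set.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def expand_candidate_pool(unlabeled_emb, labeled_emb, k=100):
        """Return indices of unlabeled examples in the k-NN of any labeled example."""
        index = NearestNeighbors(n_neighbors=k).fit(unlabeled_emb)
        _, neighbors = index.kneighbors(labeled_emb)   # (n_labeled, k)
        return np.unique(neighbors)                    # candidate pool << full pool

    # Each selection round then runs the usual acquisition function
    # (e.g. uncertainty sampling) only over this small candidate pool.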

A Mask-RCNN Baseline for Probabilistic Object Detection

Aug 09, 2019
Phil Ammirato, Alexander C. Berg

The Probabilistic Object Detection Challenge evaluates object detection methods using a new evaluation measure, Probability-based Detection Quality (PDQ), on a new synthetic image dataset. We present our submission to the challenge, a fine-tuned version of Mask-RCNN with some additional post-processing. Our method, submitted under username pammirato, is currently second on the leaderboard with a score of 21.432, while also achieving the highest spatial quality and average overall quality of detections. We hope this method can provide some insight into how detectors designed for mean average precision (mAP) evaluation behave under PDQ, as well as a strong baseline for future work.

* 2nd place in 1st PODC at CVPR 2019 

IMP: Instance Mask Projection for High Accuracy Semantic Segmentation of Things

Jun 15, 2019
Cheng-Yang Fu, Tamara L. Berg, Alexander C. Berg

In this work, we present a new operator, called Instance Mask Projection (IMP), which projects a predicted instance segmentation as a new feature for semantic segmentation. It also supports back-propagation, so it is trainable end-to-end. Our experiments show the effectiveness of IMP on both Clothing Parsing (with complex layering, large deformations, and non-convex objects) and Street Scene Segmentation (with many overlapping instances and small objects). On the Varied Clothing Parsing dataset (VCP), we show that instance mask projection improves mIoU by 3 points over a state-of-the-art Panoptic FPN segmentation approach. On the ModaNet clothing parsing dataset, we show a dramatic absolute improvement of 20.4% over existing baseline semantic segmentation results. In addition, the instance mask projection operator works well on other (non-clothing) datasets, providing an improvement of 3 mIoU points on the Thing classes of Cityscapes, a self-driving dataset, on top of a state-of-the-art approach.
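
A conceptual sketch of the projection step: each predicted instance mask is pasted into the channel of its predicted class, producing a per-class map that can be concatenated with the semantic-segmentation features. The shapes, the max-pasting rule, and the function name are assumptions for illustration, not the authors' code.

    # Hypothetical instance-mask-projection sketch.
    import torch

    def instance_mask_projection(masks, labels, scores, num_classes, out_size):
        """masks: (N, H, W) soft instance masks; labels/scores: (N,) per instance."""
        proj = torch.zeros(num_classes, *out_size)
        for mask, label, score in zip(masks, labels, scores):
            resized = torch.nn.functional.interpolate(
                mask[None, None], size=out_size, mode="bilinear", align_corners=False
            )[0, 0]
            # Paste each instance into its class channel, keeping the max response.
            proj[label] = torch.maximum(proj[label], score * resized)
        return proj  # concatenated with semantic features before the final prediction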

Low-Power Computer Vision: Status, Challenges, Opportunities

Apr 15, 2019
Sergei Alyamkin, Matthew Ardi, Alexander C. Berg, Achille Brighton, Bo Chen, Yiran Chen, Hsin-Pai Cheng, Zichen Fan, Chen Feng, Bo Fu, Kent Gauen, Abhinav Goel, Alexander Goncharenko, Xuyang Guo, Soonhoi Ha, Andrew Howard, Xiao Hu, Yuanjun Huang, Donghyun Kang, Jaeyoun Kim, Jong Gook Ko, Alexander Kondratyev, Junhyeok Lee, Seungjae Lee, Suwoong Lee, Zichao Li, Zhiyu Liang, Juzheng Liu, Xin Liu, Yang Lu, Yung-Hsiang Lu, Deeptanshu Malik, Hong Hanh Nguyen, Eunbyung Park, Denis Repin, Liang Shen, Tao Sheng, Fei Sun, David Svitov, George K. Thiruvathukal, Baiwu Zhang, Jingchi Zhang, Xiaopeng Zhang, Shaojie Zhuo

Computer vision has achieved impressive progress in recent years. Meanwhile, mobile phones have become the primary computing platforms for millions of people. In addition to mobile phones, many autonomous systems rely on visual data for making decisions, and some of these systems, such as unmanned aerial vehicles (drones) and mobile robots, have limited energy. These systems rely on batteries, and energy efficiency is critical. This article serves two main purposes: (1) Examine the state of the art for low-power solutions to detect objects in images. Since 2015, the IEEE Annual International Low-Power Image Recognition Challenge (LPIRC) has been held to identify the most energy-efficient computer vision solutions. This article summarizes the 2018 winners' solutions. (2) Suggest directions for research as well as opportunities for low-power computer vision.

* Preprint, Accepted by IEEE Journal on Emerging and Selected Topics in Circuits and Systems. arXiv admin note: substantial text overlap with arXiv:1810.01732 

Low Power Inference for On-Device Visual Recognition with a Quantization-Friendly Solution

Mar 12, 2019
Chen Feng, Tao Sheng, Zhiyu Liang, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, Matthew Ardi, Alexander C. Berg, Yiran Chen, Bo Chen, Kent Gauen, Yung-Hsiang Lu

The IEEE Low-Power Image Recognition Challenge (LPIRC) is an annual competition started in 2015 that encourages joint hardware and software solutions for computer vision systems with low latency and power. Track 1 of the 2018 competition focused on innovation in software solutions with a fixed inference engine and hardware. This decision allowed participants to submit models online rather than build and bring custom hardware on-site, which attracted a historically large number of submissions. Among the diverse solutions, the winning solution proposed a quantization-friendly framework for MobileNets that achieves an accuracy of 72.67% on the holdout dataset with an average latency of 27 ms on a single CPU core of a Google Pixel 2 phone, which is superior to the best real-time MobileNet models at the time.

* Accepted At The 2nd Workshop on Machine Learning on the Phone and other Consumer Devices (MLPCD 2) 

RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free

Jan 10, 2019
Cheng-Yang Fu, Mykhailo Shvets, Alexander C. Berg

Recently, two-stage detectors have surged ahead of single-shot detectors in the accuracy-vs-speed trade-off. Nevertheless, single-shot detectors remain immensely popular in embedded vision applications. This paper brings single-shot detectors up to the same level as current two-stage techniques. We do this by improving training for the state-of-the-art single-shot detector, RetinaNet, in three ways: integrating instance mask prediction for the first time, making the loss function adaptive and more stable, and including additional hard examples in training. We call the resulting augmented network RetinaMask. The detection component of RetinaMask has the same computational cost as the original RetinaNet but is more accurate. COCO test-dev results are up to 41.4 mAP for RetinaMask-101 vs 39.1 mAP for RetinaNet-101, while the runtime is the same during evaluation. Adding Group Normalization increases the performance of RetinaMask-101 to 41.7 mAP. Code is at: https://github.com/chengyangfu/retinamask
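
As an illustration of the "adaptive and more stable" regression loss mentioned above, the sketch below lets the Smooth L1 transition point track a running statistic of the regression error; the exact rule used in RetinaMask may differ, so treat this purely as a hedged example.

    # Hypothetical adaptive Smooth L1 sketch; the running-statistic rule is an assumption.
    import torch

    class AdaptiveSmoothL1:
        def __init__(self, beta=0.11, momentum=0.9):
            self.beta, self.momentum = beta, momentum
            self.running_err = beta  # running scale of the regression error

        def __call__(self, pred, target):
            err = (pred - target).abs()
            # Let the L2-to-L1 transition point follow how large errors currently are.
            self.running_err = (self.momentum * self.running_err
                                + (1 - self.momentum) * err.mean().item())
            beta = min(self.beta, self.running_err)
            quad = 0.5 * err.pow(2) / beta
            lin = err - 0.5 * beta
            return torch.where(err < beta, quad, lin).mean()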
