Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Liang Zheng

Alice Benchmarks: Connecting Real World Object Re-Identification with the Synthetic

Oct 06, 2023

Xiaoxiao Sun, Yue Yao, Shengjin Wang, Hongdong Li, Liang Zheng

Abstract:For object re-identification (re-ID), learning from synthetic data has become a promising strategy to cheaply acquire large-scale annotated datasets and effective models, with few privacy concerns. Many interesting research problems arise from this strategy, e.g., how to reduce the domain gap between synthetic source and real-world target. To facilitate developing more new approaches in learning from synthetic data, we introduce the Alice benchmarks, large-scale datasets providing benchmarks as well as evaluation protocols to the research community. Within the Alice benchmarks, two object re-ID tasks are offered: person and vehicle re-ID. We collected and annotated two challenging real-world target datasets: AlicePerson and AliceVehicle, captured under various illuminations, image resolutions, etc. As an important feature of our real target, the clusterability of its training set is not manually guaranteed to make it closer to a real domain adaptation test scenario. Correspondingly, we reuse existing PersonX and VehicleX as synthetic source domains. The primary goal is to train models from synthetic data that can work effectively in the real world. In this paper, we detail the settings of Alice benchmarks, provide an analysis of existing commonly-used domain adaptation methods, and discuss some interesting future directions. An online server will be set up for the community to evaluate methods conveniently and fairly.

* 9 pages, 4 figures, 4 tables

Via

Access Paper or Ask Questions

Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception?

Sep 22, 2023

Xiaoxiao Sun, Nidham Gazagnadou, Vivek Sharma, Lingjuan Lyu, Hongdong Li, Liang Zheng

Figure 1 for Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception?

Figure 2 for Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception?

Figure 3 for Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception?

Figure 4 for Privacy Assessment on Reconstructed Images: Are Existing Evaluation Metrics Faithful to Human Perception?

Abstract:Hand-crafted image quality metrics, such as PSNR and SSIM, are commonly used to evaluate model privacy risk under reconstruction attacks. Under these metrics, reconstructed images that are determined to resemble the original one generally indicate more privacy leakage. Images determined as overall dissimilar, on the other hand, indicate higher robustness against attack. However, there is no guarantee that these metrics well reflect human opinions, which, as a judgement for model privacy leakage, are more trustworthy. In this paper, we comprehensively study the faithfulness of these hand-crafted metrics to human perception of privacy information from the reconstructed images. On 5 datasets ranging from natural images, faces, to fine-grained classes, we use 4 existing attack methods to reconstruct images from many different classification models and, for each reconstructed image, we ask multiple human annotators to assess whether this image is recognizable. Our studies reveal that the hand-crafted metrics only have a weak correlation with the human evaluation of privacy leakage and that even these metrics themselves often contradict each other. These observations suggest risks of current metrics in the community. To address this potential risk, we propose a learning-based measure called SemSim to evaluate the Semantic Similarity between the original and reconstructed images. SemSim is trained with a standard triplet loss, using an original image as an anchor, one of its recognizable reconstructed images as a positive sample, and an unrecognizable one as a negative. By training on human annotations, SemSim exhibits a greater reflection of privacy leakage on the semantic level. We show that SemSim has a significantly higher correlation with human judgment compared with existing metrics. Moreover, this strong correlation generalizes to unseen datasets, models and attack methods.

* 15 pages, 9 figures and 3 tables

Via

Access Paper or Ask Questions

Training with Product Digital Twins for AutoRetail Checkout

Aug 18, 2023

Yue Yao, Xinyu Tian, Zheng Tang, Sujit Biswas, Huan Lei, Tom Gedeon, Liang Zheng

Figure 1 for Training with Product Digital Twins for AutoRetail Checkout

Figure 2 for Training with Product Digital Twins for AutoRetail Checkout

Figure 3 for Training with Product Digital Twins for AutoRetail Checkout

Figure 4 for Training with Product Digital Twins for AutoRetail Checkout

Abstract:Automating the checkout process is important in smart retail, where users effortlessly pass products by hand through a camera, triggering automatic product detection, tracking, and counting. In this emerging area, due to the lack of annotated training data, we introduce a dataset comprised of product 3D models, which allows for fast, flexible, and large-scale training data generation through graphic engine rendering. Within this context, we discern an intriguing facet, because of the user "hands-on" approach, bias in user behavior leads to distinct patterns in the real checkout process. The existence of such patterns would compromise training effectiveness if training data fail to reflect the same. To address this user bias problem, we propose a training data optimization framework, i.e., training with digital twins (DtTrain). Specifically, we leverage the product 3D models and optimize their rendering viewpoint and illumination to generate "digital twins" that visually resemble representative user images. These digital twins, inherit product labels and, when augmented, form the Digital Twin training set (DT set). Because the digital twins individually mimic user bias, the resulting DT training set better reflects the characteristics of the target scenario and allows us to train more effective product detection and tracking models. In our experiment, we show that DT set outperforms training sets created by existing dataset synthesis methods in terms of counting accuracy. Moreover, by combining DT set with pseudo-labeled real checkout data, further improvement is observed. The code is available at https://github.com/yorkeyao/Automated-Retail-Checkout.

Via

Access Paper or Ask Questions

Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback

Jul 10, 2023

Jaskirat Singh, Liang Zheng

Abstract:The field of text-conditioned image generation has made unparalleled progress with the recent advent of latent diffusion models. While remarkable, as the complexity of given text input increases, the state-of-the-art diffusion models may still fail in generating images which accurately convey the semantics of the given prompt. Furthermore, it has been observed that such misalignments are often left undetected by pretrained multi-modal models such as CLIP. To address these problems, in this paper we explore a simple yet effective decompositional approach towards both evaluation and improvement of text-to-image alignment. In particular, we first introduce a Decompositional-Alignment-Score which given a complex prompt decomposes it into a set of disjoint assertions. The alignment of each assertion with generated images is then measured using a VQA model. Finally, alignment scores for different assertions are combined aposteriori to give the final text-to-image alignment score. Experimental analysis reveals that the proposed alignment metric shows significantly higher correlation with human ratings as opposed to traditional CLIP, BLIP scores. Furthermore, we also find that the assertion level alignment scores provide a useful feedback which can then be used in a simple iterative procedure to gradually increase the expression of different assertions in the final image outputs. Human user studies indicate that the proposed approach surpasses previous state-of-the-art by 8.7% in overall text-to-image alignment accuracy. Project page for our paper is available at https://1jsingh.github.io/divide-evaluate-and-refine

Via

Access Paper or Ask Questions

The 7th AI City Challenge

Apr 15, 2023

Milind Naphade, Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma(+10 more)

Abstract:The AI City Challenge's seventh edition emphasizes two domains at the intersection of computer vision and artificial intelligence - retail business and Intelligent Traffic Systems (ITS) - that have considerable untapped potential. The 2023 challenge had five tracks, which drew a record-breaking number of participation requests from 508 teams across 46 countries. Track 1 was a brand new track that focused on multi-target multi-camera (MTMC) people tracking, where teams trained and evaluated using both real and highly realistic synthetic data. Track 2 centered around natural-language-based vehicle track retrieval. Track 3 required teams to classify driver actions in naturalistic driving analysis. Track 4 aimed to develop an automated checkout system for retail stores using a single view camera. Track 5, another new addition, tasked teams with detecting violations of the helmet rule for motorcyclists. Two leader boards were released for submissions based on different methods: a public leader board for the contest where external private data wasn't allowed and a general leader board for all results submitted. The participating teams' top performances established strong baselines and even outperformed the state-of-the-art in the proposed challenge tracks.

* Summary of the 7th AI City Challenge Workshop in conjunction with CVPR 2023

Via

Access Paper or Ask Questions

Large-scale Training Data Search for Object Re-identification

Mar 28, 2023

Yue Yao, Huan Lei, Tom Gedeon, Liang Zheng

Figure 1 for Large-scale Training Data Search for Object Re-identification

Figure 2 for Large-scale Training Data Search for Object Re-identification

Figure 3 for Large-scale Training Data Search for Object Re-identification

Figure 4 for Large-scale Training Data Search for Object Re-identification

Abstract:We consider a scenario where we have access to the target domain, but cannot afford on-the-fly training data annotation, and instead would like to construct an alternative training set from a large-scale data pool such that a competitive model can be obtained. We propose a search and pruning (SnP) solution to this training data search problem, tailored to object re-identification (re-ID), an application aiming to match the same object captured by different cameras. Specifically, the search stage identifies and merges clusters of source identities which exhibit similar distributions with the target domain. The second stage, subject to a budget, then selects identities and their images from the Stage I output, to control the size of the resulting training set for efficient training. The two steps provide us with training sets 80\% smaller than the source pool while achieving a similar or even higher re-ID accuracy. These training sets are also shown to be superior to a few existing search methods such as random sampling and greedy sampling under the same budget on training data size. If we release the budget, training sets resulting from the first stage alone allow even higher re-ID accuracy. We provide interesting discussions on the specificity of our method to the re-ID problem and particularly its role in bridging the re-ID domain gap. The code is available at https://github.com/yorkeyao/SnP.

* Accepted to CVPR2023

Via

Access Paper or Ask Questions

A Bag-of-Prototypes Representation for Dataset-Level Applications

Mar 23, 2023

Weijie Tu, Weijian Deng, Tom Gedeon, Liang Zheng

Figure 1 for A Bag-of-Prototypes Representation for Dataset-Level Applications

Figure 2 for A Bag-of-Prototypes Representation for Dataset-Level Applications

Figure 3 for A Bag-of-Prototypes Representation for Dataset-Level Applications

Figure 4 for A Bag-of-Prototypes Representation for Dataset-Level Applications

Abstract:This work investigates dataset vectorization for two dataset-level tasks: assessing training set suitability and test set difficulty. The former measures how suitable a training set is for a target domain, while the latter studies how challenging a test set is for a learned model. Central to the two tasks is measuring the underlying relationship between datasets. This needs a desirable dataset vectorization scheme, which should preserve as much discriminative dataset information as possible so that the distance between the resulting dataset vectors can reflect dataset-to-dataset similarity. To this end, we propose a bag-of-prototypes (BoP) dataset representation that extends the image-level bag consisting of patch descriptors to dataset-level bag consisting of semantic prototypes. Specifically, we develop a codebook consisting of K prototypes clustered from a reference dataset. Given a dataset to be encoded, we quantize each of its image features to a certain prototype in the codebook and obtain a K-dimensional histogram. Without assuming access to dataset labels, the BoP representation provides a rich characterization of the dataset semantic distribution. Furthermore, BoP representations cooperate well with Jensen-Shannon divergence for measuring dataset-to-dataset similarity. Although very simple, BoP consistently shows its advantage over existing representations on a series of benchmarks for two dataset-level tasks.

* CVPR 2023 camera-ready

Via

Access Paper or Ask Questions

Learning to Select Camera Views: Efficient Multiview Understanding at Few Glances

Mar 10, 2023

Yunzhong Hou, Stephen Gould, Liang Zheng

Abstract:Multiview camera setups have proven useful in many computer vision applications for reducing ambiguities, mitigating occlusions, and increasing field-of-view coverage. However, the high computational cost associated with multiple views poses a significant challenge for end devices with limited computational resources. To address this issue, we propose a view selection approach that analyzes the target object or scenario from given views and selects the next best view for processing. Our approach features a reinforcement learning based camera selection module, MVSelect, that not only selects views but also facilitates joint training with the task network. Experimental results on multiview classification and detection tasks show that our approach achieves promising performance while using only 2 or 3 out of N available views, significantly reducing computational costs. Furthermore, analysis on the selected views reveals that certain cameras can be shut off with minimal performance impact, shedding light on future camera layout optimization for multiview systems. Code is available at https://github.com/hou-yz/MVSelect.

Via

Access Paper or Ask Questions

Adaptive Calibrator Ensemble for Model Calibration under Distribution Shift

Mar 09, 2023

Yuli Zou, Weijian Deng, Liang Zheng

Figure 1 for Adaptive Calibrator Ensemble for Model Calibration under Distribution Shift

Figure 2 for Adaptive Calibrator Ensemble for Model Calibration under Distribution Shift

Figure 3 for Adaptive Calibrator Ensemble for Model Calibration under Distribution Shift

Figure 4 for Adaptive Calibrator Ensemble for Model Calibration under Distribution Shift

Abstract:Model calibration usually requires optimizing some parameters (e.g., temperature) w.r.t an objective function (e.g., negative log-likelihood). In this paper, we report a plain, important but often neglected fact that the objective function is influenced by calibration set difficulty, i.e., the ratio of the number of incorrectly classified samples to that of correctly classified samples. If a test set has a drastically different difficulty level from the calibration set, the optimal calibration parameters of the two datasets would be different. In other words, a calibrator optimal on the calibration set would be suboptimal on the OOD test set and thus has degraded performance. With this knowledge, we propose a simple and effective method named adaptive calibrator ensemble (ACE) to calibrate OOD datasets whose difficulty is usually higher than the calibration set. Specifically, two calibration functions are trained, one for in-distribution data (low difficulty), and the other for severely OOD data (high difficulty). To achieve desirable calibration on a new OOD dataset, ACE uses an adaptive weighting method that strikes a balance between the two extreme functions. When plugged in, ACE generally improves the performance of a few state-of-the-art calibration schemes on a series of OOD benchmarks. Importantly, such improvement does not come at the cost of the in-distribution calibration accuracy.

* 16 pages, 9 figures

Via

Access Paper or Ask Questions

Unsupervised Evaluation of Out-of-distribution Detection: A Data-centric Perspective

Feb 16, 2023

Yuhang Zhang, Weihong Deng, Liang Zheng

Abstract:Out-of-distribution (OOD) detection methods assume that they have test ground truths, i.e., whether individual test samples are in-distribution (IND) or OOD. However, in the real world, we do not always have such ground truths, and thus do not know which sample is correctly detected and cannot compute the metric like AUROC to evaluate the performance of different OOD detection methods. In this paper, we are the first to introduce the unsupervised evaluation problem in OOD detection, which aims to evaluate OOD detection methods in real-world changing environments without OOD labels. We propose three methods to compute Gscore as an unsupervised indicator of OOD detection performance. We further introduce a new benchmark Gbench, which has 200 real-world OOD datasets of various label spaces to train and evaluate our method. Through experiments, we find a strong quantitative correlation betwwen Gscore and the OOD detection performance. Extensive experiments demonstrate that our Gscore achieves state-of-the-art performance. Gscore also generalizes well with different IND/OOD datasets, OOD detection methods, backbones and dataset sizes. We further provide interesting analyses of the effects of backbones and IND/OOD datasets on OOD detection performance. The data and code will be available.

Via

Access Paper or Ask Questions