Thomas Leung

Detecting and Controlling Sycophancy with Cascading Linear Features

Jun 23, 2026

Maty Bohacek, Rishub Jain, Nicholas Dufour, Thomas Leung, Chris Bregler, Roma Patel

Abstract: Interpreting and controlling model behaviors through activation steering methods requires many pairs of contrastive samples that clearly exhibit desired or undesired behavior. These data pairs determine the degree to which interpretability frameworks can reliably detect model features responsible for a behavior, and therefore the ability to steer models toward or away from such behavior. In this work, we present an iterative data generation pipeline that isolates cascading linear features responsible for a behavior. Specifically, we show how moving beyond simple binary pairs of samples, and instead isolating samples that show degrees of features that scale linearly with behavior, allows for better disentanglement of features. We focus on detecting and steering away from sycophancy -- the tendency of language models to prioritize user validation. We demonstrate that sycophancy features discovered through cascading samples form linearly separable subspaces, and allow for selection of model activations that more clearly correspond to the desired behavior than baseline approaches. We also evaluate their ability to enable detection, deterministic scoring, and robust steering, and see that they either match or outperform LLM-as-a-judge and system prompting baselines while providing lower computational demand and more interpretability guarantees. Code & Data: https://cascading-feats.github.io/
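
To make the idea of steering along a linear feature concrete, the following is a minimal sketch of generic difference-of-means activation steering on toy vectors. It is not the cascading-feature pipeline from the paper: the function names (`steering_direction`, `sycophancy_score`, `steer_away`), the two-dimensional activations, and the `strength` parameter are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_direction(positive: Sequence[Sequence[float]],
                       negative: Sequence[Sequence[float]]) -> Vector:
    """Unit vector from the mean 'negative' activation to the mean 'positive' one."""
    diff = [p - q for p, q in zip(mean_vector(positive), mean_vector(negative))]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0.0:
        raise ValueError("the two sample sets have identical means")
    return [d / norm for d in diff]


def sycophancy_score(activation: Sequence[float], direction: Sequence[float]) -> float:
    """Deterministic score: projection of an activation onto the direction."""
    return sum(a * d for a, d in zip(activation, direction))


def steer_away(activation: Sequence[float], direction: Sequence[float],
               strength: float = 1.0) -> Vector:
    """Subtract (a fraction of) the activation's component along the direction."""
    s = sycophancy_score(activation, direction)
    return [a - strength * s * d for a, d in zip(activation, direction)]


# Toy activations (hypothetical): the behavior lives entirely in the first coordinate.
sycophantic = [[2.0, 0.0], [4.0, 0.0]]
neutral = [[0.0, 0.0], [0.0, 0.0]]

direction = steering_direction(sycophantic, neutral)
steered = steer_away([3.0, 1.0], direction)
```

With `strength=1.0` the component along the direction is removed entirely; smaller values would attenuate it instead.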

FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion

Jun 12, 2024

George Cazenavette, Avneesh Sud, Thomas Leung, Ben Usman

Abstract: Due to the high potential for abuse of GenAI systems, the task of detecting synthetic images has recently become of great interest to the research community. Unfortunately, existing image-space detectors quickly become obsolete as new high-fidelity text-to-image models are developed at blinding speed. In this work, we propose a new synthetic image detector that uses features obtained by inverting an open-source pre-trained Stable Diffusion model. We show that these inversion features enable our detector to generalize well to unseen generators of high visual fidelity (e.g., DALL-E 3) even when the detector is trained only on lower fidelity fake images generated via Stable Diffusion. This detector achieves a new state of the art across multiple training and evaluation setups. Moreover, we introduce a new challenging evaluation protocol that uses reverse image search to mitigate stylistic and thematic biases in the detector evaluation. We show that the resulting evaluation scores align well with detectors' in-the-wild performance, and release these datasets as public benchmarks for future research.

* CVPR 2024
* Project page: https://fake-inversion.github.io

Directed Diffusion: Direct Control of Object Placement through Attention Guidance

Feb 25, 2023

Wan-Duo Kurt Ma, J. P. Lewis, W. Bastiaan Kleijn, Thomas Leung

Abstract: Text-guided diffusion models such as DALLE-2, IMAGEN, and Stable Diffusion are able to generate an effectively endless variety of images given only a short text prompt describing the desired image content. In many cases the images are very high quality as well. However, these models often struggle to compose scenes containing several key objects such as characters in specified positional relationships. Unfortunately, this capability to "direct" the placement of characters and objects both within and across images is crucial in storytelling, as recognized in the literature on film and animation theory. In this work we take a particularly straightforward approach to providing the needed direction, by injecting "activation" at desired positions in the cross-attention maps corresponding to the objects under control, while attenuating the remainder of the map. The resulting approach is a step toward generalizing the applicability of text-guided diffusion models beyond single images to collections of related images, as in storybooks. To the best of our knowledge, our Directed Diffusion method is the first diffusion technique that provides positional control over multiple objects, while making use of an existing pre-trained model and maintaining a coherent blend between the positioned objects and the background. Moreover, it requires only a few lines to implement.

* Our project page: https://hohonu-vicml.github.io/DirectedDiffusion.Page

NewsStories: Illustrating articles with visual summaries

Aug 14, 2022

Reuben Tan, Bryan A. Plummer, Kate Saenko, JP Lewis, Avneesh Sud, Thomas Leung

Abstract: Recent self-supervised approaches have used large-scale image-text datasets to learn powerful representations that transfer to many tasks without finetuning. These methods often assume that there is a one-to-one correspondence between images and their (short) captions. However, many tasks require reasoning about multiple images and long text narratives, such as describing news articles with visual summaries. Thus, we explore a novel setting where the goal is to learn a self-supervised visual-language representation that is robust to varying text length and the number of images. In addition, unlike prior work which assumed captions have a literal relation to the image, we assume images only contain loose illustrative correspondence with the text. To explore this problem, we introduce a large-scale multimodal dataset containing over 31M articles, 22M images and 1M videos. We show that state-of-the-art image-text alignment methods are not robust to longer narratives with multiple images. Finally, we introduce an intuitive baseline that outperforms these methods on zero-shot image-set retrieval by 10% on the GoodNews dataset.

* Accepted at ECCV 2022

Geo-Aware Networks for Fine Grained Recognition

Jun 04, 2019

Grace Chu, Brian Potetz, Weijun Wang, Andrew Howard, Yang Song, Fernando Brucher, Thomas Leung, Hartwig Adam

Abstract: Fine grained recognition distinguishes among categories with subtle visual differences. To help identify fine grained categories, other information besides images has been used. However, there has been little effort on using geolocation information to improve fine grained classification accuracy. Our contributions to this field are twofold. First, to the best of our knowledge, this is the first paper to systematically examine various ways of incorporating geolocation information into fine grained image classification, from geolocation priors, to post-processing, to feature modulation. Second, to overcome the situation where no fine grained dataset has complete geolocation information, we introduce, and will make public, two fine grained datasets with geolocation by providing complementary information to existing popular datasets: iNaturalist and YFCC100M. Results on these datasets show that the best geo-aware network can achieve an 8.9% top-1 accuracy increase on iNaturalist and a 5.9% increase on YFCC100M, compared with image only models' results. In addition, for small image baseline models like MobileNet V2, the best geo-aware network gives 12.6% higher top-1 accuracy than the image only model, achieving even higher performance than Inception V3 models without geolocation. Our work gives incentives to use geolocation information to improve fine grained recognition for both server and on-device models.

MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels

Aug 13, 2018

Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, Li Fei-Fei

Abstract: Recent deep networks are capable of memorizing the entire data even when the labels are completely random. To overcome the overfitting on corrupted labels, we propose a novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep networks, namely, StudentNet. During training, MentorNet provides a curriculum (sample weighting scheme) for StudentNet to focus on the samples whose labels are probably correct. Unlike the existing curriculum that is usually predefined by human experts, MentorNet learns a data-driven curriculum dynamically with StudentNet. Experimental results demonstrate that our approach can significantly improve the generalization performance of deep networks trained on corrupted training data. Notably, to the best of our knowledge, we achieve the best-published result on WebVision, a large benchmark containing 2.2 million images of real-world noisy labels. The code is available at https://github.com/google/mentornet

* Published at ICML 2018
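
As a rough illustration of what a sample weighting scheme does to a training loss, here is a minimal sketch in which a fixed, hand-written threshold rule stands in for the learned MentorNet. This is the kind of predefined curriculum the abstract contrasts against, not the paper's method; the function names, the threshold value, and the toy losses are assumptions.

```python
from typing import List, Sequence


def threshold_weights(losses: Sequence[float], threshold: float) -> List[float]:
    """Hand-written curriculum: keep samples whose loss is below the threshold.

    A learned MentorNet would produce these weights instead; the hard
    threshold is only a stand-in for illustration.
    """
    return [1.0 if loss < threshold else 0.0 for loss in losses]


def weighted_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of per-sample losses; 0.0 if every sample is dropped."""
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(l * w for l, w in zip(losses, weights)) / total


# Toy per-sample losses (hypothetical): the third sample looks mislabeled.
per_sample_losses = [0.2, 0.4, 3.0]
weights = threshold_weights(per_sample_losses, threshold=1.0)
student_loss = weighted_loss(per_sample_losses, weights)
```

The high-loss sample receives weight zero, so the student's objective is driven only by the two samples whose labels look plausible under this rule.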

Towards a Semantic Perceptual Image Metric

Aug 01, 2018

Troy Chinen, Johannes Ballé, Chunhui Gu, Sung Jin Hwang, Sergey Ioffe, Nick Johnston, Thomas Leung, David Minnen, Sean O'Malley, Charles Rosenberg (+1 more)

Abstract: We present a full reference, perceptual image metric based on VGG-16, an artificial neural network trained on object classification. We fit the metric to a new database based on 140k unique images annotated with ground truth by human raters who received minimal instruction. The resulting metric shows competitive performance on TID 2013, a database widely used to assess image quality assessment methods. More interestingly, it shows strong responses to objects potentially carrying semantic relevance such as faces and text, which we demonstrate using a visualization technique and ablation experiments. In effect, the metric appears to model a higher influence of semantic context on judgments, which we observe particularly in untrained raters. As the vast majority of users of image processing systems are unfamiliar with Image Quality Assessment (IQA) tasks, these findings may have significant impact on real-world applications of perceptual metrics.

Improving the Robustness of Deep Neural Networks via Stability Training

Apr 15, 2016

Stephan Zheng, Yang Song, Thomas Leung, Ian Goodfellow

Abstract: In this paper we address the issue of output instability of deep neural networks: small perturbations in the visual input can significantly distort the feature embeddings and output of a neural network. Such instability affects many deep architectures with state-of-the-art performance on a wide range of computer vision tasks. We present a general stability training method to stabilize deep networks against small input distortions that result from various types of common image processing, such as compression, rescaling, and cropping. We validate our method by stabilizing the state-of-the-art Inception architecture against these types of distortions. In addition, we demonstrate that our stabilized model gives robust state-of-the-art performance on large-scale near-duplicate detection, similar-image ranking, and classification on noisy datasets.

* Published in CVPR 2016

Pose Embeddings: A Deep Architecture for Learning to Match Human Poses

Jul 01, 2015

Greg Mori, Caroline Pantofaru, Nisarg Kothari, Thomas Leung, George Toderici, Alexander Toshev, Weilong Yang

Abstract: We present a method for learning an embedding that places images of humans in similar poses nearby. This embedding can be used as a direct method of comparing images based on human pose, avoiding potential challenges of estimating body joint positions. Pose embedding learning is formulated under a triplet-based distance criterion. A deep architecture is used to allow learning of a representation capable of making distinctions between different poses. Experiments on human pose matching and retrieval from video data demonstrate the potential of the method.
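
For readers unfamiliar with triplet-based criteria, the following is a minimal sketch of a common hinge-style triplet loss on toy embeddings. The abstract does not give the exact formulation, so the squared Euclidean distance, the `margin` parameter, and the function names are assumptions rather than the paper's definition.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Hinge loss that is zero once the positive is closer to the anchor
    than the negative by at least `margin` (in squared distance)."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


# Toy 2-D embeddings (hypothetical): a similar pose nearby, a different pose far away.
anchor = [0.0, 0.0]
similar_pose = [1.0, 0.0]
different_pose = [3.0, 0.0]

satisfied = triplet_loss(anchor, similar_pose, different_pose)
violated = triplet_loss(anchor, different_pose, similar_pose)
```

The first triplet already respects the margin and contributes no loss; the second, with the roles swapped, is penalized, which is what would push a learned embedding to place similar poses nearby.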

Learning Fine-grained Image Similarity with Deep Ranking

Apr 17, 2014

Jiang Wang, Yang Song, Thomas Leung, Chuck Rosenberg, Jinbin Wang, James Philbin, Bo Chen, Ying Wu

Abstract: Learning fine-grained image similarity is a challenging task. It needs to capture between-class and within-class image differences. This paper proposes a deep ranking model that employs deep learning techniques to learn a similarity metric directly from images. It has higher learning capability than models based on hand-crafted features. A novel multiscale network structure has been developed to describe the images effectively. An efficient triplet sampling algorithm is proposed to learn the model with distributed asynchronized stochastic gradient. Extensive experiments show that the proposed algorithm outperforms models based on hand-crafted visual features and deep classification models.

* CVPR 2014
