Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

James Baker

Using Style Ambiguity Loss to Improve Aesthetics of Diffusion Models

Oct 02, 2024

James Baker

Abstract:Teaching text-to-image models to be creative involves using style ambiguity loss. In this work, we explore using the style ambiguity training objective, used to approximate creativity, on a diffusion model. We then experiment with forms of style ambiguity loss that do not require training a classifier or a labeled dataset, and find that the models trained with style ambiguity loss can generate better images than the baseline diffusion models and GANs. Code is available at https://github.com/jamesBaker361/clipcreate.

* arXiv admin note: substantial text overlap with arXiv:2407.12009

Via

Access Paper or Ask Questions

BRAT: Bonus oRthogonAl Token for Architecture Agnostic Textual Inversion

Aug 08, 2024

James Baker

Abstract:Textual Inversion remains a popular method for personalizing diffusion models, in order to teach models new subjects and styles. We note that textual inversion has been underexplored using alternatives to the UNet, and experiment with textual inversion with a vision transformer. We also seek to optimize textual inversion using a strategy that does not require explicit use of the UNet and its idiosyncratic layers, so we add bonus tokens and enforce orthogonality. We find the use of the bonus token improves adherence to the source images and the use of the vision transformer improves adherence to the prompt. Code is available at https://github.com/jamesBaker361/tex_inv_plus.

Via

Access Paper or Ask Questions

In the Red: Social Media and Stock Prices

Nov 14, 2023

James Baker

Abstract:Spearheaded by retail traders on the website reddit, the GameStop short squeeze of early 2021 shows that social media embeds information that correlates with market movements. This paper seeks to examine this relationship by using daily frequencies of classified comments and buzzwords as additional factors in a Fama-French three factor model. Comments are classified using an unsupervised clustering method, while past studies have used pretrained models that are not specific to the domains being studied.

Via

Access Paper or Ask Questions

ARTEMIS: Using GANs with Multiple Discriminators to Generate Art

Nov 14, 2023

James Baker

Abstract:We propose a novel method for generating abstract art. First an autoencoder is trained to encode and decode the style representations of images, which are extracted from source images with a pretrained VGG network. Then, the decoder component of the autoencoder is extracted and used as a generator in a GAN. The generator works with an ensemble of discriminators. Each discriminator takes different style representations of the same images, and the generator is trained to create images that create convincing style representations in order to deceive all of the generators. The generator is also trained to maximize a diversity term. The resulting images had a surreal, geometric quality. We call our approach ARTEMIS (ARTistic Encoder- Multi- Discriminators Including Self-Attention), as it uses the self-attention layers and an encoder-decoder architecture.

Via

Access Paper or Ask Questions

The Heat is On: Thermal Facial Landmark Tracking

Nov 14, 2023

James Baker

Abstract:Facial landmark tracking for thermal images requires tracking certain important regions of subjects' faces, using images from thermal images, which omit lighting and shading, but show the temperatures of their subjects. The fluctuations of heat in particular places reflect physiological changes like bloodflow and perspiration, which can be used to remotely gauge things like anxiety and excitement. Past work in this domain has been limited to only a very limited set of architectures and techniques. This work goes further by trying a comprehensive suit of various models with different components, such as residual connections, channel and feature-wise attention, as well as the practice of ensembling components of the network to work in parallel. The best model integrated convolutional and residual layers followed by a channel-wise self-attention layer, requiring less than 100K parameters.

Via

Access Paper or Ask Questions