A multi-grid multi-block-size vector quantization (MGBVQ) method is proposed for image coding in this work. The fundamental idea of image coding is to remove correlations among pixels before quantization and entropy coding, e.g., via the discrete cosine transform (DCT) and intra prediction adopted by modern image coding standards. We present a new method to remove pixel correlations. First, by decomposing correlations into long- and short-range correlations, we represent long-range correlations in coarser grids due to their smoothness, leading to a multi-grid (MG) coding architecture. Second, we show that short-range correlations can be effectively coded by a suite of vector quantizers (VQs). Along this line, we argue for the effectiveness of VQs with very large block sizes and present a convenient way to implement them. Experimental results show that MGBVQ offers excellent rate-distortion (RD) performance, comparable to that of existing image coders, at much lower complexity. In addition, it provides a progressive coded bitstream.
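A minimal sketch of block-based vector quantization, the building block the abstract refers to; this is not the authors' MGBVQ pipeline, and the block size and codebook size are illustrative:

```python
# Minimal block-based VQ sketch (not the authors' MGBVQ pipeline): learn a
# codebook over image blocks with k-means, then encode each block as the
# index of its nearest codeword. Indices would then be entropy-coded.
import numpy as np
from sklearn.cluster import KMeans

def extract_blocks(img, b):
    """Split a (H, W) grayscale image into non-overlapping b x b blocks."""
    H, W = img.shape
    img = img[:H - H % b, :W - W % b]
    return img.reshape(H // b, b, -1, b).swapaxes(1, 2).reshape(-1, b * b)

rng = np.random.default_rng(0)
img = rng.random((64, 64))          # stand-in for an image (or residual grid)
blocks = extract_blocks(img, b=4)   # block size is illustrative

codebook = KMeans(n_clusters=32, n_init=4, random_state=0).fit(blocks)
indices = codebook.predict(blocks)                 # one index per block
recon = codebook.cluster_centers_[indices]         # decoder-side lookup
print(f"blocks: {blocks.shape[0]}, MSE: {np.mean((blocks - recon) ** 2):.5f}")
```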
Text-guided generative diffusion models unlock powerful image creation and editing tools. While these have been extended to video generation, current approaches that edit the content of existing footage while retaining structure require expensive re-training for every input or rely on error-prone propagation of image edits across frames. In this work, we present a structure and content-guided video diffusion model that edits videos based on visual or textual descriptions of the desired output. Conflicts between user-provided content edits and structure representations occur due to insufficient disentanglement between the two aspects. As a solution, we show that training on monocular depth estimates with varying levels of detail provides control over structure and content fidelity. Our model is trained jointly on images and videos which also exposes explicit control of temporal consistency through a novel guidance method. Our experiments demonstrate a wide variety of successes; fine-grained control over output characteristics, customization based on a few reference images, and a strong user preference towards results by our model.
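One way to read "varying levels of detail" is a family of progressively coarsened depth maps; a hedged sketch of that idea follows, where the depth map and blur schedule are stand-ins, not the paper's actual pipeline:

```python
# Illustrative sketch: derive structure representations at several levels of
# detail by progressively blurring a monocular depth estimate. The random
# depth map and sigma schedule are stand-ins, not the paper's method.
import numpy as np
from scipy.ndimage import gaussian_filter

def depth_detail_levels(depth, sigmas=(0.0, 2.0, 4.0, 8.0)):
    """Return depth maps ordered from fine (sigma=0) to coarse structure."""
    return [gaussian_filter(depth, sigma=s) if s > 0 else depth.copy()
            for s in sigmas]

rng = np.random.default_rng(0)
depth = rng.random((128, 128)).astype(np.float32)  # stand-in depth estimate
for s, d in zip((0.0, 2.0, 4.0, 8.0), depth_detail_levels(depth)):
    print(f"sigma={s}: std of remaining detail = {d.std():.4f}")
```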
We present TherapyView, a demonstration system that helps therapists visualize the dynamic contents of past treatment sessions. It is enabled by state-of-the-art neural topic modeling techniques, used to analyze the topical tendencies of various psychiatric conditions, and a deep learning-based image generation engine that provides a visual summary. The system incorporates temporal modeling to provide a time-series representation of topic similarities at turn-level resolution, along with AI-generated artworks for dialogue segments that concisely represent the contents covered in the session, offering interpretable insights for therapists to optimize their strategies and enhance the effectiveness of psychotherapy. This system provides a proof of concept for AI-augmented therapy tools, offering an in-depth understanding of the patient's mental state and enabling more effective treatment.
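A toy sketch of the turn-level topic time series described above; the topic vectors here are random stand-ins for the output of a neural topic model, and the "query topic" is hypothetical:

```python
# Illustrative sketch: score each dialogue turn's topic distribution against a
# query topic with cosine similarity, yielding a per-turn time series. Topic
# vectors are random stand-ins for a neural topic model's output.
import numpy as np

def turn_similarity(turn_topics, query_topic):
    """Cosine similarity of each turn's topic vector to a query topic."""
    t = turn_topics / np.linalg.norm(turn_topics, axis=1, keepdims=True)
    q = query_topic / np.linalg.norm(query_topic)
    return t @ q

rng = np.random.default_rng(0)
turns = rng.random((20, 10))   # 20 turns x 10 topics (stand-in)
query = rng.random(10)         # hypothetical topic of interest
print(np.round(turn_similarity(turns, query), 3))  # plot over turns
```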
Distributed machine learning paradigms, such as federated learning, have been recently adopted in many privacy-critical applications for speech analysis. However, such frameworks are vulnerable to privacy leakage attacks from shared gradients. Despite extensive efforts in the image domain, the exploration of speech privacy leakage from gradients is quite limited. In this paper, we explore methods for recovering private speech/speaker information from the shared gradients in distributed learning settings. We conduct experiments on a keyword spotting model with two different types of speech features to quantify the amount of leaked information by measuring the similarity between the original and recovered speech signals. We further demonstrate the feasibility of inferring various levels of side-channel information, including speech content and speaker identity, under the distributed learning framework without accessing the user's data.
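A minimal sketch of a gradient-matching reconstruction in the spirit of "deep leakage from gradients", not necessarily the exact attack studied here; the keyword-spotting model, feature shapes, and the assumption of a known label are all illustrative:

```python
# Hedged sketch of gradient-inversion: optimize a dummy input so that its
# gradients match the shared ones. Toy linear model stands in for a
# keyword-spotting network; the label is assumed known for simplicity.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(40 * 20, 10))
loss_fn = torch.nn.CrossEntropyLoss()

# Victim's private features (e.g., 40 mel bands x 20 frames) and label.
x_true = torch.randn(1, 1, 40, 20)
y_true = torch.tensor([3])
true_grads = torch.autograd.grad(loss_fn(model(x_true), y_true),
                                 model.parameters())

x_dummy = torch.randn(1, 1, 40, 20, requires_grad=True)
opt = torch.optim.Adam([x_dummy], lr=0.1)
for step in range(200):
    opt.zero_grad()
    dummy_grads = torch.autograd.grad(loss_fn(model(x_dummy), y_true),
                                      model.parameters(), create_graph=True)
    grad_loss = sum(((dg - tg) ** 2).sum()
                    for dg, tg in zip(dummy_grads, true_grads))
    grad_loss.backward()   # gradient w.r.t. the dummy input
    opt.step()
print(f"final gradient-matching loss: {grad_loss.item():.6f}")
```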
Fundus images capture the rear of the eye and have been studied for disease identification, classification, segmentation, generation, and biological trait association using handcrafted, conventional, and deep learning methods. In biological trait estimation, most studies have addressed age prediction and gender classification, with convincing results. However, the current study utilizes cutting-edge deep learning (DL) algorithms to estimate biological traits in terms of age and gender and to associate those traits with retinal visuals. For trait association, our study embeds age as label information into the proposed DL model to learn which regions are affected by aging. Our proposed DL models, named FAG-Net and FGC-Net, respectively estimate biological traits (age and gender) and generate fundus images. FAG-Net can generate multiple variants of an input fundus image given a list of ages as conditions. Our study analyzes fundus images and their association with biological traits, and predicts the possible spread of ocular disease on fundus images given age as a condition to the generative model. Our proposed models outperform randomly selected state-of-the-art DL models.
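A hedged sketch of age conditioning as described above; the tiny architecture below is a toy stand-in, not the paper's FAG-Net or FGC-Net, and the age binning is an assumption:

```python
# Toy sketch of label conditioning: embed a (binned) age label and inject it
# into the generator input so one latent can be rendered at several ages.
import torch
import torch.nn as nn

class AgeConditionedGenerator(nn.Module):
    def __init__(self, z_dim=64, n_age_bins=10, emb_dim=16):
        super().__init__()
        self.age_emb = nn.Embedding(n_age_bins, emb_dim)
        self.net = nn.Sequential(
            nn.Linear(z_dim + emb_dim, 256), nn.ReLU(),
            nn.Linear(256, 32 * 32), nn.Tanh())  # tiny stand-in "fundus" image

    def forward(self, z, age_bin):
        h = torch.cat([z, self.age_emb(age_bin)], dim=1)
        return self.net(h).view(-1, 1, 32, 32)

g = AgeConditionedGenerator()
z = torch.randn(4, 64).repeat_interleave(1, dim=0)
ages = torch.tensor([1, 3, 5, 8])   # the same setup: a list of ages as conditions
print(g(z, ages).shape)             # torch.Size([4, 1, 32, 32])
```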
End-to-end text image translation (TIT), which aims at translating the source language embedded in images into the target language, has attracted intensive attention in recent research. However, data sparsity limits the performance of end-to-end text image translation. Multi-task learning is a non-trivial way to alleviate this problem by exploiting knowledge from complementary related tasks. In this paper, we propose a novel text-translation-enhanced text image translation model, which trains the end-to-end model with text translation as an auxiliary task. By sharing model parameters and multi-task training, our model is able to take full advantage of easily available large-scale parallel text corpora. Extensive experimental results show that our proposed method outperforms existing end-to-end methods, and that joint multi-task learning with both text translation and recognition tasks achieves better results, demonstrating that the translation and recognition auxiliary tasks are complementary.
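A minimal sketch of the parameter-sharing idea; the encoder/decoder modules and the auxiliary loss weight below are toy stand-ins, not the paper's architecture:

```python
# Toy multi-task sketch: the TIT path (image features) and the auxiliary text
# translation path (text features) share the target-language decoder, and
# their losses are summed during training.
import torch
import torch.nn as nn

shared_decoder = nn.Linear(32, 100)   # shared decoder over a toy vocabulary
image_encoder = nn.Linear(64, 32)     # encodes text-image features
text_encoder = nn.Linear(50, 32)      # encodes source-text features
loss_fn = nn.CrossEntropyLoss()

img_feat, txt_feat = torch.randn(8, 64), torch.randn(8, 50)
tgt = torch.randint(0, 100, (8,))     # toy target tokens

tit_loss = loss_fn(shared_decoder(image_encoder(img_feat)), tgt)
mt_loss = loss_fn(shared_decoder(text_encoder(txt_feat)), tgt)
loss = tit_loss + 1.0 * mt_loss       # auxiliary weight is illustrative
loss.backward()
print(f"joint loss: {loss.item():.3f}")
```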
We study a normalizing flow in the latent space of a top-down generator model, in which the normalizing flow model plays the role of the informative prior model of the generator. We propose to jointly learn the latent space normalizing flow prior model and the top-down generator model by a Markov chain Monte Carlo (MCMC)-based maximum likelihood algorithm, where a short-run Langevin sampling from the intractable posterior distribution is performed to infer the latent variables for each observed example, so that the parameters of the normalizing flow prior and the generator can be updated with the inferred latent variables. We show that, under the scenario of non-convergent short-run MCMC, the finite step Langevin dynamics is a flow-like approximate inference model and the learning objective actually follows the perturbation of the maximum likelihood estimation (MLE). We further point out that the learning framework seeks to (i) match the latent space normalizing flow and the aggregated posterior produced by the short-run Langevin flow, and (ii) bias the model from MLE such that the short-run Langevin flow inference is close to the true posterior. Empirical results of extensive experiments validate the effectiveness of the proposed latent space normalizing flow model in the tasks of image generation, image reconstruction, anomaly detection, supervised image inpainting and unsupervised image recovery.
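The short-run Langevin inference step can be sketched as follows; the quadratic prior and Gaussian decoder below are toy stand-ins for the normalizing-flow prior and top-down generator, and the step count and step size are illustrative:

```python
# Sketch of short-run Langevin inference: a fixed, small number of gradient
# steps on the posterior log-density with injected noise,
#   z <- z + (s^2 / 2) * grad_z log p(x, z) + s * eps.
import torch

def log_joint(z, x, decoder, sigma=0.3):
    log_prior = -0.5 * (z ** 2).sum(dim=1)        # stand-in for the flow prior
    log_lik = -0.5 * ((x - decoder(z)) ** 2).sum(dim=1) / sigma ** 2
    return log_prior + log_lik

def short_run_langevin(x, decoder, z_dim=8, steps=20, step_size=0.1):
    z = torch.randn(x.shape[0], z_dim)            # fixed initial distribution
    for _ in range(steps):                        # deliberately non-convergent
        z = z.detach().requires_grad_(True)
        grad = torch.autograd.grad(log_joint(z, x, decoder).sum(), z)[0]
        z = z + 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
    return z.detach()

decoder = torch.nn.Linear(8, 32)                  # toy top-down generator
x = torch.randn(4, 32)
print(short_run_langevin(x, decoder).shape)       # torch.Size([4, 8])
```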
Multimodal ambiguity and color bleeding remain challenging in colorization. To tackle these problems, we propose PalGAN, a new GAN-based colorization approach integrated with palette estimation and chromatic attention. To circumvent the multimodality issue, we present a new colorization formulation that first estimates a probabilistic palette from the input gray image, then conducts color assignment conditioned on the palette through a generative model. Further, we handle color bleeding with chromatic attention, which studies color affinities by considering both semantic and intensity correlation. In extensive experiments, PalGAN outperforms state-of-the-art methods in quantitative evaluation and visual comparison, delivering notably diverse, contrastive, and edge-preserving appearances. Thanks to the palette design, our method enables color transfer between images even with irrelevant contexts.
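One reading of a "probabilistic palette" is a distribution over a small set of anchor colors; a hedged sketch follows, built here as a soft histogram from a color image (PalGAN instead predicts it from the gray input):

```python
# Illustrative sketch of a probabilistic palette: soft-assign each pixel to K
# anchor colors and average into a distribution. Anchor colors, K, and the
# temperature are assumptions, not PalGAN's implementation.
import numpy as np

def soft_palette(img_rgb, palette, temperature=0.05):
    """img_rgb: (H, W, 3) in [0, 1]; palette: (K, 3). Returns (K,) probs."""
    pixels = img_rgb.reshape(-1, 3)
    d2 = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(-1)  # (N, K)
    w = np.exp(-d2 / temperature)
    w /= w.sum(axis=1, keepdims=True)   # per-pixel soft assignment
    return w.mean(axis=0)               # average into a palette distribution

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
palette = rng.random((8, 3))            # K = 8 anchor colors (illustrative)
p = soft_palette(img, palette)
print(np.round(p, 3), p.sum())          # probabilities summing to 1
```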
This study compares sequential image classification methods based on recurrent neural networks. We describe methods based on recurrent neural networks such as the long short-term memory (LSTM) and bidirectional LSTM (BiLSTM) architectures, and review the state-of-the-art sequential image classification architectures, focusing mainly on LSTM, BiLSTM, the temporal convolution network, and the independent recurrent neural network architecture. It is known that RNNs struggle to learn long-term dependencies in the input sequence. We use a simple feature construction method based on the orthogonal Ramanujan periodic transform of the input sequence. Experiments demonstrate that when these features are given to LSTM or BiLSTM networks, performance increases drastically. Our focus in this study is to increase training accuracy while simultaneously reducing training time for the LSTM and BiLSTM architectures, not to push state-of-the-art results, so we use simple LSTM/BiLSTM architectures. We compare raw sequential input against the constructed features as input to single-layer LSTM and BiLSTM networks on the MNIST and CIFAR datasets. We observe that sequential input to an LSTM network with 128 hidden units, trained for five epochs, yields a training accuracy of 33%, whereas the constructed features given to the same LSTM network yield a training accuracy of 90% in one-third less time.
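A minimal sketch of the single-layer sequential baseline described above; random tensors stand in for MNIST, and the Ramanujan-transform feature construction is not reproduced here:

```python
# Sequential image classification baseline: feed each 28x28 image to a
# single-layer LSTM as 28 time steps of 28 pixels, classify from the last step.
import torch
import torch.nn as nn

class RowLSTM(nn.Module):
    def __init__(self, hidden=128, n_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(input_size=28, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):            # x: (B, 28, 28) = 28 steps of 28 pixels
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])   # classify from the final time step

model = RowLSTM()
x = torch.randn(16, 28, 28)          # stand-in for an MNIST batch
print(model(x).shape)                # torch.Size([16, 10])
```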
The normalized 2-D correlation technique is a robust method for detecting targets in images due to its ability to remain invariant under rotation, translation, and scaling. This paper examines the impact of translation and scaling on target identification in images. The results indicate a high level of accuracy in detecting targets, even when they exhibit variations in location and size, and show that the similarity between the image and the two targets used improves as the resize ratio increases. All statistical estimators demonstrate a strong similarity between the original and extracted targets. The elapsed time for all scenarios falls within the ranges of 44.75-44.85 and 37.48-37.73 seconds for the bird and children targets, respectively, and the correlation coefficient displays stable relationships, with values within the ranges of 0.90-0.98 and 0.87-0.93 for the bird and children targets, respectively.
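A minimal, generic sketch of target detection with normalized 2-D cross-correlation (not the paper's exact setup); the image, target patch, and sizes are synthetic:

```python
# Locate a target patch in a larger image via normalized 2-D cross-correlation
# using skimage's match_template; the peak of the correlation map gives the
# detected position.
import numpy as np
from skimage.feature import match_template

rng = np.random.default_rng(0)
image = rng.random((200, 200))
target = image[60:90, 110:140].copy()   # plant a known 30x30 target

ncc = match_template(image, target)     # normalized cross-correlation map
peak = np.unravel_index(np.argmax(ncc), ncc.shape)
print(f"detected at {peak}, peak correlation = {ncc.max():.3f}")
# For an exact copy the peak lands at (60, 110) with correlation 1.000.
```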