INRIA - IRISA
Abstract:Recent years have seen a significant increase in video content creation and consumption. Crafting engaging content requires the careful curation of both visual and audio elements. While visual cue curation, through techniques like optimal viewpoint selection or post-editing, has been central to media production, its natural counterpart, audio, has not undergone equivalent advancements. This often results in a disconnect between visual and acoustic saliency. To bridge this gap, we introduce a novel task: visually-guided acoustic highlighting, which aims to transform audio to deliver appropriate highlighting effects guided by the accompanying video, ultimately creating a more harmonious audio-visual experience. We propose a flexible, transformer-based multimodal framework to solve this task. To train our model, we also introduce a new dataset -- the muddy mix dataset, leveraging the meticulous audio and video crafting found in movies, which provides a form of free supervision. We develop a pseudo-data generation process to simulate poorly mixed audio, mimicking real-world scenarios through a three-step process -- separation, adjustment, and remixing. Our approach consistently outperforms several baselines in both quantitative and subjective evaluation. We also systematically study the impact of different types of contextual guidance and difficulty levels of the dataset. Our project page is here: https://wikichao.github.io/VisAH/.
Abstract:We investigate the methods that simultaneously enforce sparsity and low-rank structure in a matrix as often employed for sparse phase retrieval problems or phase calibration problems in compressive sensing. We propose a new approach for analyzing the trade off between the sparsity and low rank constraints in these approaches which not only helps to provide guidelines to adjust the weights between the aforementioned constraints, but also enables new simulation strategies for evaluating performance. We then provide simulation results for phase retrieval and phase calibration cases both to demonstrate the consistency of the proposed method with other approaches and to evaluate the change of performance with different weights for the sparsity and low rank structure constraints.