Michel Melo Silva

Describing image focused in cognitive and visual details for visually impaired people: An approach to generating inclusive paragraphs

Feb 15, 2022

Daniel Louzada Fernandes, Marcos Henrique Fonseca Ribeiro, Fabio Ribeiro Cerqueira, Michel Melo Silva

Abstract: Several services for people with visual disabilities have emerged recently due to achievements in the areas of Assistive Technologies and Artificial Intelligence. Despite the growing availability of assistive systems, there is a lack of services that support specific tasks, such as understanding the image context presented in online content, e.g., webinars. Image captioning techniques and their variants are limited as Assistive Technologies because they do not match the needs of visually impaired people when generating specific descriptions. We propose an approach for generating the context of webinar images that combines a dense captioning technique with a set of filters, to fit the captions to our domain, and a language model for the abstractive summarization task. The results demonstrate that, by combining image analysis methods and neural language models, we can produce descriptions that are more interpretable and focused on the information relevant to that group of people.

* Accepted at the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP) 2022

A Sparse Sampling-based framework for Semantic Fast-Forward of First-Person Videos

Sep 21, 2020

Michel Melo Silva, Washington Luis Souza Ramos, Mario Fernando Montenegro Campos, Erickson Rangel Nascimento

Abstract: Technological advances in sensors have paved the way for digital cameras to become increasingly ubiquitous, which, in turn, led to the popularity of the self-recording culture. As a result, the amount of visual data on the Internet is moving in the opposite direction of the available time and patience of the users. Thus, most of the uploaded videos are doomed to be forgotten and unwatched, stashed away in some computer folder or website. In this paper, we address the problem of creating smooth fast-forward videos without losing the relevant content. We present a new adaptive frame selection formulated as a weighted minimum reconstruction problem. Using a smoothing frame transition and filling visual gaps between segments, our approach accelerates first-person videos, emphasizing the relevant segments and avoiding visual discontinuities. Experiments conducted on controlled videos and also on an unconstrained dataset of First-Person Videos (FPVs) show that, when creating fast-forward videos, our method is able to retain as much relevant information and smoothness as the state-of-the-art techniques, but in less processing time.

* Accepted at the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2020. arXiv admin note: text overlap with arXiv:1802.08722

A gaze driven fast-forward method for first-person videos

Jun 10, 2020

Alan Carvalho Neves, Michel Melo Silva, Mario Fernando Montenegro Campos, Erickson Rangel Nascimento

Abstract: The growing data sharing and life-logging cultures are driving an unprecedented increase in the amount of unedited First-Person Videos. In this paper, we address the problem of accessing relevant information in First-Person Videos by creating an accelerated version of the input video and emphasizing the moments that are important to the recorder. Our method is based on an attention model driven by gaze and visual scene analysis that provides a semantic score of each frame of the input video. We performed several experimental evaluations on publicly available First-Person Videos datasets. The results show that our methodology can fast-forward videos, emphasizing moments when the recorder visually interacts with scene components while not including monotonous clips.

* Accepted for presentation at EPIC@CVPR2020 workshop

A Weighted Sparse Sampling and Smoothing Frame Transition Approach for Semantic Fast-Forward First-Person Videos

Mar 13, 2018

Michel Melo Silva, Washington Luis Souza Ramos, Joao Klock Ferreira, Felipe Cadar Chamone, Mario Fernando Montenegro Campos, Erickson Rangel Nascimento

Abstract: Thanks to the advances in the technology of low-cost digital cameras and the popularity of the self-recording culture, the amount of visual data on the Internet is moving in the opposite direction of the available time and patience of the users. Thus, most of the uploaded videos are doomed to be forgotten and unwatched in a computer folder or website. In this work, we address the problem of creating smooth fast-forward videos without losing the relevant content. We present a new adaptive frame selection formulated as a weighted minimum reconstruction problem, which, combined with a smoothing frame transition method, accelerates first-person videos, emphasizing the relevant segments and avoiding visual discontinuities. The experiments show that our method is able to fast-forward videos while retaining as much relevant information and smoothness as the state-of-the-art techniques, in less time. We also present a new 80-hour multimodal (RGB-D, IMU, and GPS) dataset of first-person videos with annotations for recorder profile, frame scene, activities, interaction, and attention.

* Accepted for publication in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018. Link to the project website: https://www.verlab.dcc.ufmg.br/semantic-hyperlapse/

Making a long story short: A Multi-Importance fast-forwarding egocentric videos with the emphasis on relevant objects

Mar 07, 2018

Michel Melo Silva, Washington Luis Souza Ramos, Felipe Cadar Chamone, João Pedro Klock Ferreira, Mario Fernando Montenegro Campos, Erickson Rangel Nascimento

Abstract: The emergence of low-cost, high-quality personal wearable cameras, combined with the increasing storage capacity of video-sharing websites, has evoked a growing interest in first-person videos, since most videos are composed of long-running unedited streams which are usually tedious and unpleasant to watch. State-of-the-art semantic fast-forward methods currently face the challenge of providing an adequate balance between smoothness in visual flow and the emphasis on the relevant parts. In this work, we present the Multi-Importance Fast-Forward (MIFF), a fully automatic methodology to fast-forward egocentric videos facing these challenges. The dilemma of defining what is the semantic information of a video is addressed by a learning process based on the preferences of the user. Results show that the proposed method keeps over 3 times more semantic content than the state-of-the-art fast-forward. Finally, we discuss the need for a particular video stabilization technique for fast-forward egocentric videos.

* Accepted for publication in the Journal of Visual Communication and Image Representation (JVCI) 2018. Project website: https://www.verlab.dcc.ufmg.br/semantic-hyperlapse

Towards Semantic Fast-Forward and Stabilized Egocentric Videos

Aug 16, 2017

Michel Melo Silva, Washington Luis Souza Ramos, Joao Pedro Klock Ferreira, Mario Fernando Montenegro Campos, Erickson Rangel Nascimento

Abstract: The emergence of low-cost personal mobile devices and wearable cameras and the increasing storage capacity of video-sharing websites have pushed forward a growing interest in first-person videos. Since most of the recorded videos compose long-running streams with unedited content, they are tedious and unpleasant to watch. The state-of-the-art fast-forward methods face the challenge of balancing the smoothness of the video and the emphasis on the relevant frames given a speed-up rate. In this work, we present a methodology capable of summarizing and stabilizing egocentric videos by extracting the semantic information from the frames. This paper also describes a dataset collection with several semantically labeled videos and introduces a new smoothness evaluation metric for egocentric videos that is used to test our method.

* Accepted for publication and presented at the First International Workshop on Egocentric Perception, Interaction and Computing at the European Conference on Computer Vision (EPIC@ECCV) 2016

Fast-Forward Video Based on Semantic Extraction

Aug 16, 2017

Washington Luis Souza Ramos, Michel Melo Silva, Mario Fernando Montenegro Campos, Erickson Rangel Nascimento

Abstract: Thanks to the low operational cost and large storage capacity of smartphones and wearable devices, people are recording many hours of daily activities, sport actions and home videos. These videos, also known as egocentric videos, are generally long-running streams with unedited content, which makes them boring and visually unpalatable, bringing up the challenge of making egocentric videos more appealing. In this work, we propose a novel methodology to compose the new fast-forward video by selecting frames based on semantic information extracted from images. The experiments show that our approach outperforms the state-of-the-art as far as semantic information is concerned and that it is also able to produce videos that are more pleasant to watch.

* Accepted for publication and presented at the 2016 IEEE International Conference on Image Processing (ICIP)
