Abstract:Monocular depth estimation can play an important role in addressing the issue of deriving scene geometry from 2D images. It has been used in a variety of industries, including robots, self-driving cars, scene comprehension, 3D reconstructions, and others. The goal of our method is to create a lightweight machine-learning model in order to predict the depth value of each pixel given only a single RGB image as input with the Unet structure of the image segmentation network. We use the NYU Depth V2 dataset to test the structure and compare the result with other methods. The proposed method achieves relatively high accuracy and low rootmean-square error.
Abstract:For humans, object detection, recognition, and tracking are innate. These provide the ability for human to perceive their environment and objects within their environment. This ability however doesn't translate well in computers. In Computer Vision and Multimedia, it is becoming increasingly more important to detect, recognize and track objects in images and/or videos. Many of these applications, such as facial recognition, surveillance, animation, are used for tracking features and/or people. However, these tasks prove challenging for computers to do effectively, as there is a significant amount of data to parse through. Therefore, many techniques and algorithms are needed and therefore researched to try to achieve human like perception. In this literature review, we focus on some novel techniques on object detection and recognition, and how to apply tracking algorithms to the detected features to track the objects' movements.