In this paper we detail Cortexica's (https://www.cortexica.com) recommendation framework -- particularly, we describe how a hybrid visual recommender system can be created by combining conditional random fields for segmentation and deep neural networks for object localisation and feature representation. The recommendation system that is built after localisation, segmentation and classification has two properties -- first, it is knowledge based in the sense that it learns pairwise preference/occurrence matrix by utilising knowledge from experts (images from fashion blogs) and second, it is content-based as it utilises a deep learning based framework for learning feature representation. Such a construct is especially useful when there is a scarcity of user preference data, that forms the foundation of many collaborative recommendation algorithms.
With the advent of specialized hardware such as Graphics Processing Units (GPUs), large scale image localization, classification and retrieval have seen increased prevalence. Designing scalable software architecture that co-evolves with such specialized hardware is a challenge in the commercial setting. In this paper, we describe one such architecture (\textit{Cortexica}) that leverages scalability of GPUs and sandboxing offered by docker containers. This allows for the flexibility of mixing different computer architectures as well as computational algorithms with the security of a trusted environment. We illustrate the utility of this framework in a commercial setting i.e., searching for multiple products in an image by combining image localisation and retrieval.