"Topic": models, code, and papers

Learning the Compositional Visual Coherence for Complementary Recommendations

Jun 08, 2020
Zhi Li, Bo Wu, Qi Liu, Likang Wu, Hongke Zhao, Tao Mei

Complementary recommendations, which aim at providing users with product suggestions that are supplementary to and compatible with their obtained items, have become a hot topic in both academia and industry in recent years. However, the task is challenging due to its complexity and subjectivity. Existing work has mainly focused on modeling the co-purchased relations between two items, but the compositional associations of item collections are largely unexplored. When a user chooses complementary items for purchased products, it is intuitive that she will consider the visual semantic coherence (such as color collocations and texture compatibilities) in addition to global impressions. To this end, we propose a novel Content Attentive Neural Network (CANN) to model the comprehensive compositional coherence on both global contents and semantic contents. Specifically, we first propose a Global Coherence Learning (GCL) module based on multi-head attention to model the global compositional coherence. Then, we generate semantic-focal representations from different semantic regions and design a Focal Coherence Learning (FCL) module to learn the focal compositional coherence from these representations. Finally, we optimize CANN with a novel compositional optimization strategy. Extensive experiments on large-scale real-world data clearly demonstrate the effectiveness of CANN compared with several state-of-the-art methods.

* Early version accepted by IJCAI 2020
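
The abstract above says the global module is based on multi-head attention but gives no architectural details. As a rough illustration of that building block only, here is a minimal NumPy sketch of generic multi-head self-attention over a small set of item embeddings. It is not the CANN implementation: the function names, the random projection matrices, and the sizes (4 items, 8 dimensions, 2 heads) are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(items, w_q, w_k, w_v, num_heads):
    """Generic multi-head self-attention (hypothetical helper, not CANN's code).

    items: (n_items, dim) embeddings; w_q, w_k, w_v: (dim, dim) projections.
    Returns the attended embeddings (n_items, dim) and the attention
    weights (num_heads, n_items, n_items).
    """
    n, d = items.shape
    if d % num_heads != 0:
        raise ValueError("dim must be divisible by num_heads")
    head_dim = d // num_heads

    def split_heads(x):
        # (n, dim) -> (num_heads, n, head_dim)
        return x.reshape(n, num_heads, head_dim).transpose(1, 0, 2)

    q = split_heads(items @ w_q)
    k = split_heads(items @ w_k)
    v = split_heads(items @ w_v)

    # Scaled dot-product scores between every pair of items, per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values, then concatenate the heads again.
    heads = weights @ v
    attended = heads.transpose(1, 0, 2).reshape(n, d)
    return attended, weights


# Illustrative inputs only: random embeddings and projections.
rng = np.random.default_rng(0)
items = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
attended, weights = multi_head_self_attention(items, w_q, w_k, w_v, num_heads=2)
```

Each row of `weights[h]` sums to one and indicates how strongly one item attends to the others under head `h`, which is the sense in which attention can relate every item in a collection to the rest.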

A Comparison of Data Augmentation Techniques in Training Deep Neural Networks for Satellite Image Classification

Mar 30, 2020
Mohamed Abdelhack

Satellite imagery enables a plethora of applications ranging from weather forecasting to land surveying. The rapid development of computer vision systems could open new horizons for the utilization of satellite data, given the large volumes of data available. However, current state-of-the-art computer vision systems cater mainly to applications involving natural images. While useful, natural images follow a different distribution from satellite images, which also have more spectral channels. This allows pretrained deep learning models to be used only on the subset of spectral channels that is equivalent to natural images, thus discarding valuable information from the other spectral channels. This calls for research effort to optimize deep learning models for satellite imagery to enable the assessment of their utility in the domain of remote sensing. This study focuses on image augmentation in the training of deep neural network classifiers. I tested different image augmentation techniques to train a standard deep neural network on satellite images from EuroSAT. Results show that while some image augmentation techniques commonly used in natural image training can readily be transferred to satellite images, others can actually decrease performance. Additionally, some novel image augmentation techniques that take into account the nature of satellite images could be useful to incorporate in training.
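
The abstract does not list which augmentations were tested. As a hedged illustration of the general idea, the sketch below applies purely geometric augmentations (random flips and 90-degree rotations) to a multi-spectral patch while leaving the spectral channels untouched. The `augment` helper and the 64x64x13 patch shape are assumptions for the example, and the random array stands in for real satellite data.

```python
import numpy as np


def augment(patch, rng):
    """Randomly flip and rotate an (height, width, channels) patch.

    Only the two spatial axes are transformed, so every spectral channel
    keeps its own pixel values. Hypothetical helper for illustration.
    """
    if rng.random() < 0.5:
        patch = np.flip(patch, axis=1)  # horizontal flip
    if rng.random() < 0.5:
        patch = np.flip(patch, axis=0)  # vertical flip
    k = int(rng.integers(0, 4))  # number of 90-degree rotations
    patch = np.rot90(patch, k=k, axes=(0, 1))
    return np.ascontiguousarray(patch)


# Illustrative input only: a random 13-channel patch.
rng = np.random.default_rng(0)
patch = rng.random((64, 64, 13))
augmented = augment(patch, rng)
```

Overhead imagery has no canonical "up" direction, which is one reason flips and rotations are a plausible choice here; photometric augmentations designed for three-channel natural images would need more care with additional spectral bands.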

CNNTOP: a CNN-based Trajectory Owner Prediction Method

Jan 05, 2020
Xucheng Luo, Shengyang Li, Yuxiang Peng

Trajectory owner prediction is the basis for many applications such as personalized recommendation and urban planning. Although much effort has been put into this topic, the results achieved are still not good enough. Existing methods mainly employ RNNs to model trajectories semantically due to the inherent sequential attribute of trajectories. However, these approaches are weak at Point of Interest (POI) representation learning and trajectory feature detection. Thus, the performance of existing solutions is far from the requirements of practical applications. In this paper, we propose a novel CNN-based Trajectory Owner Prediction (CNNTOP) method. First, we connect all POIs according to the trajectories of all users. The result is a connected graph that can be used to generate more informative POI sequences than other approaches. Second, we employ the Node2Vec algorithm to encode each POI into a low-dimensional real-valued vector. Then, we transform each trajectory into a fixed-dimensional matrix, which is similar to an image. Finally, a CNN is designed to detect features and predict the owner of a given trajectory. The CNN can extract informative features from the matrix representations of trajectories by convolution, batch normalization, and K-max pooling operations. Extensive experiments on real datasets demonstrate that CNNTOP substantially outperforms existing solutions in terms of macro-Precision, macro-Recall, macro-F1, and accuracy.

* 9 pages, 11 figures
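
The K-max pooling operation named in the abstract can be sketched in a few lines. The version below follows the usual definition (keep the K largest activations along the sequence axis of each feature channel, in their original order); whether CNNTOP uses exactly this variant is an assumption, and the function name and toy input are made up for the example.

```python
import numpy as np


def k_max_pooling(features, k):
    """Keep the k largest values per channel, preserving sequence order.

    features: (seq_len, channels) array. Returns a (k, channels) array.
    Hypothetical helper for illustration.
    """
    if not 1 <= k <= features.shape[0]:
        raise ValueError("k must be between 1 and seq_len")
    # Row indices of the k largest entries in each column.
    top = np.argsort(features, axis=0)[-k:]
    # Sort the indices so the kept values appear in their original order.
    top = np.sort(top, axis=0)
    return np.take_along_axis(features, top, axis=0)


# Toy input: 4 sequence positions, 2 channels.
features = np.array([
    [1.0, 9.0],
    [5.0, 2.0],
    [3.0, 7.0],
    [4.0, 0.0],
])
pooled = k_max_pooling(features, k=2)
# pooled is [[5.0, 9.0], [4.0, 7.0]]
```

Unlike ordinary max pooling, this keeps several strong activations per channel and their relative order, so the output has a fixed size regardless of sequence length.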

Aesthetic Attributes Assessment of Images

Jul 29, 2019
Xin Jin, Le Wu, Geng Zhao, Xiaodong Li, Xiaokun Zhang, Shiming Ge, Dongqing Zou, Bin Zhou, Xinghui Zhou

Image aesthetic quality assessment has been a relatively hot topic during the last decade. Most recently, comment-type assessment (aesthetic captions) has been proposed to describe the general aesthetic impression of an image using text. In this paper, we propose Aesthetic Attributes Assessment of Images, that is, aesthetic attribute captioning. This is a new formulation of image aesthetic assessment, which predicts aesthetic attribute captions together with the aesthetic score of each attribute. We introduce a new dataset named DPC-Captions, which contains comments on up to 5 aesthetic attributes per image, obtained through knowledge transfer from a fully annotated small-scale dataset. Then, we propose the Aesthetic Multi-Attribute Network (AMAN), which is trained on a mixture of the fully annotated small-scale PCCD dataset and the weakly annotated large-scale DPC-Captions dataset. AMAN makes full use of transfer learning and an attention model in a single framework. The experimental results on the DPC-Captions and PCCD datasets show that our method can predict captions for 5 aesthetic attributes together with a numerical score for each attribute. Using the evaluation criteria of image captioning, we show that our specially designed AMAN model outperforms the traditional CNN-LSTM model and the more recent SCA-CNN model for image captioning.

* to appear in ACM MM 2019, camera ready version 

Proceedings of the Workshop on Social Robots in Therapy: Focusing on Autonomy and Ethical Challenges

Dec 18, 2018
Pablo G. Esteban, Daniel Hernández García, Hee Rin Lee, Pauline Chevalier, Paul Baxter, Cindy L. Bethel, Jainendra Shukla, Joan Oliver, Domènec Puig, Jason R. Wilson, Linda Tickle-Degnen, Madeleine Bartlett, Tony Belpaeme, Serge Thill, Kim Baraka, Francisco S. Melo, Manuela Veloso, David Becerra, Maja Matarić, Eduard Fosch-Villaronga, Jordi Albo-Canals, Gloria Beraldo, Emanuele Menegatti, Valentina De Tommasi, Roberto Mancin, Franca Benini, Zachary Henkel, Kenna Baugus, David C. May, Lucile Dupuy, Wendy A. Rogers, Ronit Feingold Polak, Shelly Levy-Tzedek, Dagoberto Cruz-Sandoval, Jesus Favela, Michelle J. Johnson, Mayumi Mohan, Rochelle Mendonca

Robot-Assisted Therapy (RAT) has successfully been used in HRI research by including social robots in health-care interventions by virtue of their ability to engage human users in both social and emotional dimensions. Research projects on this topic exist all over the globe, in the USA, Europe, and Asia. All of these projects share the ambitious overall goal of increasing the well-being of a vulnerable population. Typical work in RAT is performed using remote-controlled robots, a technique called Wizard-of-Oz (WoZ). The robot is usually controlled, unbeknownst to the patient, by a human operator. However, WoZ has been demonstrated not to be a sustainable technique in the long term. Providing the robots with autonomy (while remaining under the supervision of the therapist) has the potential to lighten the therapist's burden, not only in the therapeutic session itself but also in longer-term diagnostic tasks. Therefore, there is a need to explore several degrees of autonomy in social robots used in therapy. Increasing the autonomy of robots might also bring about a new set of challenges. In particular, there will be a need to answer new ethical questions regarding the use of robots with a vulnerable population, as well as a need to ensure ethically compliant robot behaviours. Therefore, in this workshop we want to gather findings and explore which degree of autonomy might help to improve health-care interventions and how we can overcome the ethical challenges inherent to it.

* 25 pages, editors for the proceedings: Pablo G. Esteban, Daniel Hernández García, Hee Rin Lee, Pauline Chevalier, Paul Baxter, Cindy Bethel

Multi-Task Deep Learning for Legal Document Translation, Summarization and Multi-Label Classification

Oct 16, 2018
Ahmed Elnaggar, Christoph Gebendorfer, Ingo Glaser, Florian Matthes

The digitalization of the legal domain has been ongoing for a couple of years. In that process, the application of different machine learning (ML) techniques is crucial. Tasks such as the classification of legal documents or contract clauses, as well as their translation, are highly relevant. On the other hand, digitized documents are barely accessible in this field, particularly in Germany. Today, deep learning (DL) is one of the hot topics, with many publications and various applications. Sometimes it provides results that exceed human-level performance. Hence this technique may be feasible for the legal domain as well. However, DL requires thousands of samples to provide decent results. A potential solution to this problem is multi-task DL to enable transfer learning. This approach may be able to overcome the data scarcity problem in the legal domain, specifically for the German language. We applied a state-of-the-art multi-task model to three tasks: translation, summarization, and multi-label classification. The experiments were conducted on legal document corpora using several task combinations as well as various model parameters. The goal was to find the optimal configuration for the tasks at hand within the legal domain. The multi-task DL approach outperformed the state-of-the-art results in all three tasks. This opens a new direction for integrating DL technology more efficiently in the legal domain.

* 10 pages, 4 figures 

Video-to-Video Synthesis

Aug 20, 2018
Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro

We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video. While its image counterpart, the image-to-image synthesis problem, is a popular topic, the video-to-video synthesis problem is less explored in the literature. Without understanding temporal dynamics, directly applying existing image synthesis approaches to an input video often results in temporally incoherent videos of low visual quality. In this paper, we propose a novel video-to-video synthesis approach under the generative adversarial learning framework. Through carefully designed generator and discriminator architectures, coupled with a spatio-temporal adversarial objective, we achieve high-resolution, photorealistic, temporally coherent video results on a diverse set of input formats including segmentation masks, sketches, and poses. Experiments on multiple benchmarks show the advantage of our method compared to strong baselines. In particular, our model is capable of synthesizing 2K resolution videos of street scenes up to 30 seconds long, which significantly advances the state of the art in video synthesis. Finally, we apply our approach to future video prediction, outperforming several state-of-the-art competing systems.

* Code, models, and more results are available at 

Closing the AI Knowledge Gap

Mar 20, 2018
Ziv Epstein, Blakeley H. Payne, Judy Hanwen Shen, Abhimanyu Dubey, Bjarke Felbo, Matthew Groh, Nick Obradovich, Manuel Cebrian, Iyad Rahwan

AI researchers employ not only the scientific method, but also methodology from mathematics and engineering. However, the use of the scientific method - specifically hypothesis testing - in AI is typically conducted in service of engineering objectives. Growing interest in topics such as fairness and algorithmic bias shows that engineering-focused questions comprise only a subset of the important questions about AI systems. This results in the AI Knowledge Gap: the number of unique AI systems grows faster than the number of studies that characterize these systems' behavior. To close this gap, we argue that the study of AI could benefit from the greater inclusion of researchers who are well positioned to formulate and test hypotheses about the behavior of AI systems. We examine the barriers preventing social and behavioral scientists from conducting such studies. Our diagnosis suggests that accelerating the scientific study of AI systems requires new incentives for academia and industry, mediated by new tools and institutions. To address these needs, we propose a two-sided marketplace called TuringBox. On one side, AI contributors upload existing and novel algorithms to be studied scientifically by others. On the other side, AI examiners develop and post machine intelligence tasks designed to evaluate and characterize algorithmic behavior. We discuss this market's potential to democratize the scientific study of AI behavior, and thus narrow the AI Knowledge Gap.

* 8 pages, 3 figures, under review 

Supersaliency: Predicting Smooth Pursuit-Based Attention with Slicing CNNs Improves Fixation Prediction for Naturalistic Videos

Jan 29, 2018
Mikhail Startsev, Michael Dorr

Predicting attention is a popular topic at the intersection of human and computer vision, but video saliency prediction has only recently begun to benefit from deep learning-based approaches. Even though most of the available video-based saliency data sets and models claim to target human observers' fixations, they fail to differentiate them from smooth pursuit (SP), a major eye movement type that is unique to perception of dynamic scenes. In this work, we aim to make this distinction explicit, to which end we (i) use both algorithmic and manual annotations of SP traces and other eye movements for two well-established video saliency data sets, (ii) train Slicing Convolutional Neural Networks (S-CNN) for saliency prediction on either fixation- or SP-salient locations, and (iii) evaluate ours and over 20 popular published saliency models on the two annotated data sets for predicting both SP and fixations, as well as on another data set of human fixations. Our proposed model, trained on an independent set of videos, outperforms the state-of-the-art saliency models in the task of SP prediction on all considered data sets. Moreover, this model also demonstrates superior performance in the prediction of "classical" fixation-based saliency. Our results emphasize the importance of selectively approaching training set construction for attention modelling.
