"Topic": models, code, and papers

Challenges of Artificial Intelligence -- From Machine Learning and Computer Vision to Emotional Intelligence

Jan 05, 2022
Matti Pietikäinen, Olli Silven

Artificial intelligence (AI) has become a part of everyday conversation and of our lives. It is considered the new electricity that is revolutionizing the world, and it attracts heavy investment from both industry and academia. However, there is also a lot of hype in the current AI debate. AI based on so-called deep learning has achieved impressive results on many problems, but its limits are already visible. AI has been under research since the 1940s, and the field has seen many ups and downs caused by over-expectations and the disappointments that have followed them. The purpose of this book is to give a realistic picture of AI, its history, its potential, and its limitations. We believe that AI is a helper, not a ruler, of humans. We begin by describing what AI is and how it has evolved over the decades. After the fundamentals, we explain the importance of massive data for the current mainstream of artificial intelligence. The most common representations, methods, and machine learning approaches used in AI are covered, and the main application areas are introduced. Computer vision has been central to the development of AI. The book provides a general introduction to computer vision and includes an exposure to the results and applications of our own research. Emotions are central to human intelligence, but they have so far seen little use in AI. We present the basics of emotional intelligence and our own research on the topic. We discuss super-intelligence that transcends human understanding, explaining why such an achievement seems impossible on the basis of present knowledge, and how AI could be improved. Finally, we summarize the current state of AI and what should be done in the future. In the appendix, we look at the development of AI education, especially from the perspective of the contents at our own university.

* 234 pages. Published as an electronic publication at the University of Oulu, Finland, in December 2021. ISBN: 978-952-62-3199-0. http://jultika.oulu.fi/Record/isbn978-952-62-3199-0 


Show and Write: Entity-aware News Generation with Image Information

Dec 11, 2021
Zhongping Zhang, Yiwen Gu, Bryan A. Plummer

Automatically writing long articles is a complex and challenging language generation task. Prior work has primarily focused on generating these articles from a human-written prompt that provides some topical context and some metadata about the article. However, for many applications, such as generating news stories, articles are often paired with images and their captions or alt-text, which in turn are based on real-world events and may reference many named entities that are difficult for language models to correctly recognize and predict. To address these two problems, this paper introduces Engin, an Entity-aware News Generation method with Image iNformation that incorporates news image information into language models. Engin produces news articles conditioned on both metadata and information such as captions and named entities extracted from images. We also propose an entity-aware mechanism to help our model better recognize and predict entity names in news. We perform experiments on two public large-scale news datasets, GoodNews and VisualNews. Quantitative results show that our approach improves article perplexity by 4-5 points over the base models. Qualitative results demonstrate that the text generated by Engin is more consistent with the news images. We also perform an article-quality annotation experiment on the generated articles to validate that our model produces higher-quality articles. Finally, we investigate the effect Engin has on methods that automatically detect machine-generated articles.
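
The sketch below is a minimal, hypothetical illustration of the general idea of conditioning a causal language model on image-derived metadata (caption and entity strings) and measuring article perplexity; it uses an off-the-shelf GPT-2 from Hugging Face and is not the Engin implementation. The headline, caption, and entity strings are made-up examples.

```python
# A minimal, hypothetical sketch of caption/entity-conditioned scoring with an
# off-the-shelf GPT-2; this is NOT the Engin implementation, only an
# illustration of conditioning generation on metadata extracted from an image.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

metadata = "Headline: Flooding closes downtown bridges. Caption: Rescue boats on Main Street."
entities = "Entities: Main Street; Downtown Fire Department."  # e.g. extracted from the image
article_body = "Rescue crews worked through the night as water levels kept rising."

# Condition on metadata + entities, score only the article tokens.
context_ids = tokenizer(metadata + " " + entities, return_tensors="pt").input_ids
body_ids = tokenizer(" " + article_body, return_tensors="pt").input_ids
input_ids = torch.cat([context_ids, body_ids], dim=1)

labels = input_ids.clone()
labels[:, : context_ids.shape[1]] = -100  # ignore context positions in the loss

with torch.no_grad():
    loss = model(input_ids, labels=labels).loss
print("article perplexity:", math.exp(loss.item()))
```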



Improving Embedded Knowledge Graph Multi-hop Question Answering by introducing Relational Chain Reasoning

Oct 25, 2021
Weiqiang Jin, Hang Yu, Xi Tao, Ruiping Yin

Knowledge Base Question Answering (KBQA) aims to answer user questions from a knowledge base (KB) by identifying the reasoning relations between the topic entity and the answer. As a complex branch task of KBQA, multi-hop KGQA requires reasoning over multi-hop relational chains preserved in the KG to arrive at the right answer. Despite the successes made in recent years, existing work on answering multi-hop complex questions faces the following challenges: i) poor performance due to neglecting the explicit relational chain order and the relation types reflected in user questions; ii) failure to consider implicit relations between the topic entity and the answer implied in the structured KG, because of the limited neighborhood size constraints in subgraph-retrieval-based algorithms. To address these issues in multi-hop KGQA, we propose a novel model, Relational Chain-based Embedded KGQA (Rce-KGQA), which simultaneously utilizes the explicit relational chain described in the natural language question and the implicit relational chain stored in the structured KG. Our extensive empirical study on two open-domain benchmarks shows that our method significantly outperforms state-of-the-art counterparts such as GraftNet, PullNet and EmbedKGQA. Comprehensive ablation experiments also verify the effectiveness of our method for multi-hop KGQA tasks. We have made our model's source code available on GitHub: https://github.com/albert-jin/Rce-KGQA.

* 10 pages, 5 figures, 36 references. This work was carried out during the first author's master's studies at Shanghai University and is partially supported by an anonymous Natural Research Foundation. We would like to thank Hang Yu for helpful discussions and valuable recommendations 
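
To make the "explicit relational chain" idea concrete, here is a toy sketch (not the Rce-KGQA model, and with an invented three-triple KG) of how a predicted relation sequence can guide multi-hop KGQA: follow the relations hop by hop from the topic entity through a triple store and return the reachable entities as answers.

```python
# A toy sketch (not the Rce-KGQA model) of relational-chain-guided multi-hop
# KGQA: follow the predicted relation sequence from the topic entity through a
# triple store and return the reachable entities as candidate answers.
from collections import defaultdict

triples = [
    ("Alice", "works_for", "AcmeCorp"),
    ("AcmeCorp", "headquartered_in", "Berlin"),
    ("Berlin", "located_in", "Germany"),
]

# Index: head entity -> relation -> set of tail entities
graph = defaultdict(lambda: defaultdict(set))
for h, r, t in triples:
    graph[h][r].add(t)

def follow_chain(topic_entity, relation_chain):
    """Return entities reached by following relation_chain hop by hop."""
    frontier = {topic_entity}
    for relation in relation_chain:
        frontier = {t for e in frontier for t in graph[e][relation]}
        if not frontier:
            break
    return frontier

# "Where is the company Alice works for headquartered?" -> a two-hop chain
print(follow_chain("Alice", ["works_for", "headquartered_in"]))  # {'Berlin'}
```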


Deep Learning Based Person Re-Identification Methods: A Survey and Outlook of Recent Works

Oct 10, 2021
Zhangqiang Ming, Min Zhu, Xiaoyong Wei, Xiangkun Wang, Jiamin Zhu, Junlong Cheng, Yong Yang

In recent years, with the increasing demand for public safety and the rapid development of intelligent surveillance networks, person re-identification (Re-ID) has become one of the hot research topics in computer vision. Its main goal is to retrieve persons with the same identity from different cameras. Traditional person Re-ID methods, however, require manual marking of person targets, which consumes a lot of labor. With the widespread application of deep neural networks in computer vision, a large number of deep learning-based person Re-ID methods have emerged. This survey aims to help researchers better understand the latest research results and future development trends in this field. Firstly, we compare traditional and deep learning-based person Re-ID methods, present the main contributions of several person Re-ID surveys, and analyze their focus and shortcomings. Secondly, we review the current classic deep learning-based person Re-ID methods, covering deep metric learning, local feature learning, generative adversarial networks, sequence feature learning, and graph convolutional networks. Furthermore, we subdivide these five categories by technique type, analyzing and comparing the experimental performance of some of the subcategories. Finally, we discuss the challenges that remain in person Re-ID and prospects for future research directions.

* 21 pages, 13 figures 
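
As a small illustration of one of the surveyed categories, the sketch below shows a generic deep-metric-learning Re-ID setup trained with a triplet loss. The backbone, embedding size, margin, and input resolution are illustrative assumptions, not a specific method from the survey.

```python
# A minimal sketch of triplet-loss metric learning for Re-ID (one category the
# survey discusses); architecture and hyperparameters are illustrative only.
import torch
import torch.nn as nn
import torchvision.models as models

class ReIDEmbedder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        backbone = models.resnet18(weights=None)  # randomly initialised backbone
        backbone.fc = nn.Linear(backbone.fc.in_features, dim)
        self.backbone = backbone

    def forward(self, x):
        # L2-normalised embedding so distances are comparable across batches
        return nn.functional.normalize(self.backbone(x), dim=1)

model = ReIDEmbedder()
criterion = nn.TripletMarginLoss(margin=0.3)

anchor = torch.randn(8, 3, 256, 128)    # person crops: same identity as positive
positive = torch.randn(8, 3, 256, 128)
negative = torch.randn(8, 3, 256, 128)  # different identities

loss = criterion(model(anchor), model(positive), model(negative))
loss.backward()
```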


Weight Evolution: Improving Deep Neural Networks Training through Evolving Inferior Weight Values

Oct 09, 2021
Zhenquan Lin, Kailing Guo, Xiaofen Xing, Xiangmin Xu

To obtain good performance, convolutional neural networks are usually over-parameterized. This phenomenon has stimulated two interesting research topics: pruning the unimportant weights for compression, and reactivating the unimportant weights to make full use of network capability. However, current weight reactivation methods usually reactivate entire filters, which may not be precise enough. Looking back at history, the prosperity of filter pruning is mainly due to its friendliness to hardware implementation, but pruning at a finer structural level, i.e., individual weight elements, usually leads to better network performance. We study the problem of weight element reactivation in this paper. Motivated by evolution, we select the unimportant filters and update their unimportant elements by combining them with the important elements of important filters, much like gene crossover producing better offspring; we call the proposed method weight evolution (WE). WE is composed of four strategies. We propose a global selection strategy and a local selection strategy and combine them to locate the unimportant filters. A forward matching strategy is proposed to find the matched important filters, and a crossover strategy is proposed to utilize the important elements of the important filters for updating the unimportant filters. WE can be plugged into existing network architectures. Comprehensive experiments show that WE outperforms other reactivation methods and plug-in training methods with typical convolutional neural networks, especially lightweight networks. Our code is available at https://github.com/BZQLin/Weight-evolution.

* This paper is accepted by ACM Multimedia 2021 
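
The sketch below conveys the crossover intuition in a few lines: rank filters by a simple L1-norm importance, then overwrite the low-magnitude elements of the weakest filters with the corresponding elements of the strongest ones. The selection and matching here are crude heuristics, not the paper's global/local selection and forward matching strategies.

```python
# A rough sketch of the weight-evolution "crossover" idea: locate low-importance
# filters and overwrite their low-magnitude elements with the corresponding
# elements of a matched high-importance filter. Selection/matching is a simple
# L1-norm heuristic here, not the paper's full set of four strategies.
import torch

def weight_crossover(conv_weight, frac_filters=0.25, frac_elems=0.5):
    """conv_weight: tensor of shape (out_channels, in_channels, kH, kW)."""
    w = conv_weight.clone()
    norms = w.abs().flatten(1).sum(dim=1)         # per-filter L1 norm
    order = norms.argsort()                       # ascending importance
    k = max(1, int(frac_filters * w.shape[0]))
    weak, strong = order[:k], order[-k:]          # unimportant / important filters

    for weak_idx, strong_idx in zip(weak.tolist(), strong.tolist()):
        flat_weak = w[weak_idx].flatten().clone()
        flat_strong = w[strong_idx].flatten()
        n = max(1, int(frac_elems * flat_weak.numel()))
        elem_idx = flat_weak.abs().argsort()[:n]  # least important elements
        flat_weak[elem_idx] = flat_strong[elem_idx]  # crossover with strong filter
        w[weak_idx] = flat_weak.view_as(w[weak_idx])
    return w

conv = torch.nn.Conv2d(16, 32, 3)
with torch.no_grad():
    conv.weight.copy_(weight_crossover(conv.weight))
```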


Unsupervised Monocular Depth Perception: Focusing on Moving Objects

Aug 30, 2021
Hualie Jiang, Laiyan Ding, Zhenglong Sun, Rui Huang

As a flexible passive 3D sensing means, unsupervised learning of depth from monocular videos is becoming an important research topic. It uses the photometric errors between the target view and the views synthesized from its adjacent source views as the loss, instead of the difference from ground truth. Despite significant recent progress, occlusion and scene dynamics in real-world scenes still adversely affect the learning. In this paper, we show that deliberately manipulating the photometric errors can deal with these difficulties more effectively. We first propose an outlier masking technique that treats occluded or dynamic pixels as statistical outliers in the photometric error map. With outlier masking, the network learns the depth of objects that move in the opposite direction to the camera more accurately. To the best of our knowledge, such cases have not been seriously considered in previous work, even though they pose a high risk in applications such as autonomous driving. We also propose an efficient weighted multi-scale scheme to reduce artifacts in the predicted depth maps. Extensive experiments on the KITTI dataset and additional experiments on the Cityscapes dataset verify the proposed approach's effectiveness for depth and ego-motion estimation. Furthermore, for the first time, we evaluate the predicted depth on the regions of dynamic objects and the static background separately, for both supervised and unsupervised methods. The evaluation further verifies the effectiveness of our approach and provides some interesting observations that might inspire future research in this direction.

* Accepted by IEEE Sensors Journal. arXiv admin note: text overlap with arXiv:2003.01360 
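
To illustrate the outlier-masking and weighted multi-scale ideas in the simplest possible form, here is a hedged sketch that drops pixels whose photometric error exceeds a high percentile and combines losses over scales with fixed weights. The percentile and the scale weights are assumptions for illustration, not the paper's exact formulation.

```python
# A hedged sketch of outlier masking on a photometric error map: pixels whose
# reprojection error exceeds a high percentile are treated as occluded/dynamic
# and excluded from the loss. Percentile and scale weights are illustrative.
import torch

def masked_photometric_loss(target, synthesized, outlier_pct=95.0):
    """target, synthesized: (B, 3, H, W) images; returns a scalar loss."""
    error = (target - synthesized).abs().mean(dim=1, keepdim=True)  # per-pixel L1
    threshold = torch.quantile(error.flatten(1), outlier_pct / 100.0, dim=1)
    mask = (error <= threshold.view(-1, 1, 1, 1)).float()           # drop outliers
    return (error * mask).sum() / mask.sum().clamp(min=1.0)

def multi_scale_loss(targets, synths, weights=(0.5, 0.25, 0.125, 0.125)):
    """Weighted combination over scales, smaller weights for coarser scales."""
    return sum(w * masked_photometric_loss(t, s)
               for w, (t, s) in zip(weights, zip(targets, synths)))
```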


Sentence Semantic Regression for Text Generation

Aug 06, 2021
Wei Wang, Piji Li, Hai-Tao Zheng

Recalling classical text generation work, the generation framework can be briefly divided into two phases: idea reasoning and surface realization. The target of idea reasoning is to figure out the main idea to be presented in the subsequent talking/writing; surface realization aims to arrange the most appropriate sentences to depict and convey the information distilled from that main idea. However, popular token-by-token text generation methods ignore this crucial process and suffer from serious issues such as idea/topic drift. To tackle these problems and realize the two-phase paradigm, we propose a new framework named Sentence Semantic Regression (SSR) based on sentence-level language modeling. For idea reasoning, two architectures, SSR-AR and SSR-NonAR, are designed to conduct sentence semantic regression autoregressively (like GPT2/3) and bidirectionally (like BERT). In the surface realization phase, a mixed-granularity sentence decoder is designed to generate text with better consistency by jointly incorporating the predicted sentence-level main idea and the preceding contextual token-level information. We conduct experiments on four tasks: story ending prediction, story ending generation, dialogue generation, and sentence infilling. The results show that SSR obtains better performance in terms of automatic metrics and human evaluation.
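
The following is a hypothetical sketch of the core sentence-semantic-regression idea only: predict the embedding of the next sentence from the embeddings of preceding sentences, then let a decoder condition on that predicted vector. The GRU regressor, embedding size, and cosine objective are assumptions, not the SSR-AR/SSR-NonAR designs.

```python
# A hypothetical sketch of sentence-level semantic regression: predict the next
# sentence's embedding from preceding sentence embeddings. Not the SSR model.
import torch
import torch.nn as nn

class NextSentenceRegressor(nn.Module):
    def __init__(self, dim=384):
        super().__init__()
        self.encoder = nn.GRU(dim, dim, batch_first=True)  # over sentence embeddings
        self.head = nn.Linear(dim, dim)

    def forward(self, sent_embs):          # (B, num_sents, dim)
        _, h = self.encoder(sent_embs)     # h: (1, B, dim)
        return self.head(h.squeeze(0))     # predicted next-sentence embedding

model = NextSentenceRegressor()
context = torch.randn(4, 5, 384)           # embeddings of 5 preceding sentences
target_next = torch.randn(4, 384)          # gold embedding of the next sentence

pred = model(context)
loss = 1.0 - nn.functional.cosine_similarity(pred, target_next, dim=1).mean()
loss.backward()
# A surface-realization decoder would then be conditioned on `pred` together
# with the preceding token-level context.
```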



M2Lens: Visualizing and Explaining Multimodal Models for Sentiment Analysis

Aug 01, 2021
Xingbo Wang, Jianben He, Zhihua Jin, Muqiao Yang, Yong Wang, Huamin Qu

Multimodal sentiment analysis aims to recognize people's attitudes from multiple communication channels such as verbal content (i.e., text), voice, and facial expressions. It has become a vibrant and important research topic in natural language processing. Much research focuses on modeling the complex intra- and inter-modal interactions between different communication channels. However, current multimodal models with strong performance are often deep-learning-based and work like black boxes: it is not clear how they utilize multimodal information for sentiment predictions. Despite recent advances in techniques for enhancing the explainability of machine learning models, these techniques often target unimodal scenarios (e.g., images, sentences), and little research has been done on explaining multimodal models. In this paper, we present M2Lens, an interactive visual analytics system to visualize and explain multimodal models for sentiment analysis. M2Lens provides explanations of intra- and inter-modal interactions at the global, subset, and local levels. Specifically, it summarizes the influence of three typical interaction types (i.e., dominance, complement, and conflict) on the model predictions. Moreover, M2Lens identifies frequent and influential multimodal features and supports multi-faceted exploration of model behaviors from the language, acoustic, and visual modalities. Through two case studies and expert interviews, we demonstrate that our system can help users gain deep insights into multimodal models for sentiment analysis.

* 11 pages, 7 figures. This paper is accepted by IEEE VIS, 2021. To appear in IEEE Transactions on Visualization and Computer Graphics (TVCG) 
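
As a rough illustration of the kind of per-modality influence such explanations build on (and not the M2Lens system itself), the sketch below estimates how much each modality drives a prediction by zeroing that modality's features and measuring the prediction shift. The model interface with text/audio/visual keyword arguments is hypothetical.

```python
# A simple illustrative probe (not M2Lens): estimate each modality's influence
# on a multimodal sentiment model by zeroing its features and measuring how the
# prediction shifts. The model's keyword-argument interface is hypothetical.
import torch

def modality_influence(model, text_feat, audio_feat, visual_feat):
    """Return prediction shift caused by removing each modality."""
    feats = {"text": text_feat, "audio": audio_feat, "visual": visual_feat}
    with torch.no_grad():
        base = model(**feats)
        shifts = {}
        for name in feats:
            ablated = {k: (torch.zeros_like(v) if k == name else v)
                       for k, v in feats.items()}
            shifts[name] = (base - model(**ablated)).abs().mean().item()
    return shifts  # a larger shift suggests that modality dominates the prediction
```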

