
"Topic": models, code, and papers

Understanding the role of single-board computers in engineering and computer science education: A systematic literature review

Mar 30, 2022
Jonathan Álvarez Ariza, Heyson Baez

In the last decade, Single-Board Computers (SBCs) have been employed more frequently in engineering and computer science, at both the technical and the educational level. Factors such as their versatility, low cost, and the possibility of enhancing the learning process through technology have led educators and students to adopt these devices. However, the implications, possibilities, and constraints of these devices in engineering and Computer Science (CS) education have not been explored in detail. In this systematic literature review, we explore how SBCs are employed in engineering and computer science and what educational results are derived from their usage in tertiary education over the period 2010-2020. To that end, 154 studies were selected out of n=605 collected from the academic databases Ei Compendex, ERIC, and Inspec. The analysis was carried out in two phases, identifying, e.g., areas of application, learning outcomes, and students' and researchers' perceptions. The results mainly indicate the following: (1) The studies in the review cluster around laboratories and e-learning, computing education, robotics, Internet of Things (IoT), and persons with disabilities. (2) Researchers highlight the importance of SBCs in transforming the engineering and CS curricula so that students learn complex topics through experimentation in hands-on activities. (3) The cognitive learning outcomes typically reported by the authors are improvements in students' grades and in technical skills related to course topics. Concerning affective learning outcomes, increases in interest, motivation, and engagement are commonly reported.

* Computer applications in engineering education (2022); vol 30; pp 304-329 
* 27 pages 


A Large-Scale Rich Context Query and Recommendation Dataset in Online Knowledge-Sharing

Jun 11, 2021
Bin Hao, Min Zhang, Weizhi Ma, Shaoyun Shi, Xinxing Yu, Houzhi Shan, Yiqun Liu, Shaoping Ma

Data plays a vital role in machine learning studies. In recommendation research, both user behaviors and side information are helpful for modeling users, so large-scale real-world datasets with abundant user behaviors are especially valuable. However, such datasets are not easy to obtain, as most of them are held and protected by companies. In this paper, a new large-scale dataset collected from a knowledge-sharing platform is presented, comprising around 100M interactions collected within 10 days, 798K users, 165K questions, 554K answers, 240K authors, 70K topics, and more than 501K user query keywords. Anonymized descriptions of users, answers, questions, authors, and topics are also included. Notably, each user's latest query keywords, which reveal users' explicit information needs, have not been included in previous open datasets. We characterize the dataset and demonstrate its potential applications for recommendation research. Multiple experiments show the dataset can be used to evaluate algorithms for general top-N recommendation, sequential recommendation, and context-aware recommendation. The dataset can also be used to integrate search with recommendation and to study recommendation with negative feedback. Beyond recommendation, tasks such as user gender prediction, most-valuable-answerer identification, and high-quality answer recognition can also use this dataset. To the best of our knowledge, this is the largest real-world interaction dataset for personalized recommendation.
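As an illustration of the top-N evaluation mentioned above, here is a minimal sketch of a leave-last-out Hit Rate@N evaluation on a toy interaction log shaped like this dataset's (user, item, timestamp) records. The record layout, the `hit_rate_at_n` helper, and the popularity baseline are our own illustrative assumptions, not part of the released dataset or the authors' code.

```python
from collections import defaultdict

def hit_rate_at_n(interactions, recommend, n=5):
    """Hold out each user's last interaction (by timestamp) and check
    whether the recommender places the held-out item in its top-n list."""
    by_user = defaultdict(list)
    for user, item, ts in sorted(interactions, key=lambda r: r[2]):
        by_user[user].append(item)
    hits = 0
    for user, items in by_user.items():
        history, held_out = items[:-1], items[-1]
        hits += held_out in recommend(user, history, n)
    return hits / len(by_user)

def popularity_recommender(all_interactions):
    """A trivial non-personalized baseline: rank items by global count."""
    counts = defaultdict(int)
    for _, item, _ in all_interactions:
        counts[item] += 1
    ranked = sorted(counts, key=counts.get, reverse=True)
    def recommend(user, history, n):
        return [i for i in ranked if i not in history][:n]
    return recommend
```

The same harness would accept a sequential or context-aware recommender, since only the `recommend(user, history, n)` interface matters.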

* 7 pages 


Query-oriented text summarization based on hypergraph transversals

Feb 02, 2019
Hadrien Van Lierde, Tommy W. S. Chow

Existing graph- and hypergraph-based algorithms for document summarization represent the sentences of a corpus as the nodes of a graph or hypergraph whose edges capture lexical similarity between sentences. Each sentence is then scored individually with popular node-ranking algorithms, and a summary is produced by extracting the highest-scoring sentences. This approach fails to select a subset of jointly relevant sentences, and it may produce redundant summaries that miss important topics of the corpus. To alleviate this issue, a new hypergraph-based summarizer is proposed in this paper, in which each node is a sentence and each hyperedge is a theme, namely a group of sentences sharing a topic. Themes are weighted in terms of their prominence in the corpus and their relevance to a user-defined query. It is further shown that the problem of identifying a subset of sentences covering the relevant themes of the corpus is equivalent to finding a hypergraph transversal in our theme-based hypergraph. Two extensions of the notion of hypergraph transversal are proposed for the purpose of summarization, and polynomial-time algorithms building on the theory of submodular functions are proposed to solve the associated discrete optimization problems. The worst-case time complexity of the proposed algorithms is quadratic in the number of terms, which makes them cheaper than existing hypergraph-based methods. A thorough comparative analysis against related models on DUC benchmark datasets demonstrates the effectiveness of our approach, which outperforms existing graph- and hypergraph-based methods by at least 6% in ROUGE-SU4 score.

* This is the unrefereed Author's Original Version (or pre-print Version) of the article 


CoVeR: Learning Covariate-Specific Vector Representations with Tensor Decompositions

Jul 07, 2018
Kevin Tian, Teng Zhang, James Zou

Word embedding is a useful approach to capture co-occurrence structure in large text corpora. However, in addition to the text itself, we often have covariates associated with individual corpus documents (e.g., the demographics of the author, or the time and venue of publication), and we would like the embedding to naturally capture this information. We propose CoVeR, a new tensor decomposition model for vector embeddings with covariates. CoVeR jointly learns a "base" embedding for all the words together with a weighted diagonal matrix that models how each covariate transforms the base embedding. To obtain an author- or venue-specific embedding, for example, we can then simply multiply the base embedding by the associated transformation matrix. The main advantages of our approach are data efficiency and the interpretability of the covariate transformation. Our experiments demonstrate that our joint model learns substantially better covariate-specific embeddings than the standard approach of learning a separate embedding for each covariate using only the relevant subset of data, as well as other related methods. Furthermore, CoVeR encourages the embeddings to be "topic-aligned" in that the dimensions have specific independent meanings. This allows our covariate-specific embeddings to be compared by topic, enabling downstream differential analysis. We empirically evaluate the benefits of our algorithm on several datasets and demonstrate how it can be used to address many natural questions about covariate effects. Accompanying code to this paper can be found at
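The covariate transformation described above, a base embedding rescaled by a per-covariate diagonal matrix, can be sketched in a few lines. The shapes, random initialization, and names here are illustrative stand-ins, not the authors' trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, num_covariates = 5, 4, 2

base = rng.normal(size=(vocab_size, dim))              # shared base embedding
diag_weights = rng.normal(size=(num_covariates, dim))  # one diagonal per covariate

def covariate_embedding(c):
    # Multiplying by a diagonal matrix is an elementwise rescaling of each
    # embedding dimension, so the full matrix is never materialized.
    return base * diag_weights[c]
```

Because the transform is diagonal, dimension k of every covariate-specific embedding is a rescaling of the same base dimension k; this is what makes the dimensions comparable across covariates and supports the "topic-aligned" differential analysis the abstract mentions.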

* 12 pages. Appears in ICML 2018 


New Hybrid Neuro-Evolutionary Algorithms for Renewable Energy and Facilities Management Problems

Jun 05, 2018
L. Cornejo-Bueno

This Ph.D. thesis deals with optimizing the development of several renewable energy resources and with improving facilities management in oceanic engineering and at airports, using hybrid computational methods from Artificial Intelligence (AI). Energy is essential for our society to ensure a good quality of life, so predictions of the characteristics on which renewable energies depend are necessary in order to know the amount of energy that will be obtained at any time. The second topic tackled in this thesis concerns the basic parameters that influence different marine activities and airport operations, knowledge of which is necessary for proper facilities management in these environments. Within this work, a study of state-of-the-art Machine Learning has been performed to solve the problems associated with the above-mentioned topics, and several contributions are proposed. One pillar of this work focuses on estimating the most important parameters in the exploitation of renewable resources. The second contribution relates to feature selection problems. The proposed methodologies are applied to multiple problems: the prediction of $H_s$ (significant wave height), relevant for marine energy applications and marine activities; the estimation of WPREs, undesirable variations in the electric power produced by a wind farm; the prediction of global solar radiation in areas of Spain and Australia, important for solar energy; and the prediction of low-visibility events at airports. All of these practical issues are addressed with the corresponding prior data analysis, normally in terms of meteorological variables.

* arXiv admin note: text overlap with arXiv:1706.03673, arXiv:1805.03463 by other authors 


Semi-automatic Generation of Multilingual Datasets for Stance Detection in Twitter

Jan 28, 2021
Elena Zotova, Rodrigo Agerri, German Rigau

Popular social media networks provide the perfect environment to study the opinions and attitudes expressed by users. While interactions in social media such as Twitter occur in many natural languages, research on stance detection (the position or attitude expressed with respect to a specific topic) within the Natural Language Processing field has largely been done for English. Although some efforts have recently been made to develop annotated data in other languages, there is a telling lack of resources to facilitate multilingual and crosslingual research on stance detection. This is partially because manually annotating a corpus of social media texts is a difficult, slow, and costly process. Furthermore, as stance is a highly domain- and topic-specific phenomenon, the need for annotated data is especially pressing. As a result, most manually labeled resources are hindered by their relatively small size and skewed class distribution. This paper presents a method to obtain multilingual datasets for stance detection in Twitter. Instead of manually annotating on a per-tweet basis, we leverage user-based information to semi-automatically label large numbers of tweets. Empirical monolingual and cross-lingual experimentation and qualitative analysis show that our method helps overcome the aforementioned difficulties in building large, balanced, and multilingual labeled corpora. We believe that our method can be easily adapted to generate labeled social media data for other Natural Language Processing tasks and domains.
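The user-based labeling idea can be sketched schematically: rather than annotating each tweet, every tweet by a user inherits a stance label attached to that user (for instance, one derived from the accounts or hashtags the user follows). The records, labels, and the source of the per-user stance below are made-up illustrations, not the paper's actual pipeline.

```python
def label_tweets_by_user(tweets, user_stance):
    """tweets: list of (user, text) pairs.
    user_stance: dict mapping user -> stance label (e.g. 'FAVOR'/'AGAINST').
    Returns (text, label) pairs; tweets from users whose stance is unknown
    are skipped rather than guessed."""
    labeled = []
    for user, text in tweets:
        if user in user_stance:
            labeled.append((text, user_stance[user]))
    return labeled
```

The same few lines scale to millions of tweets, which is precisely why per-user labeling yields much larger corpora than per-tweet annotation.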

* Expert Systems with Applications, 170 (2021), Elsevier 
* Stance detection, multilingualism, text categorization, fake news, deep learning 


ADMM-based Networked Stochastic Variational Inference

Feb 27, 2018
Hamza Anwar, Quanyan Zhu

Owing to recent advances in "Big Data" modeling and prediction tasks, variational Bayesian estimation has gained popularity for its ability to provide exact solutions to approximate posteriors. One key technique for approximate inference is stochastic variational inference (SVI). SVI poses variational inference as a stochastic optimization problem and solves it iteratively using noisy gradient estimates. It aims to handle massive data for prediction and classification tasks by applying complex Bayesian models that have observed as well as latent variables. This paper aims to decentralize SVI, allowing parallel computation, secure learning, and robustness benefits. We use the Alternating Direction Method of Multipliers (ADMM) in a top-down setting to develop a distributed SVI algorithm in which independent learners running inference algorithms need only share estimated model parameters rather than their private datasets. Our work extends the distributed SVI-ADMM algorithm that we first propose into an ADMM-based networked SVI algorithm in which the learners not only work distributedly but also share information according to the rules of a graph through which they form a network. This kind of work lies under the umbrella of 'deep learning over networks', and we verify our algorithm on a topic-modeling problem over a corpus of Wikipedia articles. We illustrate the results on a latent Dirichlet allocation (LDA) topic model for large-scale document classification, compare performance with the centralized algorithm, and use numerical experiments to corroborate the analytical results.
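The consensus pattern behind ADMM-based distributed inference can be shown on a deliberately tiny problem: each learner updates a local parameter against its own data and against a shared consensus variable, and only parameter estimates (never raw data) are exchanged. Quadratic local losses keep every update in closed form; this scalar toy is illustrative and is not the paper's SVI algorithm.

```python
def admm_consensus(local_targets, rho=1.0, steps=50):
    """Each learner i privately minimizes (x_i - a_i)^2 subject to the
    consensus constraint x_i = z. With these losses, z converges to the
    mean of the private targets a_i, the global minimizer."""
    n = len(local_targets)
    x = [0.0] * n   # local parameter estimates
    u = [0.0] * n   # scaled dual variables
    z = 0.0         # shared consensus variable
    for _ in range(steps):
        # Local step: closed-form argmin of (x - a)^2 + (rho/2)(x - z + u)^2.
        x = [(2 * a + rho * (z - ui)) / (2 + rho)
             for a, ui in zip(local_targets, u)]
        # Consensus step: only (x_i + u_i), i.e. parameters, are shared.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual update: penalize remaining disagreement with the consensus.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return z
```

In the networked variant described above, the global averaging step would be replaced by averaging over each learner's graph neighbors.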

* to be submitted for publishing 


Time Dependency, Data Flow, and Competitive Advantage

Mar 17, 2022
Ehsan Valavi, Joel Hestness, Marco Iansiti, Newsha Ardalani, Feng Zhu, Karim R. Lakhani

Data is fundamental to machine learning-based products and services and is considered strategic due to its externalities for businesses, governments, non-profits, and, more generally, for society. It is well known that the value of organizations (businesses, government agencies and programs, and even industries) scales with the volume of available data. What is often less appreciated is that the value of data for making useful organizational predictions varies widely and is chiefly a function of the data's characteristics and the underlying algorithms. In this research, our goal is to study how the value of data changes over time and how this change varies across contexts and business areas (e.g., next-word prediction in the context of history, sports, or politics). We focus on data from Reddit and compare the time-dependency of data value across various Reddit topics (subreddits). We make this comparison by measuring the rate at which user-generated text data loses its relevance to the algorithmic prediction of conversations. We show that different subreddits have different rates of relevance decline over time. Relating the text topics to various business areas of interest, we argue that competing in a business area in which data value decays rapidly alters the strategies needed to acquire competitive advantage. When data value decays rapidly, access to a continuous flow of data is more valuable than access to a fixed stock of data. In this kind of setting, improving user engagement and growing the user base help create and maintain a competitive advantage.
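One simple way to quantify "rate of relevance decline" of the kind described above is to evaluate a model trained on a fixed stock of old data on windows increasingly far in the future and fit a decay curve to the resulting accuracies. The exponential-decay model and the `relevance_half_life` helper are our own illustrative assumptions, not the paper's actual metric.

```python
import math

def relevance_half_life(accuracies):
    """accuracies: accuracy of a fixed model evaluated at t = 0, 1, 2, ...
    Fit the exponential decay acc_t ~ acc_0 * exp(-lambda * t) by least
    squares on log-accuracies (anchored at t = 0) and return the half-life
    ln(2) / lambda, in the same time units as t."""
    a0 = accuracies[0]
    num = sum(t * (math.log(a0) - math.log(a))
              for t, a in enumerate(accuracies))
    den = sum(t * t for t in range(len(accuracies)))
    lam = num / den
    return math.log(2) / lam
```

Under this framing, a short half-life corresponds to a subreddit (or business area) where a continuous data flow beats a fixed data stock.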

* 24 Pages 


On the Design of Complex EM Devices and Systems through the System-by-Design Paradigm -- A Framework for Dealing with the Computational Complexity

Jul 16, 2021
Andrea Massa, Marco Salucci

The System-by-Design (SbD) is an emerging engineering framework for the optimization-driven design of complex electromagnetic (EM) devices and systems. More specifically, the computational complexity of the design problem at hand is addressed by a suitable selection and integration of functional blocks comprising problem-dependent and computationally efficient modeling and analysis tools as well as reliable prediction and optimization strategies. A favorable "environment" for optimally exploiting global optimization tools when sampling wide/complex/nonlinear solution spaces is built thanks to the suitable re-formulation of the problem at hand as an optimization one, the profitable minimum-size coding of the degrees-of-freedom (DoFs), and the "smart" replacement of expensive full-wave (FW) simulators with proper surrogate models (SMs), which yield fast yet accurate predictions starting from minimum-size/reduced-CPU-cost training sets. This research summary is then aimed at (i) providing a comprehensive description of the SbD framework and of its pillar concepts and strategies, (ii) giving useful guidelines for its successful customization and application to different EM design problems characterized by different levels of computational complexity, and (iii) envisaging future trends and advances in this fascinating and, owing to its relevant industrial and commercial implications, highly topical field. Representative benchmarks concerning the synthesis of complex EM systems are presented to highlight the advantages and potentialities as well as the current limitations of the SbD paradigm.
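The surrogate-replacement idea at the core of the loop described above can be shown in miniature: sample a few "expensive" evaluations, fit a cheap surrogate, and let the optimizer query only the surrogate. The 1-D quadratic objective and the nearest-neighbour surrogate below are toy stand-ins for a full-wave solver and a trained SM, chosen only to make the pattern concrete.

```python
def expensive_simulator(x):
    # Stand-in for a costly full-wave (FW) solver call.
    return (x - 0.3) ** 2

def fit_nearest_neighbour_surrogate(samples):
    """samples: list of (x, y) pairs obtained from the expensive solver."""
    def surrogate(x):
        return min(samples, key=lambda s: abs(s[0] - x))[1]
    return surrogate

def optimize_with_surrogate(lo, hi, n_train=11, n_search=1001):
    # Few expensive evaluations to build the training set...
    xs = [lo + (hi - lo) * i / (n_train - 1) for i in range(n_train)]
    surrogate = fit_nearest_neighbour_surrogate(
        [(x, expensive_simulator(x)) for x in xs])
    # ...then a dense search that is cheap because it queries only the SM.
    grid = [lo + (hi - lo) * i / (n_search - 1) for i in range(n_search)]
    return min(grid, key=surrogate)
```

The accuracy of the returned optimum is limited by the surrogate's training set, which is exactly the trade-off the SbD framework manages with its minimum-size training sets.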


Learning from Very Few Samples: A Survey

Sep 12, 2020
Jiang Lu, Pinghua Gong, Jieping Ye, Changshui Zhang

Few-sample learning (FSL) is significant and challenging in the field of machine learning. The capability to learn and generalize from very few samples is a noticeable demarcation separating artificial intelligence from human intelligence, since humans can readily establish their cognition of novelty from just a single or a handful of examples, whereas machine learning algorithms typically require hundreds or thousands of supervised samples to guarantee generalization. Despite a long history dating back to the early 2000s and widespread attention in recent years with booming deep learning technologies, few surveys or reviews of FSL have been available until now. In this context, we extensively review 300+ FSL papers spanning the 2000s to 2019 and provide a timely and comprehensive survey of FSL. In this survey, we review the evolution and current progress of FSL, categorize FSL approaches in principle into generative-model-based and discriminative-model-based kinds, and place particular emphasis on meta-learning-based FSL approaches. We also summarize several recently emerging extensional topics of FSL and review the latest advances on them. Furthermore, we highlight important FSL applications covering many research hotspots in computer vision, natural language processing, audio and speech, reinforcement learning and robotics, data analysis, etc. Finally, we conclude the survey with a discussion of promising trends, in the hope of providing guidance and insights for follow-up research.

* 30 pages 
