Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Topic": models, code, and papers

Big Networks: A Survey

Aug 09, 2020
Hayat Dino Bedru, Shuo Yu, Xinru Xiao, Da Zhang, Liangtian Wan, He Guo, Feng Xia

A network is a typical expressive form of representing complex systems in terms of vertices and links, in which the pattern of interactions amongst components of the network is intricate. The network can be static that does not change over time or dynamic that evolves through time. The complication of network analysis is different under the new circumstance of network size explosive increasing. In this paper, we introduce a new network science concept called big network. Big networks are generally in large-scale with a complicated and higher-order inner structure. This paper proposes a guideline framework that gives an insight into the major topics in the area of network science from the viewpoint of a big network. We first introduce the structural characteristics of big networks from three levels, which are micro-level, meso-level, and macro-level. We then discuss some state-of-the-art advanced topics of big network analysis. Big network models and related approaches, including ranking methods, partition approaches, as well as network embedding algorithms are systematically introduced. Some typical applications in big networks are then reviewed, such as community detection, link prediction, recommendation, etc. Moreover, we also pinpoint some critical open issues that need to be investigated further.

* Computer Science Review, Volume 37, August 2020, 100247 
* 69 pages, 4 figures 

  Access Paper or Ask Questions

Using Machine Learning to Predict the Evolution of Physics Research

Oct 29, 2018
Wenyuan Liu, Stanisław Saganowski, Przemysław Kazienko, Siew Ann Cheong

The advancement of science as outlined by Popper and Kuhn is largely qualitative, but with bibliometric data it is possible and desirable to develop a quantitative picture of scientific progress. Furthermore it is also important to allocate finite resources to research topics that have growth potential, to accelerate the process from scientific breakthroughs to technological innovations. In this paper, we address this problem of quantitative knowledge evolution by analysing the APS publication data set from 1981 to 2010. We build the bibliographic coupling and co-citation networks, use the Louvain method to detect topical clusters (TCs) in each year, measure the similarity of TCs in consecutive years, and visualize the results as alluvial diagrams. Having the predictive features describing a given TC and its known evolution in the next year, we can train a machine learning model to predict future changes of TCs, i.e., their continuing, dissolving, merging and splitting. We found the number of papers from certain journals, the degree, closeness, and betweenness to be the most predictive features. Additionally, betweenness increases significantly for merging events, and decreases significantly for splitting events. Our results represent a first step from a descriptive understanding of the Science of Science (SciSci), towards one that is ultimately prescriptive.

* 24 pages, 10 figures, 4 tables, supplementary information is included 

  Access Paper or Ask Questions

Pushing Your Point of View: Behavioral Measures of Manipulation in Wikipedia

Nov 09, 2011
Sanmay Das, Allen Lavoie, Malik Magdon-Ismail

As a major source for information on virtually any topic, Wikipedia serves an important role in public dissemination and consumption of knowledge. As a result, it presents tremendous potential for people to promulgate their own points of view; such efforts may be more subtle than typical vandalism. In this paper, we introduce new behavioral metrics to quantify the level of controversy associated with a particular user: a Controversy Score (C-Score) based on the amount of attention the user focuses on controversial pages, and a Clustered Controversy Score (CC-Score) that also takes into account topical clustering. We show that both these measures are useful for identifying people who try to "push" their points of view, by showing that they are good predictors of which editors get blocked. The metrics can be used to triage potential POV pushers. We apply this idea to a dataset of users who requested promotion to administrator status and easily identify some editors who significantly changed their behavior upon becoming administrators. At the same time, such behavior is not rampant. Those who are promoted to administrator status tend to have more stable behavior than comparable groups of prolific editors. This suggests that the Adminship process works well, and that the Wikipedia community is not overwhelmed by users who become administrators to promote their own points of view.

  Access Paper or Ask Questions

D3: A Massive Dataset of Scholarly Metadata for Analyzing the State of Computer Science Research

Apr 28, 2022
Jan Philip Wahle, Terry Ruas, Saif M. Mohammad, Bela Gipp

DBLP is the largest open-access repository of scientific articles on computer science and provides metadata associated with publications, authors, and venues. We retrieved more than 6 million publications from DBLP and extracted pertinent metadata (e.g., abstracts, author affiliations, citations) from the publication texts to create the DBLP Discovery Dataset (D3). D3 can be used to identify trends in research activity, productivity, focus, bias, accessibility, and impact of computer science research. We present an initial analysis focused on the volume of computer science research (e.g., number of papers, authors, research activity), trends in topics of interest, and citation patterns. Our findings show that computer science is a growing research field (approx. 15% annually), with an active and collaborative researcher community. While papers in recent years present more bibliographical entries in comparison to previous decades, the average number of citations has been declining. Investigating papers' abstracts reveals that recent topic trends are clearly reflected in D3. Finally, we list further applications of D3 and pose supplemental research questions. The D3 dataset, our findings, and source code are publicly available for research purposes.

  Access Paper or Ask Questions

Pre-training Methods in Information Retrieval

Nov 27, 2021
Yixing Fan, Xiaohui Xie, Yinqiong Cai, Jia Chen, Xinyu Ma, Xiangsheng Li, Ruqing Zhang, Jiafeng Guo, Yiqun Liu

The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list to respond to user's information need. Recently, the resurgence of deep learning has greatly advanced this field and leads to a hot topic named NeuIR (i.e., neural information retrieval), especially the paradigm of pre-training methods (PTMs). Owing to sophisticated pre-training objectives and huge model size, pre-trained models can learn universal language representations from massive textual data, which are beneficial to the ranking task of IR. Since there have been a large number of works dedicating to the application of PTMs in IR, we believe it is the right time to summarize the current status, learn from existing methods, and gain some insights for future development. In this survey, we present an overview of PTMs applied in different components of IR system, including the retrieval component, the re-ranking component, and other components. In addition, we also introduce PTMs specifically designed for IR, and summarize available datasets as well as benchmark leaderboards. Moreover, we discuss some open challenges and envision some promising directions, with the hope of inspiring more works on these topics for future research.

  Access Paper or Ask Questions

A Survey of Deep Learning for Data Caching in Edge Network

Aug 17, 2020
Yantong Wang, Vasilis Friderikos

The concept of edge caching provision in emerging 5G and beyond mobile networks is a promising method to deal both with the traffic congestion problem in the core network as well as reducing latency to access popular content. In that respect end user demand for popular content can be satisfied by proactively caching it at the network edge, i.e, at close proximity to the users. In addition to model based caching schemes learning-based edge caching optimizations has recently attracted significant attention and the aim hereafter is to capture these recent advances for both model based and data driven techniques in the area of proactive caching. This paper summarizes the utilization of deep learning for data caching in edge network. We first outline the typical research topics in content caching and formulate a taxonomy based on network hierarchical structure. Then, a number of key types of deep learning algorithms are presented, ranging from supervised learning to unsupervised learning as well as reinforcement learning. Furthermore, a comparison of state-of-the-art literature is provided from the aspects of caching topics and deep learning methods. Finally, we discuss research challenges and future directions of applying deep learning for caching

  Access Paper or Ask Questions

Hyper-Parameter Optimization: A Review of Algorithms and Applications

Mar 12, 2020
Tong Yu, Hong Zhu

Since deep neural networks were developed, they have made huge contributions to everyday lives. Machine learning provides more rational advice than humans are capable of in almost every aspect of daily life. However, despite this achievement, the design and training of neural networks are still challenging and unpredictable procedures. To lower the technical thresholds for common users, automated hyper-parameter optimization (HPO) has become a popular topic in both academic and industrial areas. This paper provides a review of the most essential topics on HPO. The first section introduces the key hyper-parameters related to model training and structure, and discusses their importance and methods to define the value range. Then, the research focuses on major optimization algorithms and their applicability, covering their efficiency and accuracy especially for deep learning networks. This study next reviews major services and toolkits for HPO, comparing their support for state-of-the-art searching algorithms, feasibility with major deep learning frameworks, and extensibility for new modules designed by users. The paper concludes with problems that exist when HPO is applied to deep learning, a comparison between optimization algorithms, and prominent approaches for model evaluation with limited computational resources.

  Access Paper or Ask Questions

Studying Cultural Differences in Emoji Usage across the East and the West

Apr 04, 2019
Sharath Chandra Guntuku, Mingyang Li, Louis Tay, Lyle H. Ungar

Global acceptance of Emojis suggests a cross-cultural, normative use of Emojis. Meanwhile, nuances in Emoji use across cultures may also exist due to linguistic differences in expressing emotions and diversity in conceptualizing topics. Indeed, literature in cross-cultural psychology has found both normative and culture-specific ways in which emotions are expressed. In this paper, using social media, we compare the Emoji usage based on frequency, context, and topic associations across countries in the East (China and Japan) and the West (United States, United Kingdom, and Canada). Across the East and the West, our study examines a) similarities and differences on the usage of different categories of Emojis such as People, Food \& Drink, Travel \& Places etc., b) potential mapping of Emoji use differences with previously identified cultural differences in users' expression about diverse concepts such as death, money emotions and family, and c) relative correspondence of validated psycho-linguistic categories with Ekman's emotions. The analysis of Emoji use in the East and the West reveals recognizable normative and culture specific patterns. This research reveals the ways in which Emojis can be used for cross-cultural communication.

* ICWSM 2019 

  Access Paper or Ask Questions

Cross-Domain Labeled LDA for Cross-Domain Text Classification

Sep 16, 2018
Baoyu Jing, Chenwei Lu, Deqing Wang, Fuzhen Zhuang, Cheng Niu

Cross-domain text classification aims at building a classifier for a target domain which leverages data from both source and target domain. One promising idea is to minimize the feature distribution differences of the two domains. Most existing studies explicitly minimize such differences by an exact alignment mechanism (aligning features by one-to-one feature alignment, projection matrix etc.). Such exact alignment, however, will restrict models' learning ability and will further impair models' performance on classification tasks when the semantic distributions of different domains are very different. To address this problem, we propose a novel group alignment which aligns the semantics at group level. In addition, to help the model learn better semantic groups and semantics within these groups, we also propose a partial supervision for model's learning in source domain. To this end, we embed the group alignment and a partial supervision into a cross-domain topic model, and propose a Cross-Domain Labeled LDA (CDL-LDA). On the standard 20Newsgroup and Reuters dataset, extensive quantitative (classification, perplexity etc.) and qualitative (topic detection) experiments are conducted to show the effectiveness of the proposed group alignment and partial supervision.

* ICDM 2018 

  Access Paper or Ask Questions

Leveraging Medical Sentiment to Understand Patients Health on Social Media

Jul 30, 2018
Shweta Yadav, Joy Sain, Amit Sheth, Asif Ekbal, Sriparna Saha, Pushpak Bhattacharyya

The unprecedented growth of Internet users in recent years has resulted in an abundance of unstructured information in the form of social media text. A large percentage of this population is actively engaged in health social networks to share health-related information. In this paper, we address an important and timely topic by analyzing the users' sentiments and emotions w.r.t their medical conditions. Towards this, we examine users on popular medical forums (,, where they post on important topics such as asthma, allergy, depression, and anxiety. First, we provide a benchmark setup for the task by crawling the data, and further define the sentiment specific fine-grained medical conditions (Recovered, Exist, Deteriorate, and Other). We propose an effective architecture that uses a Convolutional Neural Network (CNN) as a data-driven feature extractor and a Support Vector Machine (SVM) as a classifier. We further develop a sentiment feature which is sensitive to the medical context. Here, we show that the use of medical sentiment feature along with extracted features from CNN improves the model performance. In addition to our dataset, we also evaluate our approach on the benchmark "CLEF eHealth 2014" corpora and show that our model outperforms the state-of-the-art techniques.

  Access Paper or Ask Questions