In this paper we combine statistical analysis of large text databases and simple stochastic models to explain the appearance of scaling laws in the statistics of word frequencies. Besides the sublinear scaling of the vocabulary size with database size (Heaps' law), here we report a new scaling of the fluctuations around this average (fluctuation scaling analysis). We explain both scaling laws by modeling the usage of words by simple stochastic processes in which the overall distribution of word-frequencies is fat tailed (Zipf's law) and the frequency of a single word is subject to fluctuations across documents (as in topic models). In this framework, the mean and the variance of the vocabulary size can be expressed as quenched averages, implying that: i) the inhomogeneous dissemination of words cause a reduction of the average vocabulary size in comparison to the homogeneous case, and ii) correlations in the co-occurrence of words lead to an increase in the variance and the vocabulary size becomes a non-self-averaging quantity. We address the implications of these observations to the measurement of lexical richness. We test our results in three large text databases (Google-ngram, Enlgish Wikipedia, and a collection of scientific articles).
Biclustering is a two way clustering approach involving simultaneous clustering along two dimensions of the data matrix. Finding biclusters of web objects (i.e. web users and web pages) is an emerging topic in the context of web usage mining. It overcomes the problem associated with traditional clustering methods by allowing automatic discovery of browsing pattern based on a subset of attributes. A coherent bicluster of clickstream data is a local browsing pattern such that users in bicluster exhibit correlated browsing pattern through a subset of pages of a web site. This paper proposed a new application of biclustering to web data using a combination of heuristics and meta-heuristics such as K-means, Greedy Search Procedure and Genetic Algorithms to identify the coherent browsing pattern. Experiment is conducted on the benchmark clickstream msnbc dataset from UCI repository. Results demonstrate the efficiency and beneficial outcome of the proposed method by correlating the users and pages of a web site in high degree.This approach shows excellent performance at finding high degree of overlapped coherent biclusters from web data.
Electro-optic (EO) modulation is a well-known and essential topic in the field of communications and sensing. Its ultrahigh efficiency is unprecedentedly desired in the current green and data era. However, dramatically increasing the modulation efficiency is difficult due to the monotonic mapping relationship between the electrical signal and modulated optical signal. Here, a new mechanism termed phase-transition EO modulation is revealed from the reciprocal transition between two distinct phase planes arising from the bifurcation. Remarkably, a monolithically integrated mode-locked laser (MLL) is implemented as a prototype. A 24.8-GHz radio-frequency signal is generated and modulated, achieving a modulation energy efficiency of 3.06 fJ/bit improved by about four orders of magnitude and a contrast ratio exceeding 50 dB. Thus, MLL-based phase-transition EO modulation is characterised by ultrahigh modulation efficiency and ultrahigh contrast ratio, as experimentally proved in radio-over-fibre and underwater acoustic-sensing systems. This phase-transition EO modulation opens a new avenue for green communication and ubiquitous connections.
Scene text recognition is a popular topic and can benefit various tasks. Although many methods have been proposed for the close-set text recognition challenges, they cannot be directly applied to open-set scenarios, where the evaluation set contains novel characters not appearing in the training set. Conventional methods require collecting new data and retraining the model to handle these novel characters, which is an expensive and tedious process. In this paper, we propose a label-to-prototype learning framework to handle novel characters without retraining the model. In the proposed framework, novel characters are effectively mapped to their corresponding prototypes with a label-to-prototype learning module. This module is trained on characters with seen labels and can be easily generalized to novel characters. Additionally, feature-level rectification is conducted via topology-preserving transformation, resulting in better alignments between visual features and constructed prototypes while having a reasonably small impact on model speed. A lot of experiments show that our method achieves promising performance on a variety of zero-shot, close-set, and open-set text recognition datasets.
The research topic of sketch-to-portrait generation has witnessed a boost of progress with deep learning techniques. The recently proposed StyleGAN architectures achieve state-of-the-art generation ability but the original StyleGAN is not friendly for sketch-based creation due to its unconditional generation nature. To address this issue, we propose a direct conditioning strategy to better preserve the spatial information under the StyleGAN framework. Specifically, we introduce Spatially Conditioned StyleGAN (SC-StyleGAN for short), which explicitly injects spatial constraints to the original StyleGAN generation process. We explore two input modalities, sketches and semantic maps, which together allow users to express desired generation results more precisely and easily. Based on SC-StyleGAN, we present DrawingInStyles, a novel drawing interface for non-professional users to easily produce high-quality, photo-realistic face images with precise control, either from scratch or editing existing ones. Qualitative and quantitative evaluations show the superior generation ability of our method to existing and alternative solutions. The usability and expressiveness of our system are confirmed by a user study.
In recent years, computer-aided automatic polyp segmentation and neoplasm detection have been an emerging topic in medical image analysis, providing valuable support to colonoscopy procedures. Attentions have been paid to improving the accuracy of polyp detection and segmentation. However, not much focus has been given to latency and throughput for performing these tasks on dedicated devices, which can be crucial for practical applications. This paper introduces a novel deep neural network architecture called BlazeNeo, for the task of polyp segmentation and neoplasm detection with an emphasis on compactness and speed while maintaining high accuracy. The model leverages the highly efficient HarDNet backbone alongside lightweight Receptive Field Blocks for computational efficiency, and an auxiliary training mechanism to take full advantage of the training data for the segmentation quality. Our experiments on a challenging dataset show that BlazeNeo achieves improvements in latency and model size while maintaining comparable accuracy against state-of-the-art methods. When deploying on the Jetson AGX Xavier edge device in INT8 precision, our BlazeNeo achieves over 155 fps while yielding the best accuracy among all compared methods.
There have recently been significant advances in the accuracy of algorithms proposed for time series classification (TSC). However, a commonly asked question by real world practitioners and data scientists less familiar with the research topic, is whether the complexity of the algorithms considered state of the art is really necessary. Many times the first approach suggested is a simple pipeline of summary statistics or other time series feature extraction approaches such as TSFresh, which in itself is a sensible question; in publications on TSC algorithms generalised for multiple problem types, we rarely see these approaches considered or compared against. We experiment with basic feature extractors using vector based classifiers shown to be effective with continuous attributes in current state-of-the-art time series classifiers. We test these approaches on the UCR time series dataset archive, looking to see if TSC literature has overlooked the effectiveness of these approaches. We find that a pipeline of TSFresh followed by a rotation forest classifier, which we name FreshPRINCE, performs best. It is not state of the art, but it is significantly more accurate than nearest neighbour with dynamic time warping, and represents a reasonable benchmark for future comparison.
Automatic speech recognition (ASR) on low resource languages improves access of linguistic minorities to technological advantages provided by Artificial Intelligence (AI). In this paper, we address a problem of data scarcity of Hong Kong Cantonese language by creating a new Cantonese dataset. Our dataset, Multi-Domain Cantonese Corpus (MDCC), consists of 73.6 hours of clean read speech paired with transcripts, collected from Cantonese audiobooks from Hong Kong. It combines philosophy, politics, education, culture, lifestyle and family domains, covering a wide range of topics. We also review all existing Cantonese datasets and perform experiments on the two biggest datasets (MDCC and Common Voice zh-HK). We analyze the existing datasets according to their speech type, data source, total size and availability. The results of experiments conducted with Fairseq S2T Transformer, a state-of-the-art ASR model, show the effectiveness of our dataset. In addition, we create a powerful and robust Cantonese ASR model by applying multi-dataset learning on MDCC and Common Voice zh-HK.
Driven by the ever-increasing penetration and proliferation of data-driven applications, a new generation of wireless communication, the sixth-generation (6G) mobile system enhanced by artificial intelligence (AI), has attracted substantial research interests. Among various candidate technologies of 6G, low earth orbit (LEO) satellites have appealing characteristics of ubiquitous wireless access. However, the costs of satellite communication (SatCom) are still high, relative to counterparts of ground mobile networks. To support massively interconnected devices with intelligent adaptive learning and reduce expensive traffic in SatCom, we propose federated learning (FL) in LEO-based satellite communication networks. We first review the state-of-the-art LEO-based SatCom and related machine learning (ML) techniques, and then analyze four possible ways of combining ML with satellite networks. The learning performance of the proposed strategies is evaluated by simulation and results reveal that FL-based computing networks improve the performance of communication overheads and latency. Finally, we discuss future research topics along this research direction.
How to robustly rank the aesthetic quality of given images has been a long-standing ill-posed topic. Such challenge stems mainly from the diverse subjective opinions of different observers about the varied types of content. There is a growing interest in estimating the user agreement by considering the standard deviation of the scores, instead of only predicting the mean aesthetic opinion score. Nevertheless, when comparing a pair of contents, few studies consider how confident are we regarding the difference in the aesthetic scores. In this paper, we thus propose (1) a re-adapted multi-task attention network to predict both the mean opinion score and the standard deviation in an end-to-end manner; (2) a brand-new confidence interval ranking loss that encourages the model to focus on image-pairs that are less certain about the difference of their aesthetic scores. With such loss, the model is encouraged to learn the uncertainty of the content that is relevant to the diversity of observers' opinions, i.e., user disagreement. Extensive experiments have demonstrated that the proposed multi-task aesthetic model achieves state-of-the-art performance on two different types of aesthetic datasets, i.e., AVA and TMGA.