Transformers have become an indispensable module for text generation models since their great success in machine translation. Previous works attribute the~success of transformers to the query-key-value dot-product attention, which provides a robust inductive bias by the fully connected token graphs. However, we found that self-attention has a severe limitation. When predicting the (i+1)-th token, self-attention only takes the i-th token as an information collector, and it tends to give a high attention weight to those tokens similar to itself. Therefore, most of the historical information that occurred before the i-th token is not taken into consideration. Based on this observation, in this paper, we propose a new architecture, called bird-eye transformer(BET), which goes one step further to improve the performance of transformers by reweighting self-attention to encourage it to focus more on important historical information. We have conducted experiments on multiple text generation tasks, including machine translation (2 datasets) and language models (3 datasets). These experimental~results show that our proposed model achieves a better performance than the baseline transformer architectures on~all~datasets. The code is released at: \url{https://sites.google.com/view/bet-transformer/home}.
Link prediction is the task of inferring missing links between entities in knowledge graphs. Embedding-based methods have shown effectiveness in addressing this problem by modeling relational patterns in triples. However, the link prediction task often requires contextual information in entity neighborhoods, while most existing embedding-based methods fail to capture it. Additionally, little attention is paid to the diversity of entity representations in different contexts, which often leads to false prediction results. In this situation, we consider that the schema of knowledge graph contains the specific contextual information, and it is beneficial for preserving the consistency of entities across contexts. In this paper, we propose a novel schema-augmented multi-level contrastive learning framework (SMiLE) to conduct knowledge graph link prediction. Specifically, we first exploit network schema as the prior constraint to sample negatives and pre-train our model by employing a multi-level contrastive learning method to yield both prior schema and contextual information. Then we fine-tune our model under the supervision of individual triples to learn subtler representations for link prediction. Extensive experimental results on four knowledge graph datasets with thorough analysis of each component demonstrate the effectiveness of our proposed framework against state-of-the-art baselines.
Social media apps have become very promising and omnipresent in daily life. Most social media apps are used to deliver vital information to those nearby and far away. As our lives become more hectic, many of us strive to limit our usage of social media apps because they are too addictive, and the majority of us have gotten preoccupied with our daily lives. Because of this, we frequently overlook crucial information, such as invitations to weddings, interviews, birthday parties, etc., or find ourselves unable to attend the event. In most cases, this happens because users are more likely to discover the invitation or information only before the event, giving them little time to prepare. To solve this issue, in this study, we created a system that will collect social media chat and filter it using Natural Language Processing (NLP) methods like Tokenization, Stop Words Removal, Lemmatization, Segmentation, and Named Entity Recognition (NER). Also, Machine Learning Algorithms such as K-Nearest Neighbor (KNN) Algorithm are implemented to prioritize the received invitation and to sort the level of priority. Finally, a customized notification will be delivered to the users where they acknowledge the upcoming event. So, the chances of missing the event are less or can be planned.
A pair-trading strategy is an approach that utilizes the fluctuations between prices of a pair of stocks in a short-term time frame, while in the long-term the pair may exhibit a strong association and co-movement pattern. When the prices of the stocks exhibit significant divergence, the shares of the stock that gains in price are sold (a short strategy) while the shares of the other stock whose price falls are bought (a long strategy). This paper presents a cointegration-based approach that identifies stocks listed in the five sectors of the National Stock Exchange (NSE) of India for designing efficient pair-trading portfolios. Based on the stock prices from Jan 1, 2018, to Dec 31, 2020, the cointegrated stocks are identified and the pairs are formed. The pair-trading portfolios are evaluated on their annual returns for the year 2021. The results show that the pairs of stocks from the auto and the realty sectors, in general, yielded the highest returns among the five sectors studied in the work. However, two among the five pairs from the information technology (IT) sector are found to have yielded negative returns.
Differential privacy (DP) provides a formal privacy guarantee that prevents adversaries with access to machine learning models from extracting information about individual training points. Differentially private stochastic gradient descent (DPSGD) is the most popular training method with differential privacy in image recognition. However, existing DPSGD schemes lead to significant performance degradation, which prevents the application of differential privacy. In this paper, we propose a simulated annealing-based differentially private stochastic gradient descent scheme (SA-DPSGD) which accepts a candidate update with a probability that depends both on the update quality and on the number of iterations. Through this random update screening, we make the differentially private gradient descent proceed in the right direction in each iteration, and result in a more accurate model finally. In our experiments, under the same hyperparameters, our scheme achieves test accuracies 98.35%, 87.41% and 60.92% on datasets MNIST, FashionMNIST and CIFAR10, respectively, compared to the state-of-the-art result of 98.12%, 86.33% and 59.34%. Under the freely adjusted hyperparameters, our scheme achieves even higher accuracies, 98.89%, 88.50% and 64.17%. We believe that our method has a great contribution for closing the accuracy gap between private and non-private image classification.
Clustering is the task of gathering similar data samples into clusters without using any predefined labels. It has been widely studied in machine learning literature, and recent advancements in deep learning have revived interest in this field. Contrastive clustering (CC) models are a staple of deep clustering in which positive and negative pairs of each data instance are generated through data augmentation. CC models aim to learn a feature space where instance-level and cluster-level representations of positive pairs are grouped together. Despite improving the SOTA, these algorithms ignore the cross-instance patterns, which carry essential information for improving clustering performance. In this paper, we propose a novel contrastive clustering method, Cross-instance guided Contrastive Clustering (C3), that considers the cross-sample relationships to increase the number of positive pairs. In particular, we define a new loss function that identifies similar instances using the instance-level representation and encourages them to aggregate together. Extensive experimental evaluations show that our proposed method can outperform state-of-the-art algorithms on benchmark computer vision datasets: we improve the clustering accuracy by 6.8%, 2.8%, 4.9%, 1.3% and 0.4% on CIFAR-10, CIFAR-100, ImageNet-10, ImageNet-Dogs, and Tiny-ImageNet, respectively.
We propose fully distributed multi-group multicast precoding designs for cell-free massive multiple-input multiple-output (MIMO) systems with modest training overhead. We target the minimization of the sum of the maximum mean squared errors (MSEs) over the multicast groups, which is then approximated with a weighted sum MSE minimization to simplify the computation and signaling. To design the joint network-wide multi-group multicast precoders at the base stations (BSs) and the combiners at the user equipments (UEs) in a fully distributed fashion, we adopt an iterative bi-directional training scheme with UE-specific or group-specific precoded uplink pilots and group-specific precoded downlink pilots. To this end, we introduce a new group-specific uplink training resource that entirely eliminates the need for backhaul signaling for the channel state information (CSI) exchange. The precoders are optimized locally at each BS by means of either best-response or gradient-based updates, and the convergence of the two approaches is analyzed with respect to the centralized implementation with perfect CSI. Finally, numerical results show that the proposed distributed methods greatly outperform conventional cell-free massive MIMO precoding designs that rely solely on local CSI.
The problem of robustness in adverse weather conditions is considered a significant challenge for computer vision algorithms in the applicants of autonomous driving. Image rain removal algorithms are a general solution to this problem. They find a deep connection between raindrops/rain-streaks and images by mining the hidden features and restoring information about the rain-free environment based on the powerful representation capabilities of neural networks. However, previous research has focused on architecture innovations and has yet to consider the vulnerability issues that already exist in neural networks. This research gap hints at a potential security threat geared toward the intelligent perception of autonomous driving in the rain. In this paper, we propose a universal rain-removal attack (URA) on the vulnerability of image rain-removal algorithms by generating a non-additive spatial perturbation that significantly reduces the similarity and image quality of scene restoration. Notably, this perturbation is difficult to recognise by humans and is also the same for different target images. Thus, URA could be considered a critical tool for the vulnerability detection of image rain-removal algorithms. It also could be developed as a real-world artificial intelligence attack method. Experimental results show that URA can reduce the scene repair capability by 39.5% and the image generation quality by 26.4%, targeting the state-of-the-art (SOTA) single-image rain-removal algorithms currently available.
We introduce GENIUS: a conditional text generation model using sketches as input, which can fill in the missing contexts for a given sketch (key information consisting of textual spans, phrases, or words, concatenated by mask tokens). GENIUS is pre-trained on a large-scale textual corpus with a novel reconstruction from sketch objective using an extreme and selective masking strategy, enabling it to generate diverse and high-quality texts given sketches. Comparison with other competitive conditional language models (CLMs) reveals the superiority of GENIUS's text generation quality. We further show that GENIUS can be used as a strong and ready-to-use data augmentation tool for various natural language processing (NLP) tasks. Most existing textual data augmentation methods are either too conservative, by making small changes to the original text, or too aggressive, by creating entirely new samples. With GENIUS, we propose GeniusAug, which first extracts the target-aware sketches from the original training set and then generates new samples based on the sketches. Empirical experiments on 6 text classification datasets show that GeniusAug significantly improves the models' performance in both in-distribution (ID) and out-of-distribution (OOD) settings. We also demonstrate the effectiveness of GeniusAug on named entity recognition (NER) and machine reading comprehension (MRC) tasks. (Code and models are publicly available at https://github.com/microsoft/SCGLab and https://github.com/beyondguo/genius)
Low-resolution analog-to-digital converters (ADCs) are promising for reducing energy consumption and costs of multiuser multiple-input multiple-output (MIMO) systems with many antennas. We propose low-resolution multiuser MIMO receivers where the signals are simultaneously processed by 1-bit ADCs and a comparator network, which can be interpreted as additional virtual channels with binary outputs. We distinguish the proposed comparator networks in fully and partially connected. For such receivers, we develop the low-resolution aware linear minimum mean-squared error (LRA-LMMSE) channel estimator and detector according to the Bussgang theorem. We also develop a robust detector which takes into account the channel state information (CSI) mismatch statistics. By exploiting knowledge of the channel coefficients we devise a mean-square error (MSE) greedy search and a sequential signal-to-interference-plus-noise ratio (SINR) search for optimization of partially connected networks. Numerical results show that a system with extra virtual channels can outperform a system with additional receive antennas, in terms of bit error rate (BER). Furthermore, by employing the proposed channel estimation with its error statistics, we construct a lower bound on the ergodic sum rate for a linear receiver. Simulation results confirm that the proposed approach outperforms the conventional 1-bit MIMO system in terms of BER, MSE and sum rate.