Spectral clustering is one of the most popular unsupervised machine learning methods. Constructing similarity matrix is crucial to this type of method. In most existing works, the similarity matrix is computed once for all or is updated alternatively. However, the former is difficult to reflect comprehensive relationships among data points, and the latter is time-consuming and is even infeasible for large-scale problems. In this work, we propose a restarted clustering framework with self-guiding and block diagonal representation. An advantage of the strategy is that some useful clustering information obtained from previous cycles could be preserved as much as possible. To the best of our knowledge, this is the first work that applies restarting strategy to spectral clustering. The key difference is that we reclassify the samples in each cycle of our method, while they are classified only once in existing methods. To further release the overhead, we introduce a block diagonal representation with Nystr\"{o}m approximation for constructing the similarity matrix. Theoretical results are established to show the rationality of inexact computations in spectral clustering. Comprehensive experiments are performed on some benchmark databases, which show the superiority of our proposed algorithms over many state-of-the-art algorithms for large-scale problems. Specifically, our framework has a potential boost for clustering algorithms and works well even using an initial guess chosen randomly.
As a competitive technology for 6G, semantic communications can significantly improve transmission efficiency. However, many existing semantic communication systems require information feedback during the training coding process, resulting in a significant communication overhead. In this article, we consider a two-way semantic communication (TW-SC) system, where information feedback can be omitted by exploiting the weight reciprocity in the transceiver. Particularly, the channel simulator and semantic transceiver are implemented on both TW-SC nodes and the channel distribution is modeled by a conditional generative adversarial network. Simulation results demonstrate that the proposed TW-SC system performs closing to the state-of-the-art one-way semantic communication systems but requiring no feedback between the transceiver in the training process.
Session-based recommendation systems(SBRS) are more suitable for the current e-commerce and streaming media recommendation scenarios and thus have become a hot topic. The data encountered by SBRS is typically highly sparse, which also serves as one of the bottlenecks limiting the accuracy of recommendations. So Contrastive Learning(CL) is applied in SBRS owing to its capability of improving embedding learning under the condition of sparse data. However, existing CL strategies are limited in their ability to enforce finer-grained (e.g., factor-level) comparisons and, as a result, are unable to capture subtle differences between instances. More than that, these strategies usually use item or segment dropout as a means of data augmentation which may result in sparser data and thus ineffective self-supervised signals. By addressing the two aforementioned limitations, we introduce a novel multi-granularity CL framework. Specifically, two extra augmented embedding convolution channels with different granularities are constructed and the embeddings learned by them are compared with those learned from original view to complete the CL tasks. At factor-level, we employ Disentangled Representation Learning to obtain finer-grained data(e.g. factor-level embeddings), with which we can construct factor-level convolution channels. At item-level, the star graph is deployed as the augmented data and graph convolution on it can ensure the effectiveness of self-supervised signals. Compare the learned embeddings of these two views with the learned embeddings of the basic view to achieve CL at two granularities. Finally, the more precise item-level and factor-level embeddings obtained are referenced to generate personalized recommendations for the user. The proposed model is validated through extensive experiments on two benchmark datasets, showcasing superior performance compared to existing methods.
In recent years, the use of large convolutional kernels has become popular in designing convolutional neural networks due to their ability to capture long-range dependencies and provide large receptive fields. However, the increase in kernel size also leads to a quadratic growth in the number of parameters, resulting in heavy computation and memory requirements. To address this challenge, we propose a neighborhood attention (NA) module that upgrades the standard convolution with a self-attention mechanism. The NA module efficiently extracts long-range dependencies in a sliding window pattern, thereby achieving similar performance to large convolutional kernels but with fewer parameters. Building upon the NA module, we propose a lightweight single image super-resolution (SISR) network named TCSR. Additionally, we introduce an enhanced feed-forward network (EFFN) in TCSR to improve the SISR performance. EFFN employs a parameter-free spatial-shift operation for efficient feature aggregation. Our extensive experiments and ablation studies demonstrate that TCSR outperforms existing lightweight SISR methods and achieves state-of-the-art performance. Our codes are available at \url{https://github.com/Aitical/TCSR}.
Intelligent wireless networks have long been expected to have self-configuration and self-optimization capabilities to adapt to various environments and demands. In this paper, we develop a novel distributed hierarchical deep reinforcement learning (DHDRL) framework with two-tier control networks in different timescales to optimize the long-term spectrum efficiency (SE) of the downlink cell-free multiple-input single-output (MISO) network, consisting of multiple distributed access points (AP) and user terminals (UT). To realize the proposed two-tier control strategy, we decompose the optimization problem into two sub-problems, AP-UT association (AUA) as well as beamforming and power allocation (BPA), resulting in a Markov decision process (MDP) and Partially Observable MDP (POMDP). The proposed method consists of two neural networks. At the system level, a distributed high-level neural network is introduced to optimize wireless network structure on a large timescale. While at the link level, a distributed low-level neural network is proposed to mitigate inter-AP interference and improve the transmission performance on a small timescale. Numerical results show that our method is effective for high-dimensional problems, in terms of spectrum efficiency, signaling overhead as well as satisfaction probability, and generalize well to diverse multi-object problems.
In this paper, we investigate the downlink power adaptation for the suborbital node in suborbital-ground communication systems under extremely high reliability and ultra-low latency communications requirements, which can be formulated as a power threshold-minimization problem. Specifically, the interference from satellites is modeled as an accumulation of stochastic point processes on different orbit planes, and hybrid beamforming (HBF) is considered. On the other hand, to ensure the Quality of Service (QoS) constraints, the finite blocklength regime is adopted. Numerical results show that the transmit power required by the suborbital node decreases as the elevation angle increases at the receiving station.
Graphic layout designs play an essential role in visual communication. Yet handcrafting layout designs are skill-demanding, time-consuming, and non-scalable to batch production. Although generative models emerge to make design automation no longer utopian, it remains non-trivial to customize designs that comply with designers' multimodal desires, i.e., constrained by background images and driven by foreground contents. In this study, we propose \textit{LayoutDETR} that inherits the high quality and realism from generative modeling, in the meanwhile reformulating content-aware requirements as a detection problem: we learn to detect in a background image the reasonable locations, scales, and spatial relations for multimodal elements in a layout. Experiments validate that our solution yields new state-of-the-art performance for layout generation on public benchmarks and on our newly-curated ads banner dataset. For practical usage, we build our solution into a graphical system that facilitates user studies. We demonstrate that our designs attract more subjective preference than baselines by significant margins. Our code, models, dataset, graphical system, and demos are available at https://github.com/salesforce/LayoutDETR.
We present BotSIM, a data-efficient end-to-end Bot SIMulation toolkit for commercial text-based task-oriented dialog (TOD) systems. BotSIM consists of three major components: 1) a Generator that can infer semantic-level dialog acts and entities from bot definitions and generate user queries via model-based paraphrasing; 2) an agenda-based dialog user Simulator (ABUS) to simulate conversations with the dialog agents; 3) a Remediator to analyze the simulated conversations, visualize the bot health reports and provide actionable remediation suggestions for bot troubleshooting and improvement. We demonstrate BotSIM's effectiveness in end-to-end evaluation, remediation and multi-intent dialog generation via case studies on two commercial bot platforms. BotSIM's "generation-simulation-remediation" paradigm accelerates the end-to-end bot evaluation and iteration process by: 1) reducing manual test cases creation efforts; 2) enabling a holistic gauge of the bot in terms of NLU and end-to-end performance via extensive dialog simulation; 3) improving the bot troubleshooting process with actionable suggestions. A demo of our system can be found at https://tinyurl.com/mryu74cd and a demo video at https://youtu.be/qLi5iSoly30. We have open-sourced the toolkit at https://github.com/salesforce/botsim
Intent classification and slot filling are two core tasks in natural language understanding (NLU). The interaction nature of the two tasks makes the joint models often outperform the single designs. One of the promising solutions, called BERT (Bidirectional Encoder Representations from Transformers), achieves the joint optimization of the two tasks. BERT adopts the wordpiece to tokenize each input token into multiple sub-tokens, which causes a mismatch between the tokens and the labels lengths. Previous methods utilize the hidden states corresponding to the first sub-token as input to the classifier, which limits performance improvement since some hidden semantic informations is discarded in the fine-tune process. To address this issue, we propose a novel joint model based on BERT, which explicitly models the multiple sub-tokens features after wordpiece tokenization, thereby generating the context features that contribute to slot filling. Specifically, we encode the hidden states corresponding to multiple sub-tokens into a context vector via the attention mechanism. Then, we feed each context vector into the slot filling encoder, which preserves the integrity of the sentence. Experimental results demonstrate that our proposed model achieves significant improvement on intent classification accuracy, slot filling F1, and sentence-level semantic frame accuracy on two public benchmark datasets. The F1 score of the slot filling in particular has been improved from 96.1 to 98.2 (2.1% absolute) on the ATIS dataset.