To render each generated token in real time, the LLM server generates response tokens one by one and streams each generated token (or group of a few tokens) through the network to the user right after it is generated, which we refer to as LLM token streaming. However, under unstable network conditions, the LLM token streaming experience could suffer greatly from stalls since one packet loss could block the rendering of tokens contained in subsequent packets even if they arrive on time. With a real-world measurement study, we show that current applications including ChatGPT, Claude, and Bard all suffer from increased stall under unstable network. For this emerging token streaming problem in LLM Chatbots, we propose a novel transport layer scheme, called Chatterbox, which puts new generated tokens as well as currently unacknowledged tokens in the next outgoing packet. This ensures that each packet contains some new tokens and can be independently rendered when received, thus avoiding aforementioned stalls caused by missing packets. Through simulation under various network conditions, we show Chatterbox reduces stall ratio (proportion of token rendering wait time) by 71.0% compared to the token streaming method commonly used by real chatbot applications and by 31.6% compared to a custom packet duplication scheme. By tailoring Chatterbox to fit the token-by-token generation of LLM, we enable the Chatbots to respond like an eloquent speaker for users to better enjoy pervasive AI.
Recently, a data-driven Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm tailored to channels with intersymbol interference has been introduced. This so-called BCJRNet algorithm utilizes neural networks to calculate channel likelihoods. BCJRNet has demonstrated resilience against inaccurate channel tap estimations when applied to a time-invariant channel with ideal exponential decay profiles. However, its generalization capabilities for practically-relevant time-varying channels, where the receiver can only access incorrect channel parameters, remain largely unexplored. The primary contribution of this paper is to expand upon the results from existing literature to encompass a variety of imperfect channel knowledge cases that appear in real-world transmissions. Our findings demonstrate that BCJRNet significantly outperforms the conventional BCJR algorithm for stationary transmission scenarios when learning from noisy channel data and with imperfect channel decay profiles. However, this advantage is shown to diminish when the operating channel is also rapidly time-varying. Our results also show the importance of memory assumptions for conventional BCJR and BCJRNet. An underestimation of the memory largely degrades the performance of both BCJR and BCJRNet, especially in a slow-decaying channel. To mimic a situation closer to a practical scenario, we also combined channel tap uncertainty with imperfect channel memory knowledge. Somewhat surprisingly, our results revealed improved performance when employing the conventional BCJR with an underestimated memory assumption. BCJRNet, on the other hand, showed a consistent performance improvement as the level of accurate memory knowledge increased.
This letter introduces an innovative method to enhance the quality of audio time stretching by precisely decomposing a sound into sines, transients, and noise and by improving the processing of the latter component. While there are established methods for time-stretching sines and transients with high quality, the manipulation of noise or residual components has lacked robust solutions in prior research. The proposed method combines sound decomposition with previous techniques for audio spectral resynthesis. The time-stretched noise component is achieved by morphing its time-interpolated spectral magnitude with a white-noise excitation signal. This method stands out for its simplicity, efficiency, and audio quality. The results of a subjective experiment affirm the superiority of this approach over current state-of-the-art methods across all evaluated stretch factors. The proposed technique notably excels in extreme stretching scenarios, signifying a substantial elevation in performance. The proposed method holds promise for a wide range of applications in slow-motion media content, such as music or sports video production.
This paper studies a discrete-time mean-variance model based on reinforcement learning. Compared with its continuous-time counterpart in \cite{zhou2020mv}, the discrete-time model makes more general assumptions about the asset's return distribution. Using entropy to measure the cost of exploration, we derive the optimal investment strategy, whose density function is also Gaussian type. Additionally, we design the corresponding reinforcement learning algorithm. Both simulation experiments and empirical analysis indicate that our discrete-time model exhibits better applicability when analyzing real-world data than the continuous-time model.
Future autonomous vehicles (AVs) will use a variety of sensors that generate a vast amount of data. Naturally, this data not only serves self-driving algorithms; but can also assist other vehicles or the infrastructure in real-time decision-making. Consequently, vehicles shall exchange their measurement data over Vehicle-to-Everything (V2X) technologies. Moreover, predicting the state of the road network might be beneficial too. With such a prediction, we might mitigate road congestion, balance parking lot usage, or optimize the traffic flow. That would decrease transportation costs as well as reduce its environmental impact. In this paper, we propose a federated measurement and learning system that provides real-time data to fellow vehicles over Vehicle-to-Vehicle (V2V) communication while also operating a federated learning (FL) scheme over the Vehicle-to-Network (V2N) link to create a predictive model of the transportation network. As we are yet to have real-world AV data, we model it with a non-IID (independent and identically distributed) dataset to evaluate the capabilities of the proposed system in terms of performance and privacy. Results indicate that the proposed FL scheme improves learning performance and prevents eavesdropping at the aggregator server side.
In healthcare, patient data is often collected as multivariate time series, providing a comprehensive view of a patient's health status over time. While this data can be sparse, connected devices may enhance its frequency. The goal is to create patient profiles from these time series. In the absence of labels, a predictive model can be used to predict future values while forming a latent cluster space, evaluated based on predictive performance. We compare two models on Withing's datasets, M AGMAC LUST which clusters entire time series and DGM${}^2$ which allows the group affiliation of an individual to change over time (dynamic clustering).
Property prediction is a fundamental task in crystal material research. To model atoms and structures, structures represented as graphs are widely used and graph learning-based methods have achieved significant progress. Bond angles and bond distances are two key structural information that greatly influence crystal properties. However, most of the existing works only consider bond distances and overlook bond angles. The main challenge lies in the time cost of handling bond angles, which leads to a significant increase in inference time. To solve this issue, we first propose a crystal structure modeling based on dual scale neighbor partitioning mechanism, which uses a larger scale cutoff for edge neighbors and a smaller scale cutoff for angle neighbors. Then, we propose a novel Atom-Distance-Angle Graph Neural Network (ADA-GNN) for property prediction tasks, which can process node information and structural information separately. The accuracy of predictions and inference time are improved with the dual scale modeling and the specially designed architecture of ADA-GNN. The experimental results validate that our approach achieves state-of-the-art results in two large-scale material benchmark datasets on property prediction tasks.
Oracle Bone Script (OBS), one of the earliest known forms of ancient Chinese writing, holds invaluable insights into the humanities and geography of the Shang Dynasty, dating back 3,000 years. The immense historical and cultural significance of these writings cannot be overstated. However, the passage of time has obscured much of their meaning, presenting a significant challenge in deciphering these ancient texts. With the advent of Artificial Intelligence (AI), employing AI to assist in interpreting OBS has become a feasible option. Yet, progress in this area has been hindered by a lack of high-quality datasets. To address this issue, this paper details the creation of the HUST-OBS dataset. This dataset encompasses 77,064 images of 1,588 individual deciphered scripts and 62,989 images of 9,411 undeciphered characters, with a total of 140,053 images, compiled from diverse sources. Additionally, all images and labels have been reviewed and corrected by experts in oracle bone studies. The hope is that this dataset could inspire and assist future research in deciphering those unknown OBS.
There has been considerable recent interest in scoring properties on the basis of eviction risk. The success of methods for eviction prediction is typically evaluated using different measures of predictive accuracy. However, the underlying goal of such prediction is to direct appropriate assistance to households that may be at greater risk so they remain stably housed. Thus, we must ask the question of how useful such predictions are in targeting outreach efforts - informing action. In this paper, we investigate this question using a novel dataset that matches information on properties, evictions, and owners. We perform an eviction prediction task to produce risk scores and then use these risk scores to plan targeted outreach policies. We show that the risk scores are, in fact, useful, enabling a theoretical team of caseworkers to reach more eviction-prone properties in the same amount of time, compared to outreach policies that are either neighborhood-based or focus on buildings with a recent history of evictions. We also discuss the importance of neighborhood and ownership features in both risk prediction and targeted outreach.
Recent advances in deep learning has witnessed many innovative frameworks that solve high dimensional mean-field games (MFG) accurately and efficiently. These methods, however, are restricted to solving single-instance MFG and demands extensive computational time per instance, limiting practicality. To overcome this, we develop a novel framework to learn the MFG solution operator. Our model takes a MFG instances as input and output their solutions with one forward pass. To ensure the proposed parametrization is well-suited for operator learning, we introduce and prove the notion of sampling invariance for our model, establishing its convergence to a continuous operator in the sampling limit. Our method features two key advantages. First, it is discretization-free, making it particularly suitable for learning operators of high-dimensional MFGs. Secondly, it can be trained without the need for access to supervised labels, significantly reducing the computational overhead associated with creating training datasets in existing operator learning methods. We test our framework on synthetic and realistic datasets with varying complexity and dimensionality to substantiate its robustness.