The influx of deep learning (DL) techniques into the field of survival analysis in recent years, coupled with the increasing availability of high-dimensional omics data and unstructured data like images or text, has led to substantial methodological progress, for instance in learning from such high-dimensional or unstructured data. Numerous modern DL-based survival methods have been developed since the mid-2010s; however, they often address only a small subset of scenarios in the time-to-event data setting, e.g., single-risk right-censored survival tasks, and neglect to incorporate more complex (and common) settings. This is partially due to a lack of exchange between experts in the respective fields. In this work, we provide a comprehensive systematic review of DL-based methods for time-to-event analysis, characterizing them according to both survival- and DL-related attributes. In doing so, we hope to provide a helpful overview to practitioners who are interested in DL techniques applicable to their specific use case, as well as to enable researchers from both fields to identify directions for future investigation. We provide a detailed characterization of the methods included in this review as an open-source, interactive table: https://survival-org.github.io/DL4Survival. As this research area is advancing rapidly, we encourage the research community to contribute to keeping the information up to date.
Incrementally recovering 3D dense structures from monocular videos is of paramount importance since it enables various robotics and AR applications. Feature volumes have recently been shown to enable efficient and accurate incremental dense reconstruction without the need to first estimate depth, but they are not able to achieve as high a resolution as depth-based methods due to the large memory consumption of high-resolution feature volumes. This letter proposes a real-time feature volume-based dense reconstruction method that predicts TSDF (Truncated Signed Distance Function) values from a novel sparsified deep feature volume, which is able to achieve higher resolutions than previous feature volume-based methods and is favorable in large-scale outdoor scenarios where the majority of voxels are empty. An uncertainty-aware multi-view stereo (MVS) network is leveraged to infer initial voxel locations of the physical surface in a sparse feature volume. Then, to refine the recovered 3D geometry, deep features are attentively aggregated from multi-view images at potential surface locations and temporally fused. Besides achieving higher resolutions than before, our method is shown to produce more complete reconstructions with finer detail in many cases. Extensive evaluations on both public and self-collected datasets demonstrate that our method achieves very competitive real-time reconstruction results compared to state-of-the-art reconstruction methods in both indoor and outdoor settings.
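The sparsification idea above can be illustrated with a minimal sketch: a TSDF volume that stores only near-surface voxels and fuses observations temporally by weighted averaging, so memory scales with surface area rather than scene volume. The class name, voxel size, and truncation band are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a sparsified TSDF volume: only voxels whose signed
# distance falls within the truncation band are stored, so empty space in
# large outdoor scenes costs no memory. Names and parameters are illustrative.
import numpy as np

class SparseTSDFVolume:
    def __init__(self, voxel_size=0.04, truncation=0.12):
        self.voxel_size = voxel_size
        self.truncation = truncation
        self.tsdf = {}     # (i, j, k) -> fused TSDF value
        self.weight = {}   # (i, j, k) -> fusion weight

    def integrate(self, points, sdf_values):
        """Temporally fuse per-voxel signed distances via weighted averaging."""
        for p, sdf in zip(points, sdf_values):
            if abs(sdf) > self.truncation:   # skip voxels far from the surface
                continue
            key = tuple((p / self.voxel_size).astype(int))
            w = self.weight.get(key, 0.0)
            t = self.tsdf.get(key, 0.0)
            self.tsdf[key] = (t * w + sdf) / (w + 1.0)
            self.weight[key] = w + 1.0

vol = SparseTSDFVolume()
pts = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
vol.integrate(pts, np.array([0.05, 0.5]))   # second point is far from the surface
print(len(vol.tsdf))                        # only the near-surface voxel is stored
```

A dense volume at the same resolution would allocate every voxel up front; here storage grows only with the number of observed near-surface voxels.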
In this paper, we study an optimal online resource reservation problem in a simple communication network. The network is composed of two compute nodes linked by a local communication link. The system operates in discrete time; at each time slot, the administrator reserves resources for servers before the actual job requests are known. A cost is incurred for the reservations made. Then, after the client requests are observed, jobs may be transferred from one server to the other to best accommodate the demands, incurring an additional transport cost. If certain job requests cannot be satisfied, a violation occurs that incurs a cost for each blocked job. The goal is to minimize the overall reservation cost over finite horizons while maintaining the cumulative violation and transport costs under a certain budget limit. To study this problem, we first formalize it as a repeated game against nature where the reservations are drawn randomly according to a sequence of probability distributions that are derived from an online optimization problem over the space of allowable reservations. We then propose an online saddle-point algorithm for which we present an upper bound for the associated K-benchmark regret together with an upper bound for the cumulative constraint violations. Finally, we present numerical experiments where we compare the performance of our algorithm with those of simple deterministic resource allocation policies.
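The saddle-point structure can be sketched with a toy primal-dual loop: a primal step reduces the Lagrangian over the reservation variable, and a dual ascent step on the constraint slack enforces the violation budget on average. The cost and constraint functions, step sizes, and demand model below are toy assumptions, not the paper's algorithm.

```python
# Minimal primal-dual (saddle-point) sketch for online reservation under a
# cumulative-violation budget. All modeling choices here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
T = 200
eta, mu = 0.05, 0.05       # primal and dual step sizes (assumed)
x = 0.5                    # reserved capacity (primal variable)
lam = 0.0                  # dual variable for the violation constraint
budget_rate = 0.1          # allowed average violation per slot

for t in range(T):
    demand = rng.uniform(0.0, 1.0)
    violation = max(demand - x, 0.0)           # blocked jobs this slot
    # Primal step on the Lagrangian L = x + lam * (violation - budget_rate):
    # d(violation)/dx is -1 when demand > x, else 0.
    grad_x = 1.0 - lam * (1.0 if demand > x else 0.0)
    x = float(np.clip(x - eta * grad_x, 0.0, 1.0))
    # Dual ascent on the constraint slack, projected onto lam >= 0.
    lam = max(lam + mu * (violation - budget_rate), 0.0)

print(round(x, 3), round(lam, 3))
```

When the violation budget is tight, the dual variable grows and pushes reservations up; when violations stay under budget, it decays and reservation cost dominates.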
Orthogonal time frequency space (OTFS) is a promising candidate waveform for next-generation wireless communication systems. OTFS places data in the delay-Doppler (DD) domain, which simplifies channel estimation in high-mobility scenarios. However, due to the 2-D convolution effect of the time-varying channel in the DD domain, equalization is still a challenge for OTFS. Existing equalizers for OTFS are either highly complex or do not consider the inter-carrier interference present in high-mobility scenarios. Hence, in this paper, we propose a novel two-stage detection technique for coded OTFS systems. Our proposed detector reduces computational complexity by orders of magnitude compared to existing methods. At the first stage, it truncates the channel by considering only the significant coefficients along the Doppler dimension and performs turbo equalization. To reduce the computational load of the turbo equalizer, our proposed method deploys the modified LSQR (mLSQR) algorithm. At the second stage, with only two successive interference cancellation (SIC) iterations, our proposed detector removes the residual interference caused by channel truncation. To evaluate the performance of our proposed truncated turbo equalizer with SIC (TTE-SIC), we set the minimum mean squared error (MMSE) equalizer without channel truncation as a benchmark. Our simulation results show that the proposed TTE-SIC technique achieves about the same bit error rate (BER) performance as the benchmark.
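The truncate-then-cancel idea can be illustrated on a toy linear model: equalize with a truncated channel matrix (a plain least-squares solve standing in for the paper's mLSQR-based turbo equalizer), then cancel the residual interference that truncation left behind with one SIC-style pass. Dimensions, the channel model, and the truncation threshold are illustrative assumptions.

```python
# Toy illustration of channel truncation followed by residual interference
# cancellation. A generic least-squares solve stands in for mLSQR; the
# channel, threshold, and symbol alphabet are assumed for illustration.
import numpy as np

rng = np.random.default_rng(1)
N = 16
x = rng.choice([-1.0, 1.0], size=N)            # BPSK symbols
# Dominant taps plus many small off-band coefficients (the "insignificant" part).
H = np.eye(N) + 0.3 * np.eye(N, k=1) + 0.02 * rng.standard_normal((N, N))
y = H @ x

# Stage 1: keep only significant channel coefficients and equalize.
H_t = np.where(np.abs(H) > 0.1, H, 0.0)
x_hat = np.linalg.lstsq(H_t, y, rcond=None)[0]

# Stage 2 (one SIC pass): subtract the interference caused by the dropped taps,
# reconstructed from hard symbol decisions, and re-equalize.
residual = (H - H_t) @ np.sign(x_hat)
x_hat = np.linalg.lstsq(H_t, y - residual, rcond=None)[0]

errors = int(np.sum(np.sign(x_hat) != x))
print(errors)                                  # symbol errors after SIC
```

The truncated solve works on a much sparser system than the full channel, which is where the complexity saving comes from; the SIC pass recovers the accuracy lost to truncation.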
Research on deep learning applications in sound and music computing has gathered interest in recent years; however, there is still a missing link between these new technologies and how they can be incorporated into real-world artistic practices. In this work, we explore a well-known deep learning architecture called the Variational Autoencoder (VAE). These architectures have been used in many areas for generating latent spaces where data points are organized so that similar data points lie closer to each other. Previously, VAEs have been used for generating latent timbre spaces or latent spaces of symbolic music excerpts. Applying VAEs to audio features of timbre requires a vocoder to transform the timbre generated by the network into an audio signal, which is computationally expensive. In this work, we apply VAEs to raw audio data directly, bypassing audio feature extraction. This approach allows practitioners to use any audio recording while giving flexibility and control over the aesthetics through dataset curation. The lower computation time in audio signal generation allows the raw audio approach to be incorporated into real-time applications. In this work, we propose three strategies to explore latent spaces of audio and timbre for sound design applications. By doing so, our aim is to initiate a conversation on artistic approaches and strategies to utilize latent audio spaces in sound and music practices.
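A minimal numpy sketch of the encode/sample/decode pass on raw audio frames makes the pipeline concrete: no vocoder or feature extraction, just frames in and frames out, with latent interpolation as one way to explore the space. The single-linear-layer encoder/decoder and all sizes are illustrative assumptions, not the paper's architecture.

```python
# Sketch of a VAE forward pass over raw audio frames, bypassing any vocoder.
# Layer shapes and the linear encoder/decoder are placeholder assumptions.
import numpy as np

rng = np.random.default_rng(0)
frame_len, latent_dim = 512, 8
audio = rng.standard_normal((4, frame_len))        # batch of raw audio frames

W_mu = rng.standard_normal((frame_len, latent_dim)) * 0.01
W_logvar = rng.standard_normal((frame_len, latent_dim)) * 0.01
W_dec = rng.standard_normal((latent_dim, frame_len)) * 0.01

mu = audio @ W_mu
logvar = audio @ W_logvar
# Reparameterization trick: z = mu + sigma * eps, keeping sampling differentiable.
z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
recon = z @ W_dec                                  # decoded raw audio frames

# Interpolating between two latent points is one strategy for exploring
# the latent audio space in a sound design setting.
z_interp = 0.5 * (z[0] + z[1])
print(recon.shape, z_interp.shape)
```

Because decoding is a direct mapping back to samples rather than a vocoder inversion, generation cost stays low enough for real-time use, which is the point the abstract makes.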
Detecting anomalies in temporal data is challenging due to anomalies being dependent on temporal dynamics. One-class classification methods are commonly used for anomaly detection tasks, but they have limitations when applied to temporal data. In particular, mapping all normal instances into a single hypersphere to capture their global characteristics can lead to poor performance in detecting context-based anomalies where the abnormality is defined with respect to local information. To address this limitation, we propose a novel approach inspired by the loss function of DeepSVDD. Instead of mapping all normal instances into a single hypersphere center, each normal instance is pulled toward a recent context window. However, this approach is prone to a representation collapse issue where the neural network that encodes a given instance and its context is optimized towards a constant encoder solution. To overcome this problem, we combine our approach with a deterministic contrastive loss from Neutral AD, a promising self-supervised learning anomaly detection approach. We provide a theoretical analysis to demonstrate that the incorporation of the deterministic contrastive loss can effectively prevent the occurrence of a constant encoder solution. Experimental results show superior performance of our model over various baselines and model variants on real-world industrial datasets.
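The contrast between the global hypersphere objective and the context-window variant can be sketched numerically: in the first, every embedded instance is pulled toward one fixed center; in the second, each instance is pulled toward an embedding of its own recent context window. The fixed random projection standing in for the encoder, and all sizes, are illustrative assumptions.

```python
# Sketch contrasting a DeepSVDD-style global-center loss with the
# context-window loss described above. A frozen random projection stands in
# for the learned encoder, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
T, d, k, window = 100, 6, 4, 5
series = rng.standard_normal((T, d))               # temporal instances
W = rng.standard_normal((d, k))
emb = series @ W                                   # encoded instances

# DeepSVDD-style loss: squared distance to a single global hypersphere center.
center = emb.mean(axis=0)
loss_global = np.mean(np.sum((emb - center) ** 2, axis=1))

# Context-window loss: squared distance to the mean embedding of the
# recent context window, so "normal" is defined locally in time.
ctx = np.stack([emb[t - window:t].mean(axis=0) for t in range(window, T)])
loss_context = np.mean(np.sum((emb[window:] - ctx) ** 2, axis=1))

print(round(float(loss_global), 3), round(float(loss_context), 3))
```

The representation-collapse risk mentioned above is visible here: a constant encoder (mapping everything to one point) drives both losses to zero, which is why the method pairs this objective with a contrastive term.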
Tense inconsistency frequently occurs in machine translation. However, there are few criteria to assess a model's mastery of tense prediction from a linguistic perspective. In this paper, we present a parallel tense test set containing 552 French-English utterance pairs. We also introduce a corresponding benchmark, tense prediction accuracy. With the tense test set and the benchmark, researchers are able to measure the tense consistency performance of machine translation systems for the first time.
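A tense-prediction-accuracy style of benchmark reduces to comparing the tense realized in each translated utterance against the reference tense. The label set and toy data below are hypothetical stand-ins, not the paper's test set.

```python
# Minimal sketch of a tense prediction accuracy metric: the fraction of
# utterances whose translated tense matches the reference tense.
# Labels and examples are hypothetical.
ref_tenses = ["past", "present", "future", "past", "present"]
hyp_tenses = ["past", "present", "past", "past", "present"]

def tense_accuracy(ref, hyp):
    """Fraction of utterances whose predicted tense matches the reference."""
    assert len(ref) == len(hyp)
    return sum(r == h for r, h in zip(ref, hyp)) / len(ref)

print(tense_accuracy(ref_tenses, hyp_tenses))  # → 0.8
```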
Lifelong audio feature extraction involves learning new sound classes incrementally, which is essential for adapting to new data distributions over time. However, optimizing the model only on new data can lead to catastrophic forgetting of previously learned tasks, which undermines the model's ability to perform well over the long term. This paper introduces a new approach to continual audio representation learning called DeCoR. Unlike other methods that store previous data, features, or models, DeCoR indirectly distills knowledge from an earlier model to the latest by predicting quantization indices from a delayed codebook. We demonstrate that DeCoR improves acoustic scene classification accuracy and integrates well with continual self-supervised representation learning. Our approach introduces minimal storage and computation overhead, making it a lightweight and efficient solution for continual learning.
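The delayed-codebook mechanism can be sketched as follows: a frozen codebook quantizes features, the indices it assigns to earlier features become distillation targets, and the current model is trained so that its features reproduce those indices, without storing old data or models. Codebook size, feature dimension, and the agreement measurement are illustrative assumptions.

```python
# Sketch of delayed-codebook distillation: the current model's features are
# supervised to predict the quantization indices a frozen (delayed) codebook
# assigned to the data under the earlier model. Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d, K = 32, 16, 8
feats_old = rng.standard_normal((n, d))            # earlier model's features
codebook = rng.standard_normal((K, d))             # delayed (frozen) codebook

# Distillation targets: index of the nearest codeword for each old feature.
dists = np.linalg.norm(feats_old[:, None, :] - codebook[None, :, :], axis=2)
targets = dists.argmin(axis=1)

# In training, the new model would be penalized when its features select a
# different index; here we just measure how often they already agree.
feats_new = feats_old + 0.05 * rng.standard_normal((n, d))
dists_new = np.linalg.norm(feats_new[:, None, :] - codebook[None, :, :], axis=2)
agreement = float((dists_new.argmin(axis=1) == targets).mean())
print(agreement)
```

Storing only a small codebook and integer indices is what keeps the storage and computation overhead minimal relative to replaying data or keeping old model copies.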
Reconfigurable intelligent surfaces (RISs) will play a key role in establishing millimeter wave (mmWave) ultra-reliable low-latency communication systems for sixth-generation (6G) applications. Currently, there are a few working prototypes of RISs operating in the mmWave frequency band, and all of them are based on passive reflective elements. However, to fabricate an efficiently working RIS at mmWave frequencies, it is crucial to account for the strong signal attenuation, reflective-element losses, and undesired radio frequency (RF) circuit effects. In this paper, we provide measurement campaign results for an active RIS in the mmWave frequency band, as well as its analysis and system design. The obtained results demonstrate that an active RIS outperforms a RIS working in passive mode and provides a higher signal-to-noise ratio (SNR). The active RIS consists of active reflective elements that amplify the impinging signal and reflect it to the desired beam direction. To obtain an efficient RIS in terms of power consumption and RIS state switch time, we design a hexagonal RIS with 37 elements working at 26 GHz. These elements are designed to operate either in a passive state (binary phase shifting) or in an active state (switched OFF or amplifying). We provide a comparison between the performance of a RIS working in passive and active mode using numerical simulations and empirical measurements. This comparison reveals that the active RIS provides a received power that is at least 4 dB higher than that of the equivalent passive RIS. These results demonstrate the strong advantage of using active RISs for future ultra-reliable low-latency wireless communications.
There is an important need for methods to reduce radiation dose and imaging time in myocardial perfusion imaging (MPI) SPECT. Deep learning (DL) methods have demonstrated promise in predicting normal-count images from low-count images for MPI SPECT, but the methods that have been objectively evaluated on the clinical task of detecting perfusion defects have not shown improved performance compared with low-count images. To address this need, we build upon concepts from model-observer theory and our understanding of the human visual system to propose a Detection task-specific DL-based approach for denoising MPI SPECT images (DEMIST). The approach, while performing denoising, is designed to preserve features that are known to impact observer performance on detection tasks. We objectively evaluated the proposed method on the task of detecting perfusion defects using a retrospective study with anonymized clinical data in patients who underwent MPI studies (N = 338). Performance on the task of detecting perfusion defects was quantified with an anthropomorphic channelized Hotelling observer. Images denoised with DEMIST yielded significantly improved detection performance compared to the corresponding low-dose images and images denoised with a commonly used task-agnostic DL-based denoising method. Similar results were observed with stratified analysis based on patient sex and defect type. Additionally, the proposed method significantly improved performance compared to the low-dose images in terms of the task-agnostic metrics of root mean squared error and structural similarity index metric. A mathematical analysis reveals that DEMIST preserves detection-task-specific features while improving the noise properties, thus resulting in improved observer performance. The results provide strong evidence for further clinical evaluation of DEMIST to denoise low-count images in MPI SPECT.
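The channelized Hotelling observer used for the evaluation above admits a compact numerical illustration: images are reduced to a few channel outputs, the Hotelling template is formed from the channel-space mean difference and pooled covariance, and detectability is summarized by a d' index over the resulting test statistics. The random "channels", signal profile, and Gaussian backgrounds below are toy stand-ins for anthropomorphic channels and clinical data.

```python
# Toy channelized Hotelling observer (CHO) computation on simulated
# signal-present / signal-absent images. Channels and data are placeholders.
import numpy as np

rng = np.random.default_rng(0)
npix, nch, n = 64, 4, 200
U = rng.standard_normal((npix, nch))               # channel matrix (assumed)
signal = np.zeros(npix)
signal[:8] = 0.5                                   # known defect profile (toy)

absent = rng.standard_normal((n, npix))            # signal-absent images
present = absent + signal                          # signal-known-exactly pairs

v_a, v_p = absent @ U, present @ U                 # channel outputs
S = 0.5 * (np.cov(v_a.T) + np.cov(v_p.T))          # pooled channel covariance
w = np.linalg.solve(S, v_p.mean(0) - v_a.mean(0))  # Hotelling template

t_a, t_p = v_a @ w, v_p @ w                        # observer test statistics
# Detectability index d' from the two test-statistic distributions.
d_prime = (t_p.mean() - t_a.mean()) / np.sqrt(0.5 * (t_p.var() + t_a.var()))
print(round(float(d_prime), 3))
```

Comparing d' computed on low-count, DEMIST-denoised, and task-agnostic-denoised images is the kind of task-based evaluation the abstract describes, as opposed to pixel-wise metrics like RMSE or SSIM.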