Predicting the conversion rate (e.g., the probability that a user will purchase an item) is a fundamental problem in machine learning-based recommender systems. However, accurate conversion labels are revealed only after a long delay, which harms the timeliness of recommender systems. Previous literature concentrates on utilizing early conversions to mitigate this delayed feedback problem. In this paper, we show that post-click user behaviors are also informative for conversion rate prediction and can be used to improve timeliness. We propose a generalized delayed feedback model (GDFM) that unifies both post-click behaviors and early conversions as stochastic post-click information, which can be utilized to train GDFM efficiently in a streaming manner. Based on GDFM, we further establish a novel perspective that the performance gap introduced by delayed feedback can be attributed to a temporal gap and a sampling gap. Inspired by our analysis, we propose to measure the quality of post-click information with a combination of temporal distance and sample complexity. The training objective is re-weighted accordingly to highlight informative and timely signals. We validate our analysis on public datasets, and the experimental results confirm the effectiveness of our method.
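As a rough illustration of the re-weighting idea described above, the sketch below re-weights a per-signal streaming loss by a timeliness term (driven by temporal distance) and an informativeness term (driven by sample complexity). The exponential weighting form, the `temporal_dist` and `sample_complexity` inputs, and the hyperparameters are illustrative assumptions, not GDFM's actual formulation.

```python
import torch
import torch.nn as nn

def signal_weights(temporal_dist, sample_complexity, alpha=1.0, beta=1.0):
    """Assign higher weight to post-click signals that arrive early (small
    temporal distance) and are cheap to estimate (low sample complexity).
    The functional form is a placeholder, not the paper's."""
    timeliness = torch.exp(-alpha * temporal_dist)
    informativeness = torch.exp(-beta * sample_complexity)
    return timeliness * informativeness

def weighted_streaming_loss(logits, labels, temporal_dist, sample_complexity):
    # Per-signal binary cross-entropy, re-weighted before aggregation.
    per_signal = nn.functional.binary_cross_entropy_with_logits(
        logits, labels, reduction="none")
    w = signal_weights(temporal_dist, sample_complexity)
    return (w * per_signal).sum() / (w.sum() + 1e-8)
```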
Reinforcement learning (RL) is currently one of the most commonly used techniques for traffic signal control (TSC), as it can adaptively adjust traffic signal phases and durations according to real-time traffic data. However, a fully centralized RL approach is beset with difficulties in a multi-intersection scenario because the state-action space grows exponentially with the number of intersections. Multi-agent reinforcement learning (MARL) can overcome this high-dimensionality problem by distributing the global control task among local RL agents, but it also introduces new challenges, such as the failure of convergence caused by the non-stationary Markov decision process (MDP). In this paper, we introduce an off-policy Nash deep Q-network (OPNDQN) algorithm, which mitigates the weaknesses of both fully centralized and MARL approaches. OPNDQN addresses the problem that traditional algorithms cannot handle traffic models with large state-action spaces by utilizing a fictitious play approach at each iteration to find the Nash equilibrium among neighboring intersections, from which no intersection has an incentive to unilaterally deviate. One of the main advantages of OPNDQN is that it mitigates the non-stationarity of the multi-agent Markov process, because it accounts for the mutual influence among neighboring intersections by sharing their actions. On the other hand, when training a large traffic network, the convergence rate of OPNDQN is higher than that of existing MARL approaches because it does not incorporate the full state information of every agent. We conduct extensive experiments using the Simulation of Urban MObility (SUMO) simulator and show the clear superiority of OPNDQN over several existing MARL approaches in terms of average queue length, episode training reward, and average waiting time.
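To make the equilibrium-finding step concrete, here is a toy sketch of fictitious play between two neighboring intersections. The Q-values are plain arrays indexed by (own action, neighbor action); in OPNDQN they would instead come from each agent's deep Q-network, so all names, shapes, and the final action-selection rule are illustrative assumptions.

```python
import numpy as np

def fictitious_play(q_a, q_b, n_actions, iters=50):
    """q_a[i, j]: agent A's value when A plays i and B plays j;
    q_b[j, i]: agent B's value when B plays j and A plays i."""
    # Empirical frequencies of each agent's past best responses.
    freq_a = np.ones(n_actions) / n_actions
    freq_b = np.ones(n_actions) / n_actions
    for t in range(1, iters + 1):
        # Each agent best-responds to the other's empirical mixture.
        a = np.argmax(q_a @ freq_b)
        b = np.argmax(q_b @ freq_a)
        # Update empirical frequencies with the new best responses.
        freq_a = ((t - 1) * freq_a + np.eye(n_actions)[a]) / t
        freq_b = ((t - 1) * freq_b + np.eye(n_actions)[b]) / t
    # Return the most frequently played action of each agent as the
    # approximate equilibrium joint action.
    return np.argmax(freq_a), np.argmax(freq_b)
```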
Evaluating automatically generated text summaries is a challenging task. While there have been many interesting approaches, they still fall short of human evaluations. We present RISE, a new approach for evaluating summaries that leverages techniques from information retrieval. RISE is first trained on a retrieval task using a dual-encoder retrieval setup and can subsequently be utilized to evaluate a generated summary given an input document, without gold reference summaries. RISE is especially well suited to new datasets where reference summaries may not be available for evaluation. We conduct comprehensive experiments on the SummEval benchmark (Fabbri et al., 2021), and the results show that RISE correlates more highly with human evaluations than many past approaches to summarization evaluation. Furthermore, RISE also demonstrates data efficiency and generalizability across languages.
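The following minimal sketch shows the reference-free, dual-encoder scoring pattern in the spirit of RISE: embed the source document and the generated summary separately and use their similarity as the quality score. An off-the-shelf sentence encoder is used here purely as a stand-in; RISE's own retrieval-trained encoder and scoring details are not reproduced.

```python
from sentence_transformers import SentenceTransformer, util

# Stand-in encoder; RISE trains its own dual encoder on a retrieval task.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def reference_free_score(document: str, summary: str) -> float:
    """Score a generated summary against its source document, with no
    gold reference summary involved."""
    doc_emb = encoder.encode(document, convert_to_tensor=True)
    sum_emb = encoder.encode(summary, convert_to_tensor=True)
    return util.cos_sim(doc_emb, sum_emb).item()
```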
Coronary Computed Tomography Angiography (CCTA) provides information on the presence, extent, and severity of obstructive coronary artery disease. Large-scale clinical studies analyzing CCTA-derived metrics typically require ground-truth validation in the form of high-fidelity 3D intravascular imaging. However, manual rigid alignment of intravascular images to corresponding CCTA images is both time-consuming and user-dependent. Moreover, intravascular modalities suffer from several non-rigid, motion-induced distortions arising from the imaging catheter path. To address these issues, we present a semi-automatic segmentation-based framework for both rigid and non-rigid matching of intravascular images to CCTA images. We formulate the problem as finding the optimal \emph{virtual catheter path} that samples the CCTA data so as to recapitulate the coronary artery morphology found in the intravascular image. We validate our co-registration framework on a cohort of $n=40$ patients using bifurcation landmarks as ground truth for longitudinal and rotational registration. Our results indicate that our non-rigid registration significantly outperforms other co-registration approaches for luminal bifurcation alignment in both the longitudinal (mean mismatch: 3.3 frames) and rotational (mean mismatch: 28.6 degrees) directions. By providing a differentiable framework for automatic multi-modal intravascular data fusion, our co-registration modules significantly reduce the manual effort required to conduct large-scale multi-modal clinical studies while also providing a solid foundation for the development of machine learning-based co-registration approaches.
To mitigate the negative effects of false information more effectively, automated AI (artificial intelligence) tools that assist fact-checkers need to be developed. Despite existing research, there is still a gap between fact-checking practitioners' needs and pains and current AI research. We aspire to bridge this gap by employing methods from information behavior research to identify implications for designing better human-centered AI-based supporting tools. In this study, we conducted semi-structured in-depth interviews with Central European fact-checkers. The information behavior and the requirements on desired supporting tools were analyzed using iterative bottom-up content analysis, borrowing techniques from grounded theory. The most significant needs were validated with a survey extended to fact-checkers from across Europe, in which we collected 24 responses from 20 European countries, i.e., 62% of the active European IFCN (International Fact-Checking Network) signatories. Our contributions are both theoretical and practical. First, by mapping our findings about the needs of fact-checking organizations to relevant tasks for AI research, we show that the methods of information behavior research are well suited to studying the processes within these organizations and can be used to bridge the gap between users and AI researchers. Second, we identify fact-checkers' needs and pains, focusing on previously unexplored dimensions and emphasizing the needs of fact-checkers from Central and Eastern Europe as well as from low-resource language groups, which has implications for the development of new resources (datasets) as well as for the focus of AI research in this domain.
Current data augmentation techniques and transformations are well suited for improving the size and quality of natural image datasets but are not yet optimized for medical imaging. We hypothesize that sub-optimal data augmentations can easily distort or occlude medical images, leading to false positives or false negatives during patient diagnosis, prediction, or therapy/surgery evaluation. In our experiments, we found that commonly used intensity-based data augmentation distorts MRI scans and leads to the loss of texture information, thereby negatively affecting the overall classification performance. Additionally, we observed that commonly used data augmentation methods cannot be applied to medical imaging in a plug-and-play fashion and require manual tuning and adjustment.
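For concreteness, the sketch below applies a typical intensity-based augmentation (random gamma and contrast) to an MRI slice. With aggressive parameter ranges such as those shown, the transform can saturate voxel intensities and wash out fine texture, which is the failure mode discussed above. The parameter ranges and the assumption that `mri_slice` is a float array in [0, 1] are illustrative, not the paper's pipeline.

```python
import numpy as np

def random_intensity_augment(mri_slice, rng=np.random.default_rng()):
    """Apply a random gamma and contrast shift to a [0, 1]-normalized slice.
    Ranges are deliberately wide to illustrate how texture can be lost."""
    gamma = rng.uniform(0.5, 2.0)
    contrast = rng.uniform(0.5, 1.5)
    # Clipping after the contrast shift is where saturation (and hence
    # texture loss) occurs for extreme draws.
    return np.clip(contrast * (mri_slice ** gamma), 0.0, 1.0)
```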
Lip movement information is critical for many audio-visual tasks. However, extracting lip movement information from videos is challenging, as it can easily be perturbed by factors such as personal identity and head pose. This paper proposes utilizing a parametric 3D face model to explicitly disentangle lip movement information. Building on recent advances in 3D face reconstruction, we first propose a method that can consistently disentangle expression information, in which the lip movement information lies. We then demonstrate that, once the influence of the perturbing factors is alleviated by synthesizing faces from the disentangled lip movement information, the lip-sync task can be performed better with much less data. Finally, we show its effectiveness in the wild by testing it on an unseen dataset for the active speaker detection task, achieving competitive performance.
Barlow (1985) hypothesized that the co-occurrence of two events $A$ and $B$ is "suspicious" if $P(A,B) \gg P(A) P(B)$. We first review classical measures of association for $2 \times 2$ contingency tables, including Yule's $Y$ (Yule, 1912), which depends only on the odds ratio $\lambda$, and is independent of the marginal probabilities of the table. We then discuss the mutual information (MI) and pointwise mutual information (PMI), which depend on the ratio $P(A,B)/P(A)P(B)$, as measures of association. We show that, once the effect of the marginals is removed, MI and PMI behave similarly to $Y$ as functions of $\lambda$. The pointwise mutual information is used extensively in some research communities for flagging suspicious coincidences, but it is important to bear in mind the sensitivity of the PMI to the marginals, with increased scores for sparser events.
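A small worked example of the quantities discussed above: for a $2 \times 2$ table $[[a, b], [c, d]]$ with $a = \#(A \wedge B)$, $b = \#(A \wedge \neg B)$, $c = \#(\neg A \wedge B)$, $d = \#(\neg A \wedge \neg B)$, the code below computes the odds ratio $\lambda = ad/bc$, Yule's $Y = (\sqrt{\lambda}-1)/(\sqrt{\lambda}+1)$, and the PMI $\log_2 P(A,B)/P(A)P(B)$, and contrasts two tables with the same $\lambda$ but different marginals. The specific counts are made up for illustration.

```python
import numpy as np

def association_measures(a, b, c, d):
    n = a + b + c + d
    p_ab, p_a, p_b = a / n, (a + b) / n, (a + c) / n
    odds_ratio = (a * d) / (b * c)                         # lambda
    yules_y = (np.sqrt(odds_ratio) - 1) / (np.sqrt(odds_ratio) + 1)
    pmi = np.log2(p_ab / (p_a * p_b))
    return odds_ratio, yules_y, pmi

# Both tables have lambda = 4, so Yule's Y = 1/3 in both cases,
# but PMI grows from ~0.42 to ~1.62 as the events become sparser.
print(association_measures(20, 10, 10, 20))    # balanced marginals
print(association_measures(2, 10, 10, 200))    # same odds ratio, sparser events
```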
Reconfigurable intelligent surfaces (RISs) achieve high passive beamforming gains for signal enhancement or interference nulling by dynamically adjusting their reflection coefficients. Their employment is particularly appealing for improving both wireless security and the efficiency of radio frequency (RF)-based wireless power transfer. Motivated by this, we conceive and investigate an RIS-assisted secure simultaneous wireless information and power transfer (SWIPT) system designed to deliver information and power from a base station (BS) to an information user (IU) and to multiple energy users (EUs), respectively. Moreover, the EUs are also potential eavesdroppers that may overhear the communication between the BS and the IU. We adopt two-timescale transmission to reduce both the signal processing complexity and the channel training overhead, and we aim to maximize the average worst-case secrecy rate achieved by the IU. This is accomplished by jointly optimizing the short-term transmit beamforming vectors at the BS and the long-term phase shifts at the RIS, under the energy harvesting constraints at the EUs and the power constraint at the BS. The formulated stochastic optimization problem is non-convex with intricately coupled variables and is non-smooth due to the existence of multiple EUs/eavesdroppers, so no standard optimization approach is available for this challenging scenario. To tackle this challenge, we propose a smooth approximation aided stochastic successive convex approximation (SA-SSCA) algorithm. Furthermore, a low-complexity heuristic algorithm is proposed to reduce the computational complexity without unduly eroding the performance. Simulation results confirm the efficiency of the RIS in securing SWIPT systems and demonstrate the significant performance gains achieved by our proposed algorithms over the relevant benchmark schemes.
Three-dimensional (3D) ultrasound imaging technique has been applied for scoliosis assessment, but current assessment method only uses coronal projection image and cannot illustrate the 3D deformity and vertebra rotation. The vertebra detection is essential to reveal 3D spine information, but the detection task is challenging due to complex data and limited annotations. We propose VertMatch, a two-step framework to detect vertebral structures in 3D ultrasound volume by utilizing unlabeled data in semi-supervised manner. The first step is to detect the possible positions of structures on transverse slice globally, and then the local patches are cropped based on detected positions. The second step is to distinguish whether the patches contain real vertebral structures and screen the predicted positions from the first step. VertMatch develops three novel components for semi-supervised learning: for position detection in the first step, (1) anatomical prior is used to screen pseudo labels generated from confidence threshold method; (2) multi-slice consistency is used to utilize more unlabeled data by inputting multiple adjacent slices; (3) for patch identification in the second step, the categories are rebalanced in each batch to solve imbalance problem. Experimental results demonstrate that VertMatch can detect vertebra accurately in ultrasound volume and outperforms state-of-the-art methods. VertMatch is also validated in clinical application on forty ultrasound scans, and it can be a promising approach for 3D assessment of scoliosis.
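To illustrate the pseudo-label screening idea from the first step, the sketch below keeps only high-confidence position predictions on unlabeled slices and then discards those that violate a simple anatomical prior (here, a hypothetical plausible range for the spacing between consecutive vertebral structures). The threshold, the spacing range, and the prior itself are placeholders, not VertMatch's actual values or rules.

```python
import numpy as np

def screen_pseudo_labels(positions, confidences,
                         conf_thresh=0.9, min_gap=8.0, max_gap=40.0):
    """positions: 1-D array of predicted structure locations on a slice (pixels);
    confidences: matching detector confidences for those predictions."""
    # 1) Confidence-threshold filtering of pseudo labels.
    keep = confidences >= conf_thresh
    positions = np.sort(positions[keep])
    # 2) Anatomical-prior screening: consecutive structures should be
    #    separated by a physiologically plausible distance.
    screened = [positions[0]] if len(positions) else []
    for p in positions[1:]:
        gap = p - screened[-1]
        if min_gap <= gap <= max_gap:
            screened.append(p)
    return np.array(screened)
```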